root
├── dataset
│   ├── conll03_train.json
│   ├── conll03_dev.json
│   ├── conll03_test.json
│   ├── conll03_tag_to_id.json
│   └── ...
├── models
│   ├── __init__.py
│   └── modeling_roberta.py
├── utils
│   ├── __init__.py
│   ├── config.py
│   ├── data_utils.py
│   ├── eval.py
│   └── ...
├── ptms
│   └── ... (trained results, e.g., saved models, log file)
├── cached_models
│   └── ... (RoBERTa pretrained model, which will be downloaded automatically)
├── run_script.py
└── run_script.sh
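Before launching a run, it can help to confirm that the dataset files shown in the layout are in place. The following is a minimal sketch, not part of the repository: the helper name missing_dataset_files is hypothetical, and it assumes every dataset follows the same <name>_train/dev/test/tag_to_id.json naming as conll03.

```python
from pathlib import Path

# Suffixes taken from the conll03 files in the layout; that other datasets
# follow the same naming pattern is an assumption.
EXPECTED_SUFFIXES = ("train.json", "dev.json", "test.json", "tag_to_id.json")


def missing_dataset_files(root, dataset_name):
    """Return the names of expected files that are absent under <root>/dataset."""
    dataset_dir = Path(root) / "dataset"
    expected = [dataset_dir / f"{dataset_name}_{suffix}" for suffix in EXPECTED_SUFFIXES]
    return [path.name for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Run from the repository root.
    missing = missing_dataset_files(".", "conll03")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected conll03 files are present.")
```

An empty list means all four files for the given dataset name were found.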
sh run_script.sh <GPU ID> <DATASET NAME>
For example, to train on CoNLL03 using GPU 0:
sh run_script.sh 0 conll03
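If you prefer to start runs from Python (for instance, to loop over several datasets), the same command can be assembled programmatically. This is only a sketch: build_command and launch are hypothetical helpers that do not exist in the repository, and the argument checks are assumptions rather than rules enforced by run_script.sh.

```python
import subprocess


def build_command(gpu_id, dataset_name, script="run_script.sh"):
    """Build the argument list for `sh run_script.sh <GPU ID> <DATASET NAME>`."""
    # Assumed sanity checks; the shell script itself may accept other values.
    if gpu_id < 0:
        raise ValueError("gpu_id must be non-negative")
    if not dataset_name:
        raise ValueError("dataset_name must not be empty")
    return ["sh", script, str(gpu_id), dataset_name]


def launch(gpu_id, dataset_name):
    """Run the script from the repository root; raise if it exits non-zero."""
    subprocess.run(build_command(gpu_id, dataset_name), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(" ".join(build_command(0, "conll03")))
```

Passing the arguments as a list avoids shell quoting issues, and check=True surfaces a failed run as an exception.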
The dataset-specific hyperparameters are listed in our paper; to use them, edit the corresponding values in run_script.sh.
The implementation is based on BOND (https://github.com/cliang1453/BOND).
@inproceedings{zhang:2021,
title={Improving Distantly-Supervised Named Entity Recognition with Self-Collaborative Denoising Learning},
author={Xinghua Zhang and Bowen Yu and Tingwen Liu and Zhenyu Zhang and Jiawei Sheng and Xue Mengge and Hongbo Xu},
booktitle={Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
year={2021}
}