ShannonAI / mrc-for-flat-nested-ner

Code for ACL 2020 paper `A Unified MRC Framework for Named Entity Recognition`

Is the model able to handle partially annotated training data? #48

Open DanqingZ opened 3 years ago

DanqingZ commented 3 years ago

Hi, I am looking into the data (for example, conll2003). For different entity types, we can generate different (context, query) pairs. However, for each context, we have to generate a (context, question, answer) triple for every entity type. If I have partially annotated training data, can I generate (context, question, answer) triples only when the entity is actually present in the context?

Thank you!

ghost commented 3 years ago

Thanks for asking! You should add an if statement, `if not tmp_impossible:`, before this line: https://github.com/ShannonAI/mrc-for-flat-nested-ner/blob/master/data_preprocess/generate_mrc_dataset.py#L91. After that, run `script/data/gen_mrc_ner_datasets.sh`, and only the (context, question, answer) triples whose context contains an entity of the queried type will be saved.
`tmp_impossible` being `True` means that the context contains no entity of the type the query asks about. Otherwise, at least one entity of that type exists in the context.
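As a rough illustration of the filtering described above, here is a minimal standalone sketch. It is not the code from `generate_mrc_dataset.py`: the helper `keep_answerable`, the `impossible` key, and the other field names in the example records are assumptions made for illustration only.

```python
from typing import Any, Dict, List


def keep_answerable(examples: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the examples whose context contains an entity of the queried type.

    Assumes each example carries a boolean `impossible` flag that is True
    when the context has no entity matching the query (hypothetical schema).
    """
    return [ex for ex in examples if not ex["impossible"]]


# Two hypothetical (context, query) records built from the same sentence.
examples = [
    {
        "context": "EU rejects German call to boycott British lamb .",
        "query": "organization entities",
        "impossible": False,  # "EU" answers this query
        "start_position": [0],
        "end_position": [0],
    },
    {
        "context": "EU rejects German call to boycott British lamb .",
        "query": "person entities",
        "impossible": True,  # no person is mentioned in the context
        "start_position": [],
        "end_position": [],
    },
]

filtered = keep_answerable(examples)
print([ex["query"] for ex in filtered])
```

With this filter, only the record for the organization query is kept; the person query is dropped because its `impossible` flag is `True`.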
I hope this clarifies your question.

DanqingZ commented 3 years ago

Thank you for answering my question. I am actually still confused about the following questions: