dainlp / acl2020-transition-discontinuous-ner


Adding back the document names on the final files #18

Closed by rcmcabral 10 months ago

rcmcabral commented 10 months ago

Hi. First of all, thank you for sharing the preprocessing code.

I read your reply on one of the other issues about no_doc_info for the CADEC dataset. I found that it only affects the generated inline.txt file, and the document information is eventually dropped in split_train_test.py. It's an easy edit to add it back, though. Have you also built in a way to include the document name for the ShARe datasets? I checked and there's no argument for it, so I thought I'd ask before creating a workaround. Thanks.

dainlp commented 10 months ago

Hi, I don't think I did that. I may have treated this as a purely sentence-level NER task at the time.

rcmcabral commented 10 months ago

Got it! Thanks for replying!