dwadden / dygiepp

Span-based system for named entity, relation, and event extraction.
MIT License

Training on new data in scierc format #92

Closed Kehindeajayi01 closed 2 years ago

Kehindeajayi01 commented 2 years ago

Hi dwadden, thanks for the great work. I am trying to train your model on new data in SciERC format. I used brat_input.py to convert my .ann and .txt files to SciERC format. However, when I ran the training script on my data, I got the error below:

[Screenshot of the error]
dwadden commented 2 years ago

Hi,

Can you please attach a minimal example (ideally a single line from a jsonl file) that reproduces this error, along with the model training command you use to start training?

Dave

Kehindeajayi01 commented 2 years ago

[Screenshot of the example]

This is the training command I used: bash scripts/train.sh scierc_lightweight

dwadden commented 2 years ago

Sorry for the slow response. Could you include the example as a file attachment rather than as a screenshot?

Kehindeajayi01 commented 2 years ago

Please find the requested file attached.

dygie_example.txt

Thanks

dwadden commented 2 years ago

In your example, one of your relations has the value [109, 109, null, null, "has_unit"]. The code is breaking because of the null values. See data.md for the required input data format.
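To locate every relation like this in a larger file, a small scan of the jsonl can help. This is a minimal sketch, assuming the layout where "relations" holds one list per sentence and each relation looks like [start1, end1, start2, end2, label] (check data.md for the authoritative format). The helper name `find_null_relations` and the example document are hypothetical, made up for illustration.

```python
import json
from typing import Iterable, List, Tuple


def find_null_relations(lines: Iterable[str]) -> List[Tuple[str, int, list]]:
    """Return (doc_key, sentence_index, relation) for every relation that contains a null."""
    problems = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        doc = json.loads(line)  # JSON null is decoded as Python None
        # Assumed layout: one list of relations per sentence,
        # each relation shaped like [start1, end1, start2, end2, label].
        for sent_idx, sent_rels in enumerate(doc.get("relations", [])):
            for rel in sent_rels:
                if any(x is None for x in rel):
                    problems.append((doc.get("doc_key"), sent_idx, rel))
    return problems


if __name__ == "__main__":
    # Hypothetical one-document example modeled on the relation reported in this thread.
    example = json.dumps({
        "doc_key": "doc1",
        "sentences": [["The", "sample", "weighs", "5", "kg", "."]],
        "ner": [[[3, 3, "Value"], [4, 4, "Unit"]]],
        "relations": [[[3, 3, 4, 4, "has_unit"], [3, 3, None, None, "has_unit"]]],
    })
    for doc_key, sent_idx, rel in find_null_relations([example]):
        print(doc_key, sent_idx, rel)  # prints: doc1 0 [3, 3, None, None, 'has_unit']
```

To scan a real file, pass an open file handle instead of the list, e.g. inside `with open(path) as f:` call `find_null_relations(f)`.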

dwadden commented 2 years ago

I'll close this. Feel free to re-open if this doesn't solve your problem.

Kehindeajayi01 commented 2 years ago

The null values were introduced by your brat_input.py script when I used it to convert my brat annotations to your format.
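Until the converter is fixed, one possible stopgap (not something proposed in this thread) is to drop any annotation containing a null before training. This is a minimal sketch under the same assumed layout, with one list per sentence for "ner" and "relations"; `clean_doc` and `clean_jsonl` are hypothetical helper names. Dropped annotations are simply lost as training signal, so the conversion problem itself still needs fixing.

```python
import json
from typing import Dict, Tuple


def clean_doc(doc: Dict, fields: Tuple[str, ...] = ("ner", "relations")) -> Tuple[Dict, int]:
    """Drop annotations containing a null (modifies doc in place); return doc and the drop count."""
    removed = 0
    for field in fields:
        if field not in doc:
            continue
        cleaned = []
        for sent_annos in doc[field]:
            kept = [a for a in sent_annos if all(x is not None for x in a)]
            removed += len(sent_annos) - len(kept)
            # Keep one (possibly empty) list per sentence so indices still line up with "sentences".
            cleaned.append(kept)
        doc[field] = cleaned
    return doc, removed


def clean_jsonl(in_path: str, out_path: str) -> int:
    """Write a cleaned copy of a jsonl file and return the total number of dropped annotations."""
    total = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines
            doc, removed = clean_doc(json.loads(line))
            total += removed
            fout.write(json.dumps(doc) + "\n")
    return total
```

Keeping an empty list for a sentence whose annotations were all dropped, rather than removing the entry, is meant to preserve the one-to-one alignment between the annotation lists and "sentences".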

dwadden commented 2 years ago

Ah I see, sorry for closing prematurely.

brat_to_input.py was added by a contributor; more info here. I unfortunately can't offer support, but maybe @serenalotreck can help?