dwadden / dygiepp

Span-based system for named entity, relation, and event extraction.
MIT License
573 stars, 120 forks

Change the common-sense check clause to catch exception, warn and drop #79

Closed serenalotreck closed 2 years ago

serenalotreck commented 2 years ago

After trying this on some real data, it looks like mismatches between the tokenization of the entity annotations and that of the source documents are fairly common, so having the script error out wasn't ideal. I've added a try/except clause that catches the error and warns the user, followed by a continue that skips the rest of adding that entity.
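
For readers following along, here is a minimal sketch of the catch, warn, and drop pattern described above. It is not the code from this PR: `TokenMismatchError`, `find_token_span`, and `collect_entities` are hypothetical names, and the alignment is a simple whitespace-token match assumed only for illustration.

```python
import warnings


class TokenMismatchError(Exception):
    """Hypothetical error: an entity does not line up with the document tokens."""


def find_token_span(tokens, entity_text):
    """Return inclusive (start, end) token indices of entity_text in tokens.

    Raises TokenMismatchError when the entity's whitespace-split tokens do
    not appear as a contiguous run in the document tokens.
    """
    target = entity_text.split()
    n = len(target)
    if n == 0:
        raise TokenMismatchError("empty entity text")
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == target:
            return start, start + n - 1
    raise TokenMismatchError(f"could not align entity {entity_text!r}")


def collect_entities(tokens, entity_texts):
    """Align each entity; warn about and skip the ones that fail."""
    aligned = []
    n_failed = 0
    for text in entity_texts:
        try:
            start, end = find_token_span(tokens, text)
        except TokenMismatchError as err:
            # Warn instead of aborting the whole run.
            warnings.warn(f"Dropping entity: {err}")
            n_failed += 1
            continue  # skip the rest of adding this entity
        aligned.append((start, end, text))
    return aligned, n_failed


tokens = ["The", "protein", "p53", "binds", "DNA", "."]
aligned, n_failed = collect_entities(tokens, ["p53", "binds DNA", "p-53"])
```

With this input, the two entities that match the document tokens are kept and the mismatched `p-53` is dropped with a warning, so the rest of the document is still processed.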

dwadden commented 2 years ago

Ah, the joys of tokenization. Sure, makes sense.

Two little suggestions:

serenalotreck commented 2 years ago

Thanks for the feedback, I made those changes! It prints {n_failed} / {n_total} after each individual doc; let me know if you'd like me to change that to print a single total at the end of all docs instead.
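
To make the two reporting options concrete, here is a small sketch that produces both the per-document line and a single total across all docs. Only `n_failed` and `n_total` come from the comment above; `summarize_failures`, `doc_id`, and the message wording are assumptions, not the script's actual output.

```python
def summarize_failures(per_doc_counts):
    """Build report lines from (doc_id, n_failed, n_total) tuples.

    Emits one line per document, then one aggregate line over all documents.
    """
    lines = []
    total_failed = 0
    total_entities = 0
    for doc_id, n_failed, n_total in per_doc_counts:
        lines.append(f"{doc_id}: dropped {n_failed} / {n_total} entities")
        total_failed += n_failed
        total_entities += n_total
    lines.append(f"all docs: dropped {total_failed} / {total_entities} entities")
    return lines


report = summarize_failures([("doc1", 1, 10), ("doc2", 0, 5)])
for line in report:
    print(line)
```

Keeping the per-document counts makes it easy to find which files have tokenization problems, while the final line gives the overall drop rate.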

dwadden commented 2 years ago

Merged.