Closed serenalotreck closed 2 years ago
Ah, the joys of tokenization. Sure, makes sense.
Two little suggestions:
- Use `warnings.warn` (see the `warnings` module) rather than a try/catch. That way, the user has control over how many warnings they want to see.
- Track the failures in the `AnnotatedDoc` class, and at the end of preprocessing print a message like `{n_failed} / {n_total} entities were not matched successfully.`

Thanks for the feedback, I made those changes! It does print the `{n_failed} / {n_total}` message after each individual doc; let me know if you'd like me to change that to a single total at the end of all docs.
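To illustrate the point about user control, here is a minimal sketch of how `warnings.warn` lets the caller decide how many warnings to see. The `match_entities` helper and the toy data are hypothetical and not taken from this PR; only the `warnings` calls are standard library.

```python
import warnings


def match_entities(entities, tokens):
    """Hypothetical helper: warn for each entity missing from tokens.

    Returns (n_failed, n_total).
    """
    n_failed = 0
    for ent in entities:
        if ent not in tokens:
            n_failed += 1
            warnings.warn(f"Entity {ent!r} could not be matched to the tokenized text")
    return n_failed, len(entities)


tokens = ["the", "protein", "binds", "DNA"]
entities = ["protein", "RNA", "kinase"]

# A user who wants to see every warning can opt in with the "always" filter.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    n_failed, n_total = match_entities(entities, tokens)
print(f"{n_failed} / {n_total} entities were not matched successfully.")

# A user who wants none of them can silence them with the "ignore" filter.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("ignore")
    match_entities(entities, tokens)
```

The same filters can also be set from the command line with `python -W ignore` or `python -W always`, so the script itself does not need a verbosity flag.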
Merged.
After trying this on some real data, it looks like mismatching tokenizations of the entity annotations and source documents are somewhat common, so having the script error out wasn't ideal. I've added a try/except clause to catch the error and warn the user, and then a `continue` to skip the rest of adding the entity.
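The pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from this PR: `find_span`, `add_entities`, and the example tokens are all hypothetical names, and the failure is assumed to surface as a `ValueError`.

```python
import warnings


def find_span(tokens, entity_tokens):
    """Return (start, end) token indices of entity_tokens within tokens.

    Raises ValueError when the entity's tokenization does not appear.
    """
    n = len(entity_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == entity_tokens:
            return i, i + n - 1
    raise ValueError(f"no token span matches {entity_tokens}")


def add_entities(doc_tokens, annotations):
    """Match each (label, entity_tokens) pair, skipping those that fail."""
    matched, n_failed = [], 0
    for label, ent_tokens in annotations:
        try:
            start, end = find_span(doc_tokens, ent_tokens)
        except ValueError as err:
            n_failed += 1
            warnings.warn(f"Skipping entity {label}: {err}")
            continue  # skip the rest of adding this entity
        matched.append((start, end, label))
    print(f"{n_failed} / {len(annotations)} entities were not matched successfully.")
    return matched, n_failed


# The document tokenizer split "IL-2" into three tokens; the annotation did not.
doc_tokens = ["IL", "-", "2", "activates", "T", "cells"]
annotations = [("Protein", ["IL-2"]), ("Cell", ["T", "cells"])]
matched, n_failed = add_entities(doc_tokens, annotations)
```

Here the mismatched `IL-2` annotation triggers a warning and is skipped, while the `T cells` entity is still added, so one bad annotation no longer stops the whole run.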