clulab / reach

Reach Biomedical Information Extraction

Enable pre-annotation of document objects #751

Closed enoriega closed 3 years ago

enoriega commented 3 years ago

I wrote this feature to support my own use case, but it may well be useful for anyone with large-scale annotation tasks.

To avoid regenerating Document objects every time the grammar changes, I added a class, AnnotationsCLI, that mimics ReachCLI but stops short of running the grammar and instead serializes the Document and FriesEntry objects.

ReachCLI was enhanced to support reading these ser files and picking up from there: it runs the grammar over the pre-annotated documents and extracts the mentions.
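For readers unfamiliar with the pattern, the two-stage workflow can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `ToyDocument`, `ToyEntry`, `PreAnnotation`, and all of its methods are hypothetical stand-ins for the real Document and FriesEntry handling, and plain Java serialization is an assumption about the on-disk format.

```scala
import java.io.{ObjectInputStream, ObjectOutputStream}
import java.nio.file.{Files, Path}

// Hypothetical stand-ins for the real Document and FriesEntry classes.
// Scala case classes are Serializable by default.
case class ToyDocument(id: String, sentences: Vector[String])
case class ToyEntry(name: String, sectionId: String, text: String)

object PreAnnotation {
  // Stage 1: the expensive annotation step, run once per paper.
  // A naive sentence splitter stands in for the real NLP pipeline.
  def annotate(entry: ToyEntry): ToyDocument =
    ToyDocument(entry.name, entry.text.split("\\. ").toVector)

  // Write the annotated document and its entry to a single file.
  def save(path: Path, doc: ToyDocument, entry: ToyEntry): Unit = {
    val out = new ObjectOutputStream(Files.newOutputStream(path))
    try out.writeObject((doc, entry)) finally out.close()
  }

  // Read the pair back without repeating the annotation step.
  def load(path: Path): (ToyDocument, ToyEntry) = {
    val in = new ObjectInputStream(Files.newInputStream(path))
    try in.readObject().asInstanceOf[(ToyDocument, ToyEntry)] finally in.close()
  }

  // Stage 2: the cheap step that is rerun whenever the grammar changes.
  // A substring match stands in for running the grammar.
  def extractMentions(doc: ToyDocument, trigger: String): Vector[String] =
    doc.sentences.filter(_.contains(trigger))
}
```

In this sketch, `save` would be called once per paper, while `load` followed by `extractMentions` can be repeated after every grammar change without paying the annotation cost again.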

If this makes sense to integrate into the master branch, I will update the documentation and write unit tests.

MihaiSurdeanu commented 3 years ago

I agree this is very useful! But there is a lot of code in here that is redundant with ReachCLI... Can you re-organize this code so there is no duplicated code? Maybe have a base CLI class that is extended by both ReachCLI and AnnotationsCLI?
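One possible shape for such a base class is sketched below. It is only an illustration of the idea: `BaseCLI`, `ToyAnnotationsCLI`, `ToyReachCLI`, and every method name are hypothetical, and the tokenizer and suffix match are placeholders for the real annotation and grammar steps.

```scala
// Hypothetical skeleton: the shared driver lives in the base class,
// and each subclass supplies only the step that differs.
abstract class BaseCLI {
  // Shared step: a whitespace tokenizer stands in for document annotation.
  protected def annotate(text: String): Seq[String] = text.split("\\s+").toSeq

  // Subclass-specific step, applied to each annotated paper.
  protected def finish(paperId: String, tokens: Seq[String]): String

  // Shared loop over (paperId, text) pairs.
  def processPapers(papers: Seq[(String, String)]): Seq[String] =
    papers.map { case (id, text) => finish(id, annotate(text)) }
}

// Stops after annotation and reports what it would serialize.
class ToyAnnotationsCLI extends BaseCLI {
  protected def finish(paperId: String, tokens: Seq[String]): String =
    s"$paperId: serialized ${tokens.size} tokens"
}

// Continues past annotation; a suffix match stands in for the grammar.
class ToyReachCLI extends BaseCLI {
  protected def finish(paperId: String, tokens: Seq[String]): String = {
    val mentions = tokens.count(_.endsWith("ates"))
    s"$paperId: extracted $mentions mentions"
  }
}
```

With this layout, the paper loop and the annotation step exist in one place, and the two CLIs differ only in their `finish` implementation.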

enoriega commented 3 years ago

@MihaiSurdeanu Done. I could put in more effort and refactor the Run*CLI classes in the same way, but I don't think it is worth it. I will go ahead and write the docs and tests in the coming days.