clulab / reach

Reach Biomedical Information Extraction

Serialize JSON in a way that doesn't require duplicate calculations of Document hashes #789

Open kwalcock opened 1 year ago

kwalcock commented 1 year ago

This PR is coordinated with changes in processors and can only be used once processors has been updated.

Uses of implicits and package files have been removed, along with much duplicated code. Document hashes are now stored temporarily during serialization and reused rather than recalculated. Several related issues have been filed, and TODOs have been added near problematic code.
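To illustrate the idea of storing document hashes temporarily and reusing them, here is a minimal sketch. It is not the code from this PR: `Doc`, `Mention`, `expensiveHash`, and `HashCacheSketch` are hypothetical stand-ins, and the JSON shape is invented for the example. The assumption is only that many mentions share one document and that hashing a document is costly.

```scala
import scala.collection.mutable

// Hypothetical stand-ins for illustration; these are not the processors or Reach classes.
class Doc(val text: String) // plain class, so map lookups use reference identity
case class Mention(label: String, doc: Doc)

object HashCacheSketch {
  // Counts calls to the expensive hash so the saving is visible.
  var hashCalls = 0

  // Stand-in for a costly hash computed over the whole document.
  def expensiveHash(doc: Doc): Int = {
    hashCalls += 1
    doc.text.hashCode
  }

  // Serializes mentions, computing each document's hash at most once per call.
  def serialize(mentions: Seq[Mention]): Seq[String] = {
    // The cache lives only for the duration of this call.
    val cache = mutable.Map.empty[Doc, Int]
    mentions.map { m =>
      val docHash = cache.getOrElseUpdate(m.doc, expensiveHash(m.doc))
      s"""{"label":"${m.label}","document":$docHash}"""
    }
  }
}
```

In this sketch, serializing three mentions drawn from two documents calls `expensiveHash` twice instead of three times. Because the cache is local to `serialize`, it is discarded when the call returns.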

The new output matches the previous output, with two exceptions: roots are sorted, and triggers are recorded with their actual class rather than the generic TextBoundMention. In all tests the IDs equal their previous values, even where those values are problematic. Fixes for the problematic IDs are scheduled for later.

Some debug output for timing remains and needs to be removed after further testing.