Closed by hickst.
Attached is a candidate sample file that could be used to implement an NER override and auxiliary grounding capability. It is based on the NMZ auxiliary spreadsheet of 1/11/16, merged and updated with the latest (6/9/16) NMZ spreadsheet model after converting ad hoc NMZ types to Reach types.
Note: the file extension should be .tsv, but GitHub doesn't support uploads with this extension.
NMZ-NER-aux_160624.txt
See additional examples in issue #60.
@hickst: I added an override capability to the bio NER in processors, on the branch "ner-override". I also added your NMZ-NER-aux_160624.tsv.gz to bioresources. Can you please do the following:
Let me know when the first 2 are done, so we can release bioresources and processors.
We often know how a given entity should be labeled. Assignments from knowledge sources should be able to override the NER's default classifications.
For example, the CRF seems to be responsible for labeling 'H-RAS' and 'K-RAS' (but not 'HRAS' or 'KRAS') as protein families, even though our knowledge sources list these exclusively as proteins.
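To make the idea concrete, the override can be pictured as a post-processing pass that replaces the CRF's BIO labels wherever a token span matches a knowledge-source entry. This is a minimal Python sketch, not the code on the "ner-override" branch. It assumes a hypothetical two-column TSV (entity text, label), whitespace tokenization, case-insensitive longest-match lookup, and the placeholder labels `Protein` and `Family`. The real NMZ-NER-aux file's columns and the Reach label names may differ. `load_overrides` and `apply_overrides` are illustrative names only.

```python
import csv
import io
from typing import Dict, List, Tuple


def load_overrides(tsv_text: str) -> Dict[Tuple[str, ...], str]:
    """Parse a hypothetical two-column TSV (entity text <TAB> label).

    Blank lines, short rows and rows starting with '#' are skipped.
    Keys are lowercased token tuples, so lookup is case-insensitive (an assumption).
    """
    overrides: Dict[Tuple[str, ...], str] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue
        key = tuple(tok.lower() for tok in row[0].split())
        overrides[key] = row[1].strip()
    return overrides


def apply_overrides(tokens: List[str], labels: List[str],
                    overrides: Dict[Tuple[str, ...], str]) -> List[str]:
    """Replace CRF BIO labels on spans that match a knowledge-source entry.

    Scans left to right, trying the longest possible span first.
    """
    max_len = max((len(k) for k in overrides), default=0)
    out = list(labels)
    i = 0
    while i < len(tokens):
        matched = 0
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in overrides:
                label = overrides[key]
                out[i] = "B-" + label
                for j in range(i + 1, i + n):
                    out[j] = "I-" + label
                # If the CRF's entity continued past the override, start a new
                # entity there so the BIO sequence stays well-formed.
                if i + n < len(out) and out[i + n].startswith("I-"):
                    out[i + n] = "B-" + out[i + n][2:]
                matched = n
                break
        i += matched if matched else 1
    return out


AUX = "# text\tlabel\nH-RAS\tProtein\nK-RAS\tProtein\n"
tokens = ["Mutant", "K-RAS", "binds", "HRAS", "."]
crf_labels = ["O", "B-Family", "O", "B-Protein", "O"]
print(apply_overrides(tokens, crf_labels, load_overrides(AUX)))
# → ['O', 'B-Protein', 'O', 'B-Protein', 'O']
```

Applying knowledge-source labels after the CRF, rather than retraining it, keeps the statistical model unchanged while letting curated assignments such as 'K-RAS' as a protein take precedence.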