elsevierlabs-os / nerds

BSD 3-Clause "New" or "Revised" License

ExactMatchDictionaryNER can only annotate for one class at a time #4

Open sujitpal opened 4 years ago

sujitpal commented 4 years ago

The nerds.core.model.ner.dictionary.ExactMatchDictionaryNER class can tag against only a single entity class at a time: its constructor requires a path to one dictionary file and one class label. This is most likely driven by the misconception that the pyahocorasick module can only support a single class at a time, which is incorrect. A pyahocorasick automaton stores an arbitrary value with each key, so the class label can be stored alongside each dictionary phrase.

The proposal is to add a nerds.core.model.ner.dictionary.ExactMatchMultiClassDictionaryNER implementation that handles dictionary lookup against multiple entity classes.
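To illustrate the idea, here is a minimal sketch of multi-class exact matching. It is a plain-Python stand-in, not the pyahocorasick-based code in nerds: the names build_lookup and annotate, the entity labels, and the sample phrases are all hypothetical. The point it shows is that once every phrase carries its own class label, a single lookup structure is enough for any number of classes.

```python
from typing import Dict, List, Tuple


def build_lookup(entries: Dict[str, List[str]]) -> Dict[str, str]:
    """Flatten {label: [phrases]} into one phrase -> label table.

    Hypothetical helper: with an Aho-Corasick automaton, the label would be
    stored as the value attached to each phrase instead.
    """
    lookup = {}
    for label, phrases in entries.items():
        for phrase in phrases:
            if phrase:  # skip empty phrases so the scan below always advances
                lookup[phrase] = label
    return lookup


def annotate(text: str, lookup: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Return non-overlapping (start, end, label) spans, leftmost-longest.

    Case-sensitive exact matching with no word-boundary check; this naive
    scan is for illustration and is much slower than an automaton.
    """
    phrases = sorted(lookup, key=len, reverse=True)  # try longest phrases first
    spans = []
    i = 0
    while i < len(text):
        for phrase in phrases:
            if text.startswith(phrase, i):
                spans.append((i, i + len(phrase), lookup[phrase]))
                i += len(phrase)
                break
        else:
            i += 1  # no phrase starts here; move on one character
    return spans


# Two entity classes served by the same lookup table (made-up sample data).
lookup = build_lookup({"DRUG": ["aspirin", "ibuprofen"], "DISEASE": ["headache"]})
spans = annotate("aspirin relieves headache", lookup)
```

Here spans holds one DRUG span for "aspirin" and one DISEASE span for "headache", both found in a single pass over the text.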

sujitpal commented 4 years ago

Pull request created: https://github.com/elsevierlabs-os/nerds/pull/5

sujitpal commented 4 years ago

Also added a pseudo fit method that allows the ExactMatchMultiClassDictionaryNER to be used like the other NER models (i.e., fit with Xtrain, then transform with Xtest, rather than loading the automaton from the provided dictionary during construction and calling transform thereafter).
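For readers unfamiliar with the pattern, a rough sketch of what a fit-then-transform dictionary tagger could look like follows. This is not the code in the pull request: the class name DictionaryTagger, the use of token lists with BIO tags as input, and the sample data are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


class DictionaryTagger:
    """Hypothetical dictionary tagger that learns its phrases in fit()."""

    def __init__(self) -> None:
        self.phrase_to_label: Dict[Tuple[str, ...], str] = {}
        self.max_len = 0

    def fit(self, X: List[List[str]], y: List[List[str]]) -> "DictionaryTagger":
        """Collect every BIO-tagged phrase in the training data with its label."""
        for tokens, tags in zip(X, y):
            phrase: List[str] = []
            label = None
            # The trailing sentinel flushes a phrase that ends the sentence.
            for token, tag in list(zip(tokens, tags)) + [("", "O")]:
                if tag.startswith("I-") and phrase and tag[2:] == label:
                    phrase.append(token)
                    continue
                if phrase:
                    self.phrase_to_label[tuple(phrase)] = label
                    phrase, label = [], None
                if tag.startswith("B-"):
                    phrase, label = [token], tag[2:]
        self.max_len = max((len(p) for p in self.phrase_to_label), default=0)
        return self

    def transform(self, X: List[List[str]]) -> List[List[str]]:
        """Tag each sentence by longest exact match against the learned phrases."""
        results = []
        for tokens in X:
            tags = ["O"] * len(tokens)
            i = 0
            while i < len(tokens):
                for n in range(min(self.max_len, len(tokens) - i), 0, -1):
                    label = self.phrase_to_label.get(tuple(tokens[i:i + n]))
                    if label is not None:
                        tags[i] = "B-" + label
                        for j in range(i + 1, i + n):
                            tags[j] = "I-" + label
                        i += n
                        break
                else:
                    i += 1  # no phrase starts at this token
            results.append(tags)
        return results


# Made-up training and test data covering two entity classes.
X_train = [["aspirin", "treats", "migraine", "headache"]]
y_train = [["B-DRUG", "O", "B-DISEASE", "I-DISEASE"]]
X_test = [["take", "aspirin", "for", "migraine", "headache"]]

tagger = DictionaryTagger().fit(X_train, y_train)
predicted = tagger.transform(X_test)
```

Building the dictionary inside fit keeps the calling code identical to that of a trained model, so the dictionary tagger can be swapped in wherever a fit/transform NER model is expected.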