LHNCBC / metamaplite

A near real-time named-entity recognizer
https://metamap.nlm.nih.gov/MetaMapLite.shtml

Added configuration flag to override default behavior regarding subsumed terms #25

Closed stevenbedrick closed 1 year ago

stevenbedrick commented 1 year ago

Concept matches often overlap within a sentence. For example, "blood sugar levels" is itself a concept, as are "blood", "sugar", etc. In this situation, MetaMap's default behavior is to remove, as a post-processing step, any match that is fully subsumed by another, longer matching concept, so that only the match for "blood sugar levels" is returned. While this is usually the desired behavior, there are scenarios where we want all of the matching entities, especially when optimizing for recall or when dealing with vocabulary-specific quirks.
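To make the idea of subsumption concrete, here is a minimal, self-contained sketch of this kind of pruning. It is not MetaMapLite's actual implementation: the `Span` class and the `removeSubsumed` method are hypothetical names introduced only for illustration, and the sketch assumes matches are identified by character offsets.

```java
import java.util.ArrayList;
import java.util.List;

public class SubsumptionSketch {

    /** Hypothetical match type: character offsets [start, end) plus the matched text. */
    static final class Span {
        final int start;
        final int end;
        final String text;

        Span(int start, int end, String text) {
            this.start = start;
            this.end = end;
            this.text = text;
        }

        /** True if this span fully covers the other span and is strictly longer. */
        boolean subsumes(Span other) {
            return start <= other.start
                && other.end <= end
                && (end - start) > (other.end - other.start);
        }
    }

    /** Keeps only the spans that are not fully covered by a longer span. */
    static List<Span> removeSubsumed(List<Span> spans) {
        List<Span> kept = new ArrayList<>();
        for (Span candidate : spans) {
            boolean subsumed = false;
            for (Span other : spans) {
                if (other.subsumes(candidate)) {
                    subsumed = true;
                    break;
                }
            }
            if (!subsumed) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Span> matches = new ArrayList<>();
        matches.add(new Span(0, 18, "Blood sugar levels"));
        matches.add(new Span(0, 5, "Blood"));
        matches.add(new Span(6, 11, "sugar"));

        // "Blood" and "sugar" lie entirely inside the longer match, so they are dropped.
        for (Span s : removeSubsumed(matches)) {
            System.out.println(s.text);
        }
    }
}
```

In this sketch, spans that only partially overlap, or that cover exactly the same offsets, are all kept; only a match lying entirely inside a strictly longer one is pruned.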

To facilitate this, there is now a configuration property named metamaplite.removeSubsumedEntities, set to "true" by default. When it is set to "true", MetaMap keeps its current default behavior and removes subsumed entities. When it is set to "false", MetaMap skips the pruning step and returns every matching entity, including subsumed ones; the rest of processing proceeds as normal.
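As an illustration, the flag could be read from a standard `java.util.Properties` object along the following lines. Only the property name and its "true" default come from the description above; the class name, the helper method, and the use of `Boolean.parseBoolean` are assumptions for this sketch and may differ from how the pull request actually reads the value.

```java
import java.util.Properties;

public class RemoveSubsumedFlagSketch {

    // Property name as given in this pull request.
    static final String KEY = "metamaplite.removeSubsumedEntities";

    /** Hypothetical helper: returns true unless the property is explicitly set to a non-"true" value. */
    static boolean shouldRemoveSubsumed(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, "true"));
    }

    public static void main(String[] args) {
        Properties props = new Properties();

        // Property not set: falls back to the default and prunes subsumed entities.
        System.out.println(shouldRemoveSubsumed(props)); // prints "true"

        // Explicit override: keep every matching entity.
        props.setProperty(KEY, "false");
        System.out.println(shouldRemoveSubsumed(props)); // prints "false"
    }
}
```

In a properties file, the equivalent override would be the single line `metamaplite.removeSubsumedEntities=false`.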