mubaldino opened this issue 4 years ago (status: Open)
Add "text_norm" to the indexer to help review the common false positives that still appear.
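A minimal sketch of how a normalized key could support that review. It assumes `text_norm` means lowercasing, turning punctuation into spaces, and collapsing whitespace; the real Xponents definition may differ. Under that assumption, variants such as `"Do. Do"` and `"do. Do"` collapse to one key, so the most frequent false positives surface first. `text_norm` and `top_false_positives` are hypothetical helpers written for illustration.

```python
import re
from collections import Counter


def text_norm(text: str) -> str:
    """Hypothetical normalizer: lowercase, punctuation -> space, collapse whitespace."""
    lowered = text.lower()
    no_punct = re.sub(r"[^\w\s]", " ", lowered)
    return " ".join(no_punct.split())


def top_false_positives(matches, n=10):
    """Count tagged match strings by their normalized form; return the n most common."""
    counts = Counter(text_norm(m) for m in matches)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = ["Do. Do", "do. Do", "in Do", "Do. Do", "Paris"]
    print(top_false_positives(sample, n=3))
    # prints [('do do', 3), ('in do', 1), ('paris', 1)]
```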
Partly addressed by NonSenseFilter, which removes lowercase matches.
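The lowercase rule can be illustrated with a small predicate. This is a sketch, not the actual NonSenseFilter code: it assumes a match is dropped when its letters include no uppercase character, on the theory that place names in edited text are capitalized. `is_lowercase_match` and `filter_matches` are hypothetical names.

```python
def is_lowercase_match(text: str) -> bool:
    """True when the match has letters but none of them are uppercase (e.g. 'in do')."""
    letters = [c for c in text if c.isalpha()]
    return bool(letters) and not any(c.isupper() for c in letters)


def filter_matches(matches):
    """Keep only matches that contain at least one uppercase letter."""
    return [m for m in matches if not is_lowercase_match(m)]


# Usage: only the all-lowercase candidate is removed.
# filter_matches(["in do", "do. Do", "Do. Do"]) returns ["do. Do", "Do. Do"]
```

Under this assumed rule, mixed-case strings such as `"do. Do"` or `"in Do"` pass through, which is consistent with the issue's note that the filter only partly addresses these false positives.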
This looks more like a set of gazetteer ETL fixes than a pattern generalization. If such trivial gazetteer entries should never be tagged, we mark them search_only=1.
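A sketch of that ETL step, assuming gazetteer rows are dicts with `name` and `search_only` keys and that "trivial" means a name with two or fewer letters or a name in a small stop list. The row layout, the threshold, `STOP_NAMES`, and `mark_trivial_entries` are illustrative assumptions, not the Xponents gazetteer schema.

```python
# Hypothetical stop list of common words that collide with place names.
STOP_NAMES = {"do", "in", "of", "the"}


def is_trivial(name: str, max_letters: int = 2) -> bool:
    """Assumed rule: very short names or common words are too ambiguous to tag."""
    letters = "".join(c for c in name if c.isalpha())
    return len(letters) <= max_letters or name.strip().lower() in STOP_NAMES


def mark_trivial_entries(rows):
    """Return copies of gazetteer rows, setting search_only=1 on trivial names."""
    marked = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        if is_trivial(new_row["name"]):
            new_row["search_only"] = 1
        marked.append(new_row)
    return marked


# Usage:
# mark_trivial_entries([{"name": "Do", "search_only": 0}, {"name": "Paris", "search_only": 0}])
# returns [{"name": "Do", "search_only": 1}, {"name": "Paris", "search_only": 0}]
```

Doing this at ETL time keeps the entries available for search while keeping them out of tagging, rather than adding another runtime pattern.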
Describe the bug
`"Do. Do"`, `"do. Do"`, `"in Do"`, etc. are still common false positives.
To Reproduce
Xponents 3.3
Expected behavior
Better filtering of these. A likely approach: use a spaCy model that provides NER and POS tags to eliminate obvious errors.
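One way the POS idea might work, sketched under assumptions: a candidate match is kept only if at least one of its tokens is tagged `PROPN` (proper noun). In spaCy, coarse tags are exposed as `token.pos_` when text is processed by a trained pipeline such as `en_core_web_sm`. To keep the sketch self-contained, the function below takes pre-tagged `(text, pos)` pairs instead of calling spaCy. `keep_match` is a hypothetical name and the PROPN rule is an untested assumption.

```python
from typing import Iterable, Tuple


def keep_match(tagged_tokens: Iterable[Tuple[str, str]]) -> bool:
    """Keep a candidate only if some token in the span is tagged as a proper noun."""
    return any(pos == "PROPN" for _, pos in tagged_tokens)


# With spaCy installed, tags for a match at character offsets [start, end) could
# be gathered roughly like this (char_span returns None if the offsets do not
# align with token boundaries):
#   nlp = spacy.load("en_core_web_sm")
#   doc = nlp(sentence)
#   span = doc.char_span(start, end)
#   tagged = [(t.text, t.pos_) for t in span] if span is not None else []

# Usage with hand-written tags:
# keep_match([("Do", "VERB"), (".", "PUNCT"), ("Do", "VERB")]) returns False
# keep_match([("in", "ADP"), ("Paris", "PROPN")]) returns True
```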