The approaches you described are exactly what I would try first: TF-IDF or something similar, and the UMLS type tree. Filtering with a higher score threshold can also help. Lastly, you could try training an additional entity linking model to distinguish general concepts from specific ones.
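As a rough illustration of combining a score threshold with a TF-IDF-style test, here is a minimal sketch. It assumes you have already collected the linker's (CUI, score) candidates into plain lists (documents, then mentions, then candidates); that input format and the `cui_*` IDs are hypothetical placeholders, not any library's API. The idea is that a concept linked in a large share of documents has a low IDF and is likely generic.

```python
import math
from collections import Counter

def idf_by_cui(docs, min_score):
    """IDF of every CUI that clears min_score somewhere in the corpus."""
    df = Counter()
    for doc in docs:
        # Count each CUI at most once per document (document frequency).
        df.update({cui for mention in doc for cui, score in mention if score >= min_score})
    n_docs = len(docs)
    return {cui: math.log(n_docs / count) for cui, count in df.items()}

def filter_generic(docs, min_score=0.85, min_idf=0.5):
    """Drop candidates scoring below min_score, then drop CUIs whose IDF is
    below min_idf, i.e. concepts linked in too large a share of documents."""
    idf = idf_by_cui(docs, min_score)
    return [
        [
            # The score check runs first, so low-scoring CUIs never hit the idf lookup.
            [(cui, s) for cui, s in mention if s >= min_score and idf[cui] >= min_idf]
            for mention in doc
        ]
        for doc in docs
    ]

# Hypothetical input: documents -> mentions -> (cui, score) candidates.
docs = [
    [[("cui_generic", 0.95)], [("cui_a", 0.90), ("cui_low", 0.70)]],
    [[("cui_generic", 0.92)], [("cui_b", 0.88)]],
    [[("cui_generic", 0.99)]],
]
filtered = filter_generic(docs)
```

Here `cui_generic` appears in every document (IDF 0) and is dropped, `cui_low` falls below the score threshold, and `cui_a` and `cui_b` survive. One caveat: on a narrow corpus, concepts that are specific but central to it will also look "common", so you may want to compute the IDF over a broader reference corpus.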
Cool, thank you @dakinggg. I will go ahead and close this!
This is a question and not a feature request or bug report, so let me know if I should put it elsewhere.
Does anyone have any general techniques to prevent a general concept from matching to a highly specific concept? As an example, "high-risk" is matched to the UMLS record for "unsafe sex".
Other examples:
Additionally, any ideas about filtering out generic term matches, even if accurate? I can filter on "types", e.g. to say I am only interested in T121 (Pharmacologic Substance), but it will still match a bunch of terms to "Pharmaceutical Preparations" and the like. I can do this in post-processing, perhaps with some tfidf approach or some "specificity score" based on where the entity sits in the UMLS tree. But I figured I'd ask if anyone has a better way.
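The post-processing idea above (filter on semantic types, then score specificity by where the concept sits in the hierarchy) could look roughly like the sketch below. It assumes a hypothetical `parent_of` map (child CUI to one parent CUI) and a hypothetical `types_of` map (CUI to semantic type codes) that you would build yourself; all IDs in the example are placeholders. Real UMLS concepts can have several parents, so a single-parent depth is a simplification, and depth is only a rough proxy for specificity.

```python
def depth(cui, parent_of):
    """Number of ancestors above cui in a single-parent hierarchy (root = 0)."""
    d, seen = 0, {cui}
    while cui in parent_of:
        cui = parent_of[cui]
        if cui in seen:  # guard against cycles in messy data
            break
        seen.add(cui)
        d += 1
    return d

def keep_specific(candidates, types_of, parent_of, allowed_types, min_depth):
    """Keep (cui, score) candidates that have an allowed semantic type and
    sit at least min_depth levels below a root of the hierarchy."""
    return [
        (cui, score)
        for cui, score in candidates
        if types_of.get(cui, set()) & allowed_types
        and depth(cui, parent_of) >= min_depth
    ]

# Hypothetical hierarchy and type assignments (placeholders, not real UMLS data).
parent_of = {
    "pharm_preparations": "drugs",
    "antidiabetics": "pharm_preparations",
    "metformin": "antidiabetics",
}
types_of = {"pharm_preparations": {"T121"}, "metformin": {"T121"}, "unsafe_sex": {"T055"}}
candidates = [("pharm_preparations", 0.91), ("metformin", 0.88), ("unsafe_sex", 0.80)]

kept = keep_specific(candidates, types_of, parent_of, {"T121"}, min_depth=2)
```

With these placeholder values, `kept` is `[("metformin", 0.88)]`: the generic preparations concept is too shallow and the off-type concept is filtered out. If a concept has multiple parents, taking the minimum depth over all paths is the more conservative choice.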