ddofer opened this issue 1 month ago.
Hi, this isn't something that exists right now, although it is a reasonable feature request if you want to have a go at implementing it! Otherwise, I recommend doing what you are doing: post hoc filtering, with the threshold set so that you still get enough candidates after filtering.
When doing NER/NEL against UMLS entities (CUIs), is there any way to configure the nlp pipeline to exclude candidates using a predefined list of CUIs or TUIs? For example, to exclude any detected CUI whose TUI is T079 (Temporal Concept)?
Currently I'm doing this with post hoc filtering, which is inelegant and inefficient, and doesn't help remove noisy detections. For example, if the linker returns only the first detected entity from a text, then filtering out that TUI afterwards means I miss the relevant entities.
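The ordering problem can be illustrated in plain Python. This is a minimal sketch, not scispaCy code: `top_candidates`, `CUI_TO_TUIS` and the CUIs are hypothetical, a candidate is assumed to be a `(CUI, score)` pair sorted best-first, and `get_tuis` stands in for whatever CUI-to-semantic-type lookup is available.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Candidate = Tuple[str, float]  # (CUI, similarity score), assumed sorted best-first


def top_candidates(
    candidates: Sequence[Candidate],
    get_tuis: Callable[[str], Iterable[str]],
    exclude_tuis: Iterable[str],
    k: int = 1,
) -> List[Candidate]:
    """Drop candidates whose semantic types overlap exclude_tuis, then keep the top k."""
    excluded = set(exclude_tuis)
    kept = [c for c in candidates if excluded.isdisjoint(get_tuis(c[0]))]
    return kept[:k]


# Hypothetical CUIs and scores, not real linker output.
CUI_TO_TUIS = {"C_TIME": ["T079"], "C_DRUG": ["T121"]}
cands = [("C_TIME", 0.91), ("C_DRUG", 0.88)]

# Truncate first, then filter: the single kept candidate is excluded, so nothing survives.
first_then_filter = [c for c in cands[:1] if "T079" not in CUI_TO_TUIS[c[0]]]
# Filter first, then truncate: the best non-temporal candidate survives.
filter_then_first = top_candidates(cands, CUI_TO_TUIS.__getitem__, ["T079"], k=1)
print(first_then_filter)
print(filter_then_first)
```

The exclusion list only helps if it is applied before the candidate list is cut down; otherwise an excluded top hit hides the valid candidates ranked below it.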
Current code extract:
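```python
import spacy
from scispacy.linking import EntityLinker  # noqa: F401 (importing registers the "scispacy_linker" factory)

nlp = spacy.load("en_core_sci_sm")  # assumption: the model actually loaded is not shown in the extract
nlp.add_pipe(
    "scispacy_linker",
    config={
        "resolve_abbreviations": True,
        "linker_name": "umls",
        "max_entities_per_mention": 4,  # 5
        "threshold": 0.87,  # default is 0.8; the paper mentions 0.99 as a threshold
    },
)
...
EXCLUDE_TUIS_LIST = ["T079", "T093"]  # UMLS semantic types (TUIs) of CUIs to exclude

novel_cols_candidates_names = []
no_entities_list = []
novel_candidate_cuis = []
novel_candidate_cuis_nomenclatures = []
TUIs_list = []

linker = nlp.get_pipe("scispacy_linker")  # fetch once instead of on every iteration
for f in icu_feature_terms["name"]:  # icu_feature_terms is defined elsewhere
    print(f)
    doc = nlp(f)
    ...
```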
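One way to make the post hoc route less lossy, along the lines of the reply above, is to let the linker return several candidates per mention (for example a larger `max_entities_per_mention` and a lower `threshold`) and apply the exclusion list before keeping the top candidate. The sketch below assumes scispaCy-style entities, where `ent._.kb_ents` is a list of `(CUI, score)` pairs sorted best-first; `filtered_links` and the fake document are hypothetical and only stand in for a real `Doc` so the logic can run without a model.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, List, Tuple

Link = Tuple[str, float]  # (CUI, score)


def filtered_links(
    doc,
    get_tuis: Callable[[str], Iterable[str]],
    exclude_tuis: Iterable[str],
    k: int = 1,
) -> List[Tuple[str, List[Link]]]:
    """Return (entity text, top-k non-excluded candidates) for every entity in doc.

    Assumes each entity exposes `ent._.kb_ents` as (CUI, score) pairs, best first.
    Entities whose candidates are all excluded get an empty list.
    """
    excluded = set(exclude_tuis)
    result = []
    for ent in doc.ents:
        kept = [
            (cui, score)
            for cui, score in ent._.kb_ents
            if excluded.isdisjoint(get_tuis(cui))
        ]
        result.append((ent.text, kept[:k]))
    return result


# Hypothetical stand-ins for a parsed Doc and a CUI -> TUIs lookup.
def _fake_ent(text: str, kb_ents: List[Link]) -> SimpleNamespace:
    return SimpleNamespace(text=text, _=SimpleNamespace(kb_ents=kb_ents))


FAKE_TUIS = {"C_TIME": ["T079"], "C_DRUG": ["T121"]}
fake_doc = SimpleNamespace(
    ents=[
        _fake_ent("admission time", [("C_TIME", 0.95)]),
        _fake_ent("heparin", [("C_TIME", 0.90), ("C_DRUG", 0.89)]),
    ]
)
links = filtered_links(fake_doc, FAKE_TUIS.__getitem__, ["T079", "T093"])
print(links)
```

With a real pipeline, the lookup could be `lambda cui: linker.kb.cui_to_entity[cui].types`, since scispaCy's knowledge base entries carry their semantic types in `types`, and the call would be `filtered_links(nlp(f), ..., EXCLUDE_TUIS_LIST)` inside the loop above.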