For the moment, EDS-NLP only supports extracting entities and normalising them to ATC (via ROMEDI) and ICD10.
As UMLS is an international resource that brings together many terminologies (including SnomedCT) in many languages, integrating it would greatly benefit the library and allow its users to:
automatically categorise texts in a corpus according to different concept IDs
perform entity searching
create processing rules (if ent.concept_id is a child of CUIXXXXX then, ...)
do pre-annotation of corpora
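To make the processing-rule use case concrete, here is a minimal sketch of an "is a child of" check. It assumes a hierarchy is available as a mapping from each concept ID to its direct parents; the `PARENTS` table, the `CUI_*` identifiers and the helper names are all hypothetical, and this PR itself only builds a CUI to synonym dictionary.

```python
# Toy hierarchy (hypothetical): concept ID -> direct parent concept IDs.
PARENTS = {
    "CUI_CHILD": {"CUI_MIDDLE"},
    "CUI_MIDDLE": {"CUI_ROOT"},
    "CUI_OTHER": set(),
}


def is_descendant(concept_id, ancestor_id, parents):
    """Return True if ancestor_id is reachable from concept_id via parent links."""
    seen = set()
    stack = [concept_id]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent == ancestor_id:
                return True
            if parent not in seen:  # guard against cycles in the hierarchy
                seen.add(parent)
                stack.append(parent)
    return False


def label_for(concept_id):
    """Example rule: flag every concept that sits below CUI_ROOT."""
    if is_descendant(concept_id, "CUI_ROOT", PARENTS):
        return "under_root"
    return "other"
```

In a pipeline, a rule of this kind would be applied to the concept ID attached to each extracted entity; how that ID is exposed on the entity depends on the matcher and is not shown here.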
What changes
Add a method to download the UMLS data and create a CUI to synonym dictionary that is also saved locally using pystow.
Add a TerminologyMatcher, along with its corresponding entrypoint, in the same fashion as CIM10.
Add tests in tests/pipelines/ner/test_umls.py
Edit documentation (docs/pipelines/index.md and docs/pipelines/ner/umls.md)
Add new dependencies: umls_downloader, tqdm
Edit changelog
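The dictionary-building step could look roughly like the sketch below. It is not the code from this PR: it assumes the input follows the pipe-delimited MRCONSO.RRF layout (CUI in column 0, language in column 1, string in column 14), and the language filter, the lower-casing, the helper names and the toy rows are illustrative assumptions. The download itself is left out, and a temporary folder stands in for the pystow-managed location.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

# Column positions in a pipe-delimited MRCONSO.RRF row.
CUI_COL, LANG_COL, STR_COL = 0, 1, 14


def build_synonyms(rows, languages=("ENG", "FRE")):
    """Group the lower-cased strings of each CUI, keeping only the given languages."""
    synonyms = defaultdict(set)
    for row in rows:
        fields = row.rstrip("\n").split("|")
        if len(fields) <= STR_COL:
            continue  # skip malformed rows
        if fields[LANG_COL] in languages:
            synonyms[fields[CUI_COL]].add(fields[STR_COL].lower())
    # Sorted lists are JSON-serialisable and give a stable output.
    return {cui: sorted(terms) for cui, terms in synonyms.items()}


def make_row(cui, lang, string):
    """Build a toy row with 18 columns, leaving the unused ones blank."""
    fields = [""] * 18
    fields[CUI_COL], fields[LANG_COL], fields[STR_COL] = cui, lang, string
    return "|".join(fields) + "|\n"


# Made-up rows with placeholder CUIs, for illustration only.
toy_rows = [
    make_row("C0000001", "ENG", "Example disease"),
    make_row("C0000001", "FRE", "Maladie exemple"),
    make_row("C0000001", "SPA", "Enfermedad ejemplo"),
    make_row("C0000002", "ENG", "Example drug"),
]

synonyms = build_synonyms(toy_rows)

# Stand-in for the local cache folder; the PR uses pystow for this.
cache_path = Path(tempfile.mkdtemp()) / "umls_synonyms.json"
cache_path.write_text(json.dumps(synonyms), encoding="utf-8")
reloaded = json.loads(cache_path.read_text(encoding="utf-8"))
```

Caching the parsed dictionary avoids re-reading the full UMLS release on every run, which is the motivation for saving it locally.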
Checklist
[ ] If this PR is a bug fix, the bug is documented in the test suite.
[x] Changes were documented in the changelog (pending section).
[x] If necessary, changes were made to the documentation (e.g. new pipeline).