UMLSParser

Parses the UMLS source files.

Getting Started

Acquiring UMLS Data

Using the UMLS requires a license. For more information, see https://uts.nlm.nih.gov/home.html and choose "Request a License".

This tool requires the full UMLS release, so please download the Full UMLS Release Files.

Prerequisites

Installing

Extracting Relevant Data out of the UMLS Full Release

TODO: MAKE SCRIPT AND CHANGE PATHS IN PARSER ACCORDINGLY

mkdir umls-extract
mkdir umls-extract/META
mkdir umls-extract/NET
unzip umls-2022AB-full.zip
rm umls-2022AB-full.zip
unzip 2022AB-full/2022ab-1-meta.nlm
unzip 2022AB-full/2022ab-otherks.nlm
gunzip -c 2022AB/META/MRCONSO.RRF.*.gz > umls-extract/META/MRCONSO.RRF
gunzip 2022AB/META/MRDEF.RRF.gz
mv 2022AB/META/MRDEF.RRF umls-extract/META/
gunzip 2022AB/META/MRSTY.RRF.gz
mv 2022AB/META/MRSTY.RRF umls-extract/META/
mv 2022AB/NET/SRDEF umls-extract/NET/
mv 2022AB/NET/SRSTRE1 umls-extract/NET/

rm -rf 2022AB-full/
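Before pointing the parser at the extracted directory, it can help to confirm that the commands above produced every file they were meant to. The following is a minimal sketch, not part of UMLSParser: the helper name `missing_umls_files` is hypothetical, and the file list is taken only from the extraction steps above.

```python
from pathlib import Path

# Files that the extraction steps above place in umls-extract/
EXPECTED_FILES = (
    "META/MRCONSO.RRF",
    "META/MRDEF.RRF",
    "META/MRSTY.RRF",
    "NET/SRDEF",
    "NET/SRSTRE1",
)


def missing_umls_files(extract_dir):
    """Return the expected files that are absent or empty under extract_dir.

    Hypothetical helper for illustration; it is not part of UMLSParser.
    """
    root = Path(extract_dir)
    missing = []
    for relative in EXPECTED_FILES:
        path = root / relative
        # An empty file usually means a gunzip step failed part-way.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(relative)
    return missing


if __name__ == "__main__":
    missing = missing_umls_files("umls-extract")
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

An empty list means the directory layout matches what the extraction commands create.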

Usage

TODO WRITE ME

Examples

Getting all concepts that have an ICD10CM identifier

from umlsparser import UMLSParser

umls = UMLSParser('/home/toberhauser/DEV/Data/UMLS/2017AA-full/2017AA')

for cui, concept in umls.get_concepts().items():
    # Look up the source identifiers once and reuse them
    source_ids = concept.get_source_ids()
    if 'ICD10CM' in source_ids:
        print(source_ids['ICD10CM'], concept.get_preferred_names_for_language('ENG')[0])
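For orientation, the source identifiers queried above originate in MRCONSO.RRF, a pipe-delimited file in which each row carries a concept identifier (CUI), a source abbreviation (SAB) and that source's code (CODE). The sketch below reads such rows directly, without UMLSParser. It assumes the column order documented in the UMLS Reference Manual; `codes_by_cui` is a hypothetical helper, and nothing here claims to mirror how the parser builds `get_source_ids()` internally.

```python
from collections import defaultdict

# 0-based column positions in MRCONSO.RRF (assumed from the UMLS Reference Manual).
CUI, SAB, CODE = 0, 11, 13


def codes_by_cui(lines, source="ICD10CM"):
    """Map each CUI to the set of codes it has in the given source vocabulary.

    `lines` is any iterable of MRCONSO.RRF rows, for example an open file.
    Hypothetical helper for illustration; it is not part of UMLSParser.
    """
    result = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        # Skip rows that are too short to contain a CODE column.
        if len(fields) > CODE and fields[SAB] == source:
            result[fields[CUI]].add(fields[CODE])
    return dict(result)


# Usage against the extracted release:
# with open("umls-extract/META/MRCONSO.RRF", encoding="utf-8") as handle:
#     icd10 = codes_by_cui(handle)
```

Passing the open file handle streams the rows one at a time, which matters because MRCONSO.RRF is several gigabytes in a full release.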

Generate a table for the distribution of all English UMLS sources

from umlsparser import UMLSParser
import collections

umls = UMLSParser('/home/toberhauser/DEV/Data/UMLS/2017AA-full/2017AA')
sources_counter = collections.defaultdict(int)
for cui, concept in umls.get_concepts().items():
    sources = concept.get_source_ids().keys()
    for source in sources:
        sources_counter[source] += 1
print('|SOURCE|COUNT|\n|------|-----|')
for source, count in sorted(sources_counter.items(), key=lambda t: t[1], reverse=True):
    print('|{}|{}|'.format(source, count))

Generate a list of all English concept names and their semantic types

from umlsparser import UMLSParser

umls = UMLSParser('/home/toberhauser/DEV/Data/UMLS/2017AA-full/2017AA')

for cui, concept in umls.get_concepts().items():
    tui = concept.get_tui()
    name_of_semantic_type = umls.get_semantic_types()[tui].get_name()
    for name in concept.get_names_for_language('ENG'):
        print(cui, name, tui, name_of_semantic_type)

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors

Tom Oberhauser - Initial work - GitHub
