DATEXIS / UMLSParser

Python module to parse UMLS source files
Apache License 2.0

UMLSParser

Parses the UMLS source files.

Getting Started

Acquiring UMLS Data

To use the UMLS you need a license. For more information, see https://uts.nlm.nih.gov/home.html and choose "Request a License".

This tool requires the full UMLS release, so please download the Full UMLS Release Files.

Prerequisites

Installing

Extracting Relevant Data out of the UMLS Full Release

TODO: MAKE SCRIPT AND CHANGE PATHS IN PARSER ACCORDINGLY

# Create the target layout.
mkdir -p umls-extract/META umls-extract/NET

# Unpack the full release, then the two inner .nlm archives.
unzip umls-2022AB-full.zip
rm umls-2022AB-full.zip
unzip 2022AB-full/2022ab-1-meta.nlm
unzip 2022AB-full/2022ab-otherks.nlm

# MRCONSO is split into several gzipped parts; concatenate them into one file.
gunzip -c 2022AB/META/MRCONSO.RRF.*.gz > umls-extract/META/MRCONSO.RRF
gunzip 2022AB/META/MRDEF.RRF.gz
mv 2022AB/META/MRDEF.RRF umls-extract/META/
gunzip 2022AB/META/MRSTY.RRF.gz
mv 2022AB/META/MRSTY.RRF umls-extract/META/

# Semantic Network files are not compressed.
mv 2022AB/NET/SRDEF umls-extract/NET/
mv 2022AB/NET/SRSTRE1 umls-extract/NET/

rm -rf 2022AB-full/
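
Before pointing the parser at the extracted data, it can help to confirm that the commands above produced every file. The following is a minimal sketch, not part of UMLSParser: the helper name `missing_umls_files` is hypothetical, and the file list is taken only from the extraction commands above.

```python
from pathlib import Path

# Files that the extraction commands above place under umls-extract/.
EXPECTED_FILES = (
    "META/MRCONSO.RRF",
    "META/MRDEF.RRF",
    "META/MRSTY.RRF",
    "NET/SRDEF",
    "NET/SRSTRE1",
)


def missing_umls_files(root):
    """Return the expected files that are absent or empty under root.

    Hypothetical helper for illustration; it is not provided by UMLSParser.
    """
    root = Path(root)
    missing = []
    for rel in EXPECTED_FILES:
        path = root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


if __name__ == "__main__":
    missing = missing_umls_files("umls-extract")
    if missing:
        print("Missing or empty:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

An empty result means the five files exist and are non-empty; it does not validate their contents.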

Usage

TODO WRITE ME

Examples

Getting all concepts that have an ICD10CM identifier

from umlsparser import UMLSParser

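# Path to the UMLS release directory; adjust it to your local copy.
umls = UMLSParser('/home/toberhauser/DEV/Data/UMLS/2017AA-full/2017AA')

for cui, concept in umls.get_concepts().items():
    source_ids = concept.get_source_ids()
    if 'ICD10CM' in source_ids:
        icd10ids = source_ids.get('ICD10CM')
        # Guard against concepts without a preferred English name.
        names = concept.get_preferred_names_for_language('ENG')
        print(icd10ids, names[0] if names else None)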
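
For orientation, source identifiers such as the ICD10CM codes above correspond to the SAB (source abbreviation) and CODE columns of MRCONSO.RRF. The sketch below reads that layout directly. It is an illustration under stated assumptions, not a description of how UMLSParser is implemented: the function name `codes_by_cui` is hypothetical and the two sample rows are made up.

```python
import collections

# Column positions in MRCONSO.RRF (pipe-delimited, trailing '|' on each line):
# CUI|LAT|TS|LUI|STT|SUI|ISPREF|AUI|SAUI|SCUI|SDUI|SAB|TTY|CODE|STR|SRL|SUPPRESS|CVF|
CUI, SAB, CODE, STR = 0, 11, 13, 14


def codes_by_cui(lines, source="ICD10CM"):
    """Map each CUI to the set of codes it has in the given source vocabulary."""
    result = collections.defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        # Skip malformed rows that are too short to hold the STR column.
        if len(fields) > STR and fields[SAB] == source:
            result[fields[CUI]].add(fields[CODE])
    return dict(result)


# Two made-up rows in MRCONSO layout; the identifiers are illustrative only.
sample = [
    "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||||ICD10CM|PT|X00.0|Example disorder|0|N||",
    "C0000002|ENG|P|L0000002|PF|S0000002|Y|A0000002||||MSH|MH|D000001|Another example|0|N||",
]
print(codes_by_cui(sample))
```

To run this against real data, pass an open file handle for `umls-extract/META/MRCONSO.RRF` instead of `sample`.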

Generate a table for the distribution of all English UMLS sources

from umlsparser import UMLSParser
import collections

umls = UMLSParser('/home/toberhauser/DEV/Data/UMLS/2017AA-full/2017AA')
sources_counter = collections.defaultdict(int)
for cui, concept in umls.get_concepts().items():
    sources = concept.get_source_ids().keys()
    for source in sources:
        sources_counter[source] += 1
print('|SOURCE|COUNT|\n|------|-----|')
for source, count in sorted(sources_counter.items(), key=lambda t: t[1], reverse=True):
    print('|{}|{}|'.format(source, count))
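
The counting and table logic can be tried without a UMLS download. This sketch assumes, based on the calls used above, that `get_source_ids()` returns a dict keyed by source name; the list values in the stand-in data and the helper name `source_table` are assumptions for illustration.

```python
import collections


def source_table(source_ids_per_concept):
    """Render a Markdown table of how many concepts mention each source."""
    counter = collections.Counter()
    for source_ids in source_ids_per_concept:
        # Each source counts once per concept, however many IDs it contributes.
        counter.update(source_ids.keys())
    lines = ["|SOURCE|COUNT|", "|------|-----|"]
    for source, count in counter.most_common():
        lines.append("|{}|{}|".format(source, count))
    return "\n".join(lines)


# Stand-in data: one dict per concept, mapping source name to its identifiers.
demo = [
    {"ICD10CM": ["X00.0"], "MSH": ["D000001"]},
    {"MSH": ["D000002"]},
]
print(source_table(demo))
```

`collections.Counter.most_common()` replaces the explicit `sorted(..., reverse=True)` call and yields the same descending order by count.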

Generate a list of all English concept names and their semantic type

from umlsparser import UMLSParser

umls = UMLSParser('/home/toberhauser/DEV/Data/UMLS/2017AA-full/2017AA')

# Look up the semantic types once instead of on every iteration.
semantic_types = umls.get_semantic_types()

for cui, concept in umls.get_concepts().items():
    tui = concept.get_tui()
    name_of_semantic_type = semantic_types[tui].get_name()
    for name in concept.get_names_for_language('ENG'):
        print(cui, name, tui, name_of_semantic_type)
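
The TUI used above comes from MRSTY.RRF, where a single CUI can appear on more than one row, once per assigned semantic type. The sketch below groups those rows per concept. It reads the file layout directly and makes no claim about how UMLSParser handles multiple types; `semantic_types_by_cui` is a hypothetical name and the sample rows use a made-up CUI.

```python
import collections

# Column positions in MRSTY.RRF: CUI|TUI|STN|STY|ATUI|CVF|
CUI, TUI, STY = 0, 1, 3


def semantic_types_by_cui(lines):
    """Map each CUI to a list of (TUI, semantic type name) pairs."""
    result = collections.defaultdict(list)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        # Skip malformed rows that are too short to hold the STY column.
        if len(fields) > STY:
            result[fields[CUI]].append((fields[TUI], fields[STY]))
    return dict(result)


# Made-up rows in MRSTY layout: one concept with two semantic types.
sample = [
    "C0000001|T047|B2.2.1.2.1|Disease or Syndrome|AT00000001||",
    "C0000001|T033|A2.2|Finding|AT00000002||",
]
for cui, types in semantic_types_by_cui(sample).items():
    print(cui, types)
```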

Versioning

We use SemVer for versioning. For the versions available, see the tags on this repository.

Authors