DATEXIS / UMLSParser

Python module to parse UMLS source files
Apache License 2.0

UMLS 2022 IndexError: list index out of range #9

Closed: devfoo-one closed this issue 1 year ago

devfoo-one commented 1 year ago

umls-2022AB-full.zip

umls = UMLSParser('/home/toberhauser/DEV/Data/UMLS/2022_test/umls-extract')
INFO:umlsparser.UMLSParser:Initialising UMLSParser for basepath /home/toberhauser/DEV/Data/UMLS/2022_test/umls-extract
INFO:umlsparser.UMLSParser:No language filtering applied.
Parsing UMLS concepts (MRCONSO.RRF): 16761432it [01:42, 163714.10it/s]
Traceback (most recent call last):
  File "/home/toberhauser/DEV/GENERAL_PLAYGROUND/UMLS/2022_test.py", line 3, in <module>
    umls = UMLSParser('/home/toberhauser/DEV/Data/UMLS/2022_test/umls-extract')
  File "/home/toberhauser/DEV/GENERAL_PLAYGROUND/venv/lib/python3.8/site-packages/umlsparser/UMLSParser.py", line 54, in __init__
    self.__parse_mrconso__()
  File "/home/toberhauser/DEV/GENERAL_PLAYGROUND/venv/lib/python3.8/site-packages/umlsparser/UMLSParser.py", line 88, in __parse_mrconso__
    'SRL': line[15],
IndexError: list index out of range

devfoo-one commented 1 year ago

image

MRCONSO.RRF ends with a corrupted line.
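A quick way to confirm a truncated row like this is to count the pipe-delimited fields per line before parsing. The sketch below is not part of UMLSParser; `find_malformed_lines` and `MRCONSO_FIELDS` are hypothetical names, and the expected count of 18 columns assumes the standard MRCONSO.RRF layout (in which `SRL` sits at index 15, as in the traceback above).

```python
MRCONSO_FIELDS = 18  # assumption: standard MRCONSO.RRF column count


def find_malformed_lines(lines, expected_fields=MRCONSO_FIELDS):
    """Return (line_number, field_count) for every row with an unexpected field count."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        row = raw.rstrip("\n")
        if not row:
            continue  # ignore blank lines
        fields = row.split("|")
        # RRF rows end with a trailing '|', which yields one extra empty element.
        if fields[-1] == "":
            fields = fields[:-1]
        if len(fields) != expected_fields:
            bad.append((number, len(fields)))
    return bad
```

Passing an open file handle works as well, for example `find_malformed_lines(open(path, encoding="utf-8"))`, where `path` points at the extracted MRCONSO.RRF. A row cut off before index 15 is exactly what would make `line[15]` raise the IndexError.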

devfoo-one commented 1 year ago

2022AB:

image

devfoo-one commented 1 year ago

2019AB:

image

devfoo-one commented 1 year ago

UMLS2022 contains 3 MRCONSO splits!

image
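If the extract only joined some of the splits, the resulting MRCONSO.RRF would end mid-row, which would fit the corrupted last line. A minimal sketch for joining all parts in order follows. It assumes the parts are uncompressed and sit next to each other; the function name `concatenate_splits` and the glob pattern `MRCONSO.RRF.*` are hypothetical and should be adapted to the actual file names in the release.

```python
import shutil
from pathlib import Path


def concatenate_splits(directory, pattern="MRCONSO.RRF.*", target_name="MRCONSO.RRF"):
    """Join all split files matching `pattern` (sorted by name) into one target file."""
    directory = Path(directory)
    # The pattern requires a suffix after 'MRCONSO.RRF.', so the target itself never matches.
    parts = sorted(p for p in directory.glob(pattern) if p.is_file())
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern} in {directory}")
    target = directory / target_name
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)  # byte-wise copy keeps rows intact
    return parts
```

The function returns the list of parts it joined, so the count can be checked against the number of splits expected for the release (three for 2022).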