Georgetown-IR-Lab / QuickUMLS

System for Medical Concept Extraction and Linking
MIT License

Fresh QuickUMLS installation on Windows 10 returning empty [] - no errors #82

Open khankanz opened 2 years ago

khankanz commented 2 years ago

Describe the bug

Environment

Additional context

khankanz commented 2 years ago

Happy to provide additional details. Not sure which direction to investigate further.

khankanz commented 2 years ago

I started digging into the source code and writing scripts to test pieces of it. I'm noticing that the retrieve function keeps returning null on Windows. The same script works perfectly fine on my Ubuntu VM.

import os
import unicodedata

import six
from quickumls_simstring import simstring

def safe_unicode(s):
    if six.PY2:
        # in Python 3 there is no ambiguity about whether
        # a string is encoded in bytes format or not
        try:
            s = u'%s' % s
        except UnicodeDecodeError:
            s = u'%s' % s.decode('utf-8')

    return u'{}'.format(unicodedata.normalize('NFKD', s))

def prepare_string_for_db_input(s):
    if six.PY2:
        print('s > six.PY2', s)
        return s.encode('utf-8')
    else:
        print('s > NO six.PY2', s)
        return s

path = "FILEPATH/umls-simstring.db"
print(os.path.join(path, 'umls-terms.simstring'))
db = simstring.reader(os.path.join(path, 'umls-terms.simstring'))
# Use the cosine similarity measure with a threshold of 0.6
db.measure = simstring.cosine
db.threshold = 0.6
term = "elbow, ula"

# Prepare the query once so the debug print inside the helper fires a single time
query = prepare_string_for_db_input(safe_unicode(term))
print('term ready for db lookup:', query)
print(db.retrieve(query))
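
A possible next step, not confirmed anywhere in this thread, is to rule out that the database files on the Windows machine are missing or empty, since the same script returns results on Ubuntu. The sketch below uses only the standard library and lists each file in the database directory with its size so the two machines can be compared. The helper names describe_db_dir and find_empty_files are hypothetical, and the path is the same FILEPATH placeholder used in the script above.

```python
import os


def describe_db_dir(db_dir):
    """Return (name, size_in_bytes) for each file directly inside db_dir, sorted by name."""
    if not os.path.isdir(db_dir):
        raise FileNotFoundError("not a directory: {}".format(db_dir))
    entries = []
    for name in sorted(os.listdir(db_dir)):
        full_path = os.path.join(db_dir, name)
        if os.path.isfile(full_path):
            entries.append((name, os.path.getsize(full_path)))
    return entries


def find_empty_files(entries):
    """Return the names of files whose size is zero bytes."""
    return [name for name, size in entries if size == 0]


if __name__ == "__main__":
    # Placeholder path copied from the script above; replace with the real location.
    db_dir = "FILEPATH/umls-simstring.db"
    if os.path.isdir(db_dir):
        entries = describe_db_dir(db_dir)
        for name, size in entries:
            print("{:>12d}  {}".format(size, name))
        print("empty files:", find_empty_files(entries))
    else:
        print("directory not found:", db_dir)
```

Running this on both the Windows machine and the Ubuntu VM and comparing the listings would show whether the two installations are reading from equivalent files. Matching listings would not prove the files are identical, only that none is missing or empty.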