Georgetown-IR-Lab / QuickUMLS

System for Medical Concept Extraction and Linking
MIT License

LevelDBError: IO error: lock umls_data/cui-semtypes.db/cui.leveldb/LOCK: already held by process #35

Closed yzhaoinuw closed 5 years ago

yzhaoinuw commented 5 years ago

Hi, I set up the QuickUMLS 1.2.6 release successfully. I wrote a script called "run_test.py" in the same directory as the QuickUMLS source. I first ran the following section of run_test.py to make sure that everything worked so far ('umls_data' is the folder where I put MRCONSO.RRF, MRSTY.RRF and other UMLS tables):

from quickumls import QuickUMLS

parser = QuickUMLS('umls_data')

I got

[nltk_data] Downloading package stopwords to /home/yue/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.

and nothing more, so I assumed that I had set up QuickUMLS correctly. Then I added three more lines to run_test.py. Now it looks like this:

from quickumls import QuickUMLS

parser = QuickUMLS('umls_data')

text = "The ulna has dislocated posteriorly from the trochlea of the humerus."
results = parser.match(text, best_match=True, ignore_syntax=False)
print(results)

However, after I ran it, I got "LevelDBError: IO error: lock umls_data/cui-semtypes.db/cui.leveldb/LOCK: already held by process".

The full error message is:

Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('/home/yue/python_projects/QuickUMLS/run_test.py', wdir='/home/yue/python_projects/QuickUMLS')
  File "/home/yue/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 704, in runfile
    execfile(filename, namespace)
  File "/home/yue/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "/home/yue/python_projects/QuickUMLS/run_test.py", line 11, in <module>
    parser = QuickUMLS('umls_data')
  File "/home/yue/python_projects/QuickUMLS/quickumls.py", line 103, in __init__
    self.cuisem_db = toolbox.CuiSemTypesDB(cuisem_fp)
  File "/home/yue/python_projects/QuickUMLS/toolbox.py", line 227, in __init__
    os.path.join(path, 'cui.leveldb'))
LevelDBError: IO error: lock umls_data/cui-semtypes.db/cui.leveldb/LOCK: already held by process

Does anyone know what happened here?

soldni commented 5 years ago

It looks like QuickUMLS did not shut down correctly when you first killed your script. Just delete the lock file umls_data/cui-semtypes.db/cui.leveldb/LOCK to get it running again.

-Luca
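
For anyone landing here, the cleanup suggested above can be sketched in Python. This is a minimal sketch, assuming no running process (for example a Spyder/IPython kernel that still holds the first QuickUMLS object) is using the database. The helper name remove_stale_lock is hypothetical and not part of QuickUMLS; the path layout is taken from the error message in this thread.

```python
from pathlib import Path


def remove_stale_lock(quickumls_dir):
    """Delete the LevelDB LOCK file left behind under quickumls_dir.

    Hypothetical helper (not a QuickUMLS function). Only call it when no
    live process is still using the database.
    Returns True if a lock file was removed, False if none was found.
    """
    # Path layout copied from the error message:
    # umls_data/cui-semtypes.db/cui.leveldb/LOCK
    lock_path = Path(quickumls_dir) / "cui-semtypes.db" / "cui.leveldb" / "LOCK"
    if lock_path.is_file():
        lock_path.unlink()
        return True
    return False
```

Calling remove_stale_lock('umls_data') before creating the QuickUMLS object mirrors deleting the file by hand. If the error persists, a live process (often the same interactive session) probably still has the database open, and restarting that session is the safer fix.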

GregSilverman commented 5 years ago

Hi, I'm running into this when trying to iterate through the various similarity_name and overlapping_criteria values. (I'd also like to add all combinations of best_match and ignore_syntax for when I run the match method.)

After the first pass, the error gets thrown.

My use case is pretty simple (I want to write all models to a dataframe for later processing). My proposed code is:

import glob
import os

import pandas as pd
from quickumls import QuickUMLS

# quickumls_fp and directory_to_parse are assumed to be defined earlier
similarity = ['dice', 'cosine', 'overlap', 'jaccard']
overlapping_criteria = ['score', 'length']

for s in similarity:
    for o in overlapping_criteria:
        # A new QuickUMLS object is created for every combination;
        # the second instantiation is where the lock error is raised
        matcher = QuickUMLS(quickumls_fp=quickumls_fp, overlapping_criteria=o, threshold=0.7, window=5, similarity_name=s)
        test = pd.DataFrame()
        for fname in glob.glob(directory_to_parse + '*.txt'):
            t = os.path.basename(fname)
            u = t.split('.')[0]  # file name without extension
            with open(fname) as f:
                f1 = f.read()
            out = matcher.match(f1, best_match=True, ignore_syntax=False)
            for i in out:
                # keep only the first candidate of each match group
                i[0]['file'] = u
                frames = [test, pd.DataFrame(i[0], index=[0])]
                test = pd.concat(frames, ignore_index=True)

        test['system'] = 'quick_umls'
        test['similarity'] = s
        test['overlap'] = o
        test['type'] = 'concept'
        test['note_id'] = u  # note: u is the last file processed at this point

        temp = test.rename(columns={'start': 'begin'}).copy()
        print(temp.tail())

How can I do this without having to exit my Python interpreter and restart it for each successive run? I looked in the source and nothing obvious stood out. The issue is that it is not possible to create another QuickUMLS object after the first one has been instantiated.

soldni commented 5 years ago

Hi Greg,

That's correct: because QuickUMLS takes a lock on its LevelDB database, you can't instantiate more than one QuickUMLS reader. I'd recommend using the built-in server/client support (see the section in the README file) to create multiple clients that talk to a single server. Let me know if that works!

Best, Luca

GregSilverman commented 5 years ago

Hi, I played around with this a bit a few days ago. Unless I am missing something, the issue is that the QuickUMLS options have to be passed at server startup, which is similar to the problem stated above: if I wanted to change, for example, the similarity algorithm, I would have to restart the server with the new value of similarity_name.

GregSilverman commented 5 years ago

Since I have multiple data sets to run through all models, I think I'll just modify my Python script to accept parameters and then iterate through all the possible options in a bash script.
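
A rough sketch of that workaround, written with the outer loop in Python rather than bash so everything stays in one language. Each combination runs in a fresh process, so the database lock is released when that process exits. The script name run_models.py, the flag names --similarity and --overlap, and the helper functions are all assumptions for illustration; only the option values come from the code earlier in this thread.

```python
import argparse
import itertools
import subprocess
import sys

# Option values taken from the loop posted earlier in this thread
SIMILARITIES = ["dice", "cosine", "overlap", "jaccard"]
OVERLAPS = ["score", "length"]


def parse_args(argv=None):
    """Parse the per-run options inside the (hypothetical) run_models.py."""
    parser = argparse.ArgumentParser(description="Run one QuickUMLS configuration")
    parser.add_argument("--similarity", choices=SIMILARITIES, required=True)
    parser.add_argument("--overlap", choices=OVERLAPS, required=True)
    return parser.parse_args(argv)


def build_commands(script="run_models.py"):
    """Build one command line per (similarity, overlap) combination."""
    return [
        [sys.executable, script, "--similarity", s, "--overlap", o]
        for s, o in itertools.product(SIMILARITIES, OVERLAPS)
    ]


def run_all(script="run_models.py"):
    """Run every combination sequentially, each in its own process."""
    for cmd in build_commands(script):
        # check=True stops the sweep if one configuration fails
        subprocess.run(cmd, check=True)
```

Inside run_models.py, the values from parse_args() would be passed to QuickUMLS as similarity_name and overlapping_criteria. Running the processes one after another (not in parallel) matters here, since two concurrent processes would contend for the same lock.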