Georgetown-IR-Lab / QuickUMLS

System for Medical Concept Extraction and Linking
MIT License

LevelDB Error: Too many open files #21

Closed · khalludi closed this 6 years ago

khalludi commented 6 years ago

I am trying to use QuickUMLS to classify a document. I split the document by paragraph and then run the matcher on each paragraph. This works for some documents but not others. When it fails, the main error I get is:

Traceback (most recent call last):
  File "cui_test.py", line 23, in <module>
    match = matcher.match(line[:int(len(line)/3)], best_match=True, ignore_syntax=False)
  File "/Users/khalid/prog/work/Call_LSTM/QuickUMLS_master/quickumls.py", line 321, in match
    matches = self._get_all_matches(ngrams)
  File "/Users/khalid/prog/work/Call_LSTM/QuickUMLS_master/quickumls.py", line 221, in _get_all_matches
    cuisem_match = sorted(self.cuisem_db.get(match))
  File "/Users/khalid/prog/work/Call_LSTM/QuickUMLS_master/toolbox.py", line 258, in get
    cuis = pickle.loads(self.cui_db.Get(db_key_encode(term)))
leveldb.LevelDBError: IO error: /Users/khalid/prog/work/quickUMLS/cui-semtypes.db/cui.leveldb/006106.ldb: Too many open files

Passing a smaller number of characters seems to avoid the problem. This is my matcher initialization:

matcher = QuickUMLS("/Users/khalid/prog/work/quickUMLS", window=20)

And this is a sample call:

match = matcher.match(line, best_match=True, ignore_syntax=False)

I'm guessing that if there are too many CUIs in one section of text, the error is raised. It would be a nice feature if the matcher automatically split the text so that all CUIs could be matched without error, instead of leaving the user to come up with a workaround themselves.

soldni commented 6 years ago

Khalid,

Please refer to issue #20 for a fix to your problem. In short, you need to configure your OS to allow more open file descriptors. Feel free to re-open this issue if that doesn't work.
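
If you would rather not change system-wide settings, a minimal sketch of an alternative is to raise the soft file-descriptor limit for the current process only, using the standard-library resource module (Unix only; the target value 4096 below is an arbitrary choice, not something prescribed by issue #20):

import resource

# Current soft/hard limits on open file descriptors for this process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Raise the soft limit toward the hard limit (4096 is an arbitrary target).
new_soft = 4096 if hard == resource.RLIM_INFINITY else min(4096, hard)
resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))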

khalludi commented 6 years ago

That seems to fix it. Thanks for the reply.

I am posting this follow-up as an alternative to raising the file descriptor limit at the operating-system level.

I decided not to mess with kernel-level settings since I am still in an early phase and the files I am scanning are relatively small. Instead, I break the document into smaller portions and match those. In short, the code below attempts to match the given line; if it hits the error, it splits the line in half and matches each half recursively.

import leveldb

def quick_match(matcher, line):
    """Match a line; on a LevelDB error, split it in half and retry recursively."""
    ret = []
    try:
        tmp = matcher.match(line, best_match=True, ignore_syntax=False)
        ret.append(tmp)
    except leveldb.LevelDBError:
        if len(line) < 2:
            return ret  # cannot split further; skip to avoid infinite recursion
        half = len(line) // 2
        ret.append(quick_match(matcher, line[:half]))
        ret.append(quick_match(matcher, line[half:]))
    return ret

EDIT*** - When using this method, the output is nested and mostly unusable without flattening it first (see the sketch below).
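
For reference, a minimal flattening helper under one assumption: each individual QuickUMLS match is a dict, so anything that is not a list is treated as a leaf.

def flatten_matches(nested):
    """Flatten the arbitrarily nested lists returned by quick_match."""
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten_matches(item))
        else:
            flat.append(item)
    return flat

After flattening, the result is a single list of match dicts regardless of how many times the line was split.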