TranslatorSRI / Babel

Babel creates cliques of equivalent identifiers across many biomedical vocabularies.
MIT License

too many files open when running chemicals.py #26

Closed: kshefchek closed this issue 2 years ago

kshefchek commented 3 years ago

When running chemicals.py, I get the following error:

Traceback (most recent call last):
  File "babel/chemicals.py", line 682, in <module>
    load_chemicals(refresh_mesh=False,refresh_uniprot=False,refresh_pubchem=False,refresh_chembl=False)
  File "babel/chemicals.py", line 151, in load_chemicals
    concord = load_unichem(refresh=True)
  File "/opt/Babel/babel/unichem/unichem.py", line 21, in load_unichem
    return refresh_unichem(working_dir,xref_file,struct_file)
  File "/opt//Babel/babel/unichem/unichem.py", line 40, in refresh_unichem
    sorted_xref_file = sort_xref_file(srcfiltered_xref_file, xref_file)
  File "/opt/Babel/babel/unichem/unichem.py", line 244, in sort_xref_file
    batch_sort(inf, outf, key=uci_key, tempdirs='.')
  File "/opt/Babel/babel/big_gz_sort.py", line 41, in batch_sort
    output_chunk = open(os.path.join(tempdir,'%06i'%len(chunks)),'w+b',64*1024)
OSError: [Errno 24] Too many open files: './001016'

Is there a way to fix this without adjusting ulimit on the client OS?
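For context, the per-process limit behind Errno 24 can be inspected from inside Python. The following is a minimal sketch, assuming a Unix-like system where the standard `resource` module is available. The helper names `open_file_limits` and `raise_soft_limit` are hypothetical and not part of Babel. Raising the soft limit this way only affects the current process and cannot exceed the hard limit, so it may or may not be enough for a given run.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_limit(target):
    """Try to raise the soft limit toward `target`, never above the hard limit.

    Returns the soft limit in effect afterwards.
    """
    soft, hard = open_file_limits()
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot go past the hard limit.
        target = min(target, hard)
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return open_file_limits()[0]


if __name__ == "__main__":
    print("limits before:", open_file_limits())
    print("soft limit after:", raise_soft_limit(4096))
```

Calling something like `raise_soft_limit(4096)` before the sort starts would be one way to try this without touching system-wide settings, assuming the hard limit allows it.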

cbizon commented 3 years ago

Hmmm. I'm not sure, but I think you can increase the 'buffer_size' argument to the sort, which should lead to fewer, larger temporary files.
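To illustrate why a larger buffer should help, here is a minimal sketch of a chunked (external) sort. It is not Babel's `batch_sort`: the function `chunked_sort` and its `buffer_size` parameter (counted in lines here) are assumptions made for illustration only. The point is that the number of temporary chunk files, all of which are open at once during the merge, shrinks as the buffer grows.

```python
import heapq
import os
import tempfile


def chunked_sort(lines, buffer_size, tempdir):
    """Sort `lines` by spilling sorted chunks of at most `buffer_size` lines to disk.

    Returns (sorted_lines, number_of_chunk_files).
    """
    chunk_paths = []
    buf = []

    def spill():
        # Sort the in-memory buffer and write it out as one chunk file.
        buf.sort()
        path = os.path.join(tempdir, '%06i' % len(chunk_paths))
        with open(path, 'w') as f:
            f.writelines(buf)
        chunk_paths.append(path)
        buf.clear()

    for line in lines:
        buf.append(line)
        if len(buf) >= buffer_size:
            spill()
    if buf:
        spill()

    # The merge step holds every chunk file open at the same time,
    # so the chunk count is what runs into the open-file limit.
    files = [open(p) for p in chunk_paths]
    try:
        merged = list(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
    return merged, len(chunk_paths)


def demo():
    """Sort the same 1000 lines with a small and a large buffer."""
    lines = ['%05d\n' % n for n in range(999, -1, -1)]
    with tempfile.TemporaryDirectory() as d1:
        out_small, n_small = chunked_sort(lines, 10, d1)
    with tempfile.TemporaryDirectory() as d2:
        out_big, n_big = chunked_sort(lines, 250, d2)
    return n_small, n_big, out_small == out_big == sorted(lines)
```

With 1000 input lines, a buffer of 10 lines produces 100 chunk files while a buffer of 250 lines produces 4, and both give the same sorted output. The trade-off is memory: each chunk has to fit in RAM before it is written out.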

cbizon commented 2 years ago

I think this is no longer relevant with the new build system.