GoekeLab / m6anet

Detection of m6A from direct RNA-Seq data
https://m6anet.readthedocs.io/
MIT License

Inference. RuntimeError: Too many open files #38

Closed: Stakaitis closed this issue 2 years ago

Stakaitis commented 2 years ago

Question: Where should I add the suggested "torch.multiprocessing.set_sharing_strategy('file_system')" code? And will it work if I don't have administrative privileges on my cluster? I do not have permission to increase the limit through "ulimit -n".

Issue: The inference step fails with the following error:

RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using "ulimit -n" in the shell or change the sharing strategy by calling "torch.multiprocessing.set_sharing_strategy('file_system')" at the beginning of your code

Command: m6anet-run_inference --input_dir ${DATAPREP_DIR} --out_dir ${INFERENCE_DIR} --infer_mod_rate --n_processes ${CPUS}

File sizes in ${DATAPREP_DIR}:
1) eventalign.index: 128M, 2,611,156 lines (wc -l)
2) data.readcount: 42M, 1,710,279 lines
3) data.log: 2.2M, 48,222 lines
4) data.json: 3.8G, 1,710,278 lines
5) data.index: 73M, 1,710,279 lines

Additional info:
m6anet version: 1.1.0 from PyPI
OS: CentOS Linux 7 (Core)
Experiment: the whole flowcell was used for one sample (dRNA-seq run), which generated 216 .fast5 files
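For reference, the per-process open-file limits can be inspected from within Python, and the soft limit can be raised up to the hard limit without administrative privileges; only raising the hard limit itself requires root. A minimal sketch, independent of m6anet:

import resource

# Inspect the current per-process open-file limits (no admin rights needed).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")

# An unprivileged process may raise its soft limit up to the hard limit.
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))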

chrishendra93 commented 2 years ago

hi @Stakaitis, have you tried reducing the number of processes? I think the inference step in general should not take that long even with just 1 worker. Otherwise, you could try adding torch.multiprocessing.set_sharing_strategy('file_system') to m6anet/m6anet/scripts/run_inference.py, but I haven't encountered this issue before, so I don't know if this will work.
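For illustration, a minimal sketch of where the call would go near the top of run_inference.py; the surrounding imports are assumed, and only the set_sharing_strategy line is the suggested addition:

# Top of m6anet/scripts/run_inference.py (surrounding code may differ).
import torch
import torch.multiprocessing

# Share tensors between DataLoader workers via the filesystem instead of
# open file descriptors, which avoids "Too many open files" during inference.
torch.multiprocessing.set_sharing_strategy('file_system')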

Stakaitis commented 2 years ago

Reducing n_processes to 1 didn't help. Inserting the torch.multiprocessing.set_sharing_strategy('file_system') line in /home/stakatis/miniconda3/envs/m6anet/lib/python3.9/site-packages/m6anet/scripts/run_inference.py solved this issue. However, I'm getting very slightly different values in the probability_modified column when m6anet-run_inference is run on a macOS laptop (without torch.multiprocessing.set_sharing_strategy('file_system')) and on the Linux cluster. Linux head of data.result.csv:

transcript_id,transcript_position,n_reads,probability_modified,kmer,mod_ratio
ENST00000451850.6,92,22,0.8810425,AGACT,0.45454545454545453
ENST00000451850.6,97,21,0.96233165,GAACT,0.6666666666666666
ENST00000451850.6,220,24,0.36171895,GAACC,0.08333333333333333
ENST00000451850.6,226,24,0.65948063,GAACT,0.25
ENST00000451850.6,235,23,0.023516918,TAACA,0.0
ENST00000451850.6,246,22,0.22028852,GAACA,0.09090909090909091
ENST00000451850.6,482,27,0.5908255,TGACT,0.3333333333333333
ENST00000451850.6,544,29,0.017148245,AAACC,0.0
ENST00000451850.6,572,30,0.2194786,TGACC,0.13333333333333333

macOS head of data.result.csv:

transcript_id,transcript_position,n_reads,probability_modified,kmer,mod_ratio
ENST00000451850.6,92,22,0.8829821,AGACT,0.45454545454545453
ENST00000451850.6,97,21,0.9631523,GAACT,0.6666666666666666
ENST00000451850.6,220,24,0.3354525,GAACC,0.08333333333333333
ENST00000451850.6,226,24,0.67357624,GAACT,0.25
ENST00000451850.6,235,23,0.022462333,TAACA,0.0
ENST00000451850.6,246,22,0.20326284,GAACA,0.09090909090909091
ENST00000451850.6,482,27,0.6155447,TGACT,0.3333333333333333
ENST00000451850.6,544,29,0.027465964,AAACC,0.0
ENST00000451850.6,572,30,0.21144116,TGACC,0.13333333333333333

On both the laptop and the cluster, conda list | grep "m6anet" returns the same result:

m6anet 1.1.0 pypi_0 pypi

These differences are probably not significant, but it's something to be aware of.
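For anyone who wants to quantify the discrepancy between the two runs, a minimal sketch using pandas; the file paths are hypothetical:

import pandas as pd

# Hypothetical paths to the two data.result.csv files being compared.
linux = pd.read_csv("linux/data.result.csv")
macos = pd.read_csv("macos/data.result.csv")

# Align sites by transcript and position, then look at the largest deviation
# in the modification probability between the two platforms.
merged = linux.merge(macos, on=["transcript_id", "transcript_position"],
                     suffixes=("_linux", "_macos"))
diff = (merged["probability_modified_linux"] -
        merged["probability_modified_macos"]).abs()
print(f"max |probability_modified difference|: {diff.max():.4f}")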