MrOlm / inStrain

Bioinformatics program inStrain
MIT License

instrain compare --bams for pooled_SNV #187

Open Lyylsys opened 3 months ago

Lyylsys commented 3 months ago

When I use the -bams parameter with instrain compare, I consistently encounter the following error: “ValueError: error during iteration BlockingIOError: [Errno 11] Resource temporarily unavailable Exception ignored in: 'pysam.libcalignmentfile.AlignmentFile.__dealloc__' BlockingIOError: [Errno 11] Resource temporarily unavailable”

I have already tried increasing the number of threads, but because this step cannot be processed incrementally, it still takes a very long time. Moreover, it often fails after running for nearly half of its expected duration, forcing me to repeat the whole process several times. This has been very frustrating.

I'm at a loss as to the cause of this problem. Is there a faster way to get the same results? Alternatively, if I skip the -bams parameter and use only the regular -i and -o parameters with instrain compare, how can I determine the SNP distribution at every position of each scaffold across all samples?

The full error output is as follows:

Pulling 36762107 SNVs from 238626 scaffolds. This should take ~ 3977.1 min in total
Pulling SNVs from BAMs: 43%|████▎ | 103540/238626 [176:57:43<273:15:49, 7.28s/it]
[E::bgzf_uncompress] Inflate operation failed: 1
[E::bgzf_read] Read block operation failed with error 1 after 0 of 4 bytes
Traceback (most recent call last):
  File "/mypath/instrain/bin/inStrain", line 31, in <module>
    inStrain.controller.Controller().main(args)
  File "/mypath/instrain/lib/python3.8/site-packages/inStrain/controller.py", line 57, in main
    self.compare_operation(args)
  File "/mypath/instrain/lib/python3.8/site-packages/inStrain/controller.py", line 89, in compare_operation
    inStrain.compare_controller.CompareController(args).main()
  File "/mypath/instrain/lib/python3.8/site-packages/inStrain/compare_controller.py", line 73, in main
    self.run_auxillary_processing()
  File "/mypath/instrain/lib/python3.8/site-packages/inStrain/compare_controller.py", line 298, in run_auxillary_processing
    PM.main()
  File "/mypath/instrain/lib/python3.8/site-packages/inStrain/polymorpher.py", line 93, in main
    self.pull_SNVS_from_bams()
  File "/mypathinstrain/lib/python3.8/site-packages/inStrain/polymorpher.py", line 121, in pull_SNVS_from_bams
    scaff2name2position2counts[scaff][name] = extract_SNVS_from_bam(bam_loc, Rdic, locs, scaff)
  File "/mypath/instrain/lib/python3.8/site-packages/inStrain/polymorpher.py", line 300, in extract_SNVS_from_bam
    for pilecol in biter:
  File "pysam/libcalignmentfile.pyx", line 2733, in pysam.libcalignmentfile.IteratorColumnRegion.__next__
ValueError: error during iteration
BlockingIOError: [Errno 11] Resource temporarily unavailable
Exception ignored in: 'pysam.libcalignmentfile.AlignmentFile.__dealloc__'
BlockingIOError: [Errno 11] Resource temporarily unavailable
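The `[E::bgzf_uncompress] Inflate operation failed` lines are reported by htslib (bundled with pysam) while decompressing a BAM block. The thread does not establish the cause, but before relaunching a multi-day run it may be worth ruling out a damaged or truncated BAM. `samtools quickcheck` is the usual quick test; below is a slower pure-Python sketch (the `check_bam` helper is hypothetical, not part of inStrain or pysam). It relies on BGZF being a series of gzip members: it first looks for the 28-byte BGZF end-of-file marker defined in the SAM/BAM specification, then streams through the whole file so that every member's CRC is verified. A clean result does not rule out transient I/O errors such as Errno 11.

```python
import gzip
import zlib

# BGZF end-of-file marker block from the SAM/BAM specification (28 bytes):
# an empty gzip member carrying the "BC" extra subfield.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)


def check_bam(path, chunk_size=1 << 20):
    """Return None if `path` ends with the BGZF EOF marker and decompresses
    cleanly, otherwise a short description of the problem (hypothetical helper)."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)  # jump to the end to learn the file size
        size = handle.tell()
        if size < len(BGZF_EOF):
            return "file too short"
        handle.seek(size - len(BGZF_EOF))
        if handle.read() != BGZF_EOF:
            return "missing BGZF EOF marker (file may be truncated)"
    try:
        # gzip reads multi-member streams and checks each member's CRC32,
        # so reading to the end exercises every BGZF block.
        with gzip.open(path, "rb") as stream:
            while stream.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        return f"decompression failed: {exc}"
    return None
```

For example, looping `check_bam` over the BAMs passed to -bams and printing any non-None result would flag files worth regenerating; note that it decompresses every file in full, so it takes a while on large BAMs.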

MrOlm commented 3 months ago

Hi @Lyylsys -

How frustrating. This only happens with the -bams parameter, and not otherwise?

Unfortunately, this seems to be an issue with how pysam interacts with your operating system, and not something I have much control over. I'd recommend updating pysam and/or switching to Python 3.9, which might fix whatever bug this is.

The SNVs.tsv file (https://instrain.readthedocs.io/en/latest/example_output.html#snvs-tsv) lists the location of all SNVs in each sample without the need to provide .bams, if that is sufficient for you.
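A minimal sketch of the SNVs.tsv route, assuming each sample's table has the `scaffold`, `position`, `A`, `C`, `T` and `G` columns described on the linked output page. The `pool_snv_counts` helper and any file paths are hypothetical, not part of inStrain. It sums per-base read counts at every (scaffold, position) across samples. Because each table only lists positions called as SNVs in that sample, positions absent from a sample contribute nothing, so this is not equivalent to pulling counts for every position from the BAMs with -bams.

```python
import csv
from collections import defaultdict

BASES = ("A", "C", "T", "G")


def pool_snv_counts(snv_tables):
    """Sum per-base counts at each (scaffold, position) across samples.

    snv_tables maps a sample name to the path of that sample's SNVs.tsv.
    Returns {(scaffold, position): {"A": n, "C": n, "T": n, "G": n, "samples": k}},
    where "samples" counts how many tables reported that position.
    """
    pooled = defaultdict(lambda: {**{base: 0 for base in BASES}, "samples": 0})
    for sample, path in snv_tables.items():
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                entry = pooled[(row["scaffold"], int(row["position"]))]
                for base in BASES:
                    # Counts may be written as "12" or "12.0"; normalise to int.
                    entry[base] += int(float(row[base]))
                entry["samples"] += 1
    return dict(pooled)
```

Pooled allele frequencies at a position then follow by dividing each base count by the sum of the four counts.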

Best, Matt

Lyylsys commented 3 months ago

Hi @Matt,

However, I previously processed 16 IS files successfully with the -bams parameter in the same environment; I only ran into the errors above when testing with 32 IS files. I plan to compare 500 samples in the future, which will make this step very time-consuming and error-prone.

My goal is the combined distribution across all samples at each position of each scaffold, rather than per-sample distributions, so I believe this step is necessary for the summary. However, I am still unsure how to resolve the recurring "ValueError: error during iteration BlockingIOError: [Errno 11] Resource temporarily unavailable Exception ignored in: 'pysam.libcalignmentfile.AlignmentFile.__dealloc__' BlockingIOError: [Errno 11] Resource temporarily unavailable" error.

MrOlm commented 3 months ago

Hi @Lyylsys ,

I wish I had better suggestions, but my only advice would be to update Python and/or pysam, as those are the programs responsible for this error.

Apologies, Matt

Lyylsys commented 3 months ago

Hi @matt, I'll give it a try. Thanks for your help.

Best, lyylsys

Lyylsys commented 3 months ago

Hi @matt,

As you suggested, I created a new environment, updated Python and pysam, and installed the latest version of inStrain. However, after running for five days, the process failed with a different error:

Pulling SNVs from BAMs: 21%|██ | 49444/238626 [92:14:15<421:43:12, 8.03s/it]
Traceback (most recent call last):
  File "/mypath/instrain_py/bin/inStrain", line 31, in <module>
    inStrain.controller.Controller().main(args)
  File "/mypath/instrain_py/lib/python3.10/site-packages/inStrain/controller.py", line 57, in main
    self.compare_operation(args)
  File "/mypath/instrain_py/lib/python3.10/site-packages/inStrain/controller.py", line 89, in compare_operation
    inStrain.compare_controller.CompareController(args).main()
  File "/mypath/instrain_py/lib/python3.10/site-packages/inStrain/compare_controller.py", line 74, in main
    self.run_auxillary_processing()
  File "/mypath/instrain_py/lib/python3.10/site-packages/inStrain/compare_controller.py", line 299, in run_auxillary_processing
    PM.main()
  File "/mypath/instrain_py/lib/python3.10/site-packages/inStrain/polymorpher.py", line 93, in main
    self.pull_SNVS_from_bams()
  File "/mypath/instrain_py/lib/python3.10/site-packages/inStrain/polymorpher.py", line 115, in pull_SNVS_from_bams
    Rdic = self.name2isp[name].get("Rdic")
  File "/mypath/instrain_py/lib/python3.10/site-packages/inStrain/SNVprofile.py", line 147, in get
    return self._load_pickle(filename)
  File "/mypath/instrain_py/lib/python3.10/site-packages/inStrain/SNVprofile.py", line 840, in _load_pickle
    tmp_dict = pickle.load(f)
_pickle.UnpicklingError: invalid load key, ':'.
Pulling SNVs from BAMs: 21%|██ | 49444/238626 [92:14:33<352:56:14, 6.72s/it]
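`invalid load key, ':'` means `pickle` hit a byte (`:`) that is not a valid pickle opcode, which usually indicates that the data it read was not an intact pickle. One way to catch a bad input before starting another five-day run is to try loading every pickle in each IS folder up front. This sketch assumes the profile keeps these objects as `*.pickle` files somewhere under the IS directory (the traceback only shows `SNVprofile._load_pickle`, not the file names); the `find_unreadable_pickles` helper is hypothetical.

```python
import pickle
from pathlib import Path


def find_unreadable_pickles(root):
    """Try to unpickle every *.pickle file under `root`.

    Returns {path: "ErrorType: message"} for each file that fails to load.
    Only run this on files you produced yourself: unpickling can execute
    code embedded in the file.
    """
    failures = {}
    for path in sorted(Path(root).rglob("*.pickle")):
        try:
            with path.open("rb") as handle:
                pickle.load(handle)
        except Exception as exc:  # UnpicklingError, EOFError, ValueError, ...
            failures[str(path)] = f"{type(exc).__name__}: {exc}"
    return failures
```

If an IS folder turns out to store its objects under a different extension, only the `rglob` pattern needs to change. A file that loads cleanly here but failed mid-run would point more toward an unreliable read than a corrupted file.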