cancerit / telomerecat

Telomerecat: The telomere computational analysis tool
GNU General Public License v3.0

Could not retrieve index file #20

Closed radwaraed closed 3 years ago

radwaraed commented 4 years ago

Hi, I am trying to use telomerecat on a BAM file (whose index is in the same directory), yet I am getting lines and lines of "Could not retrieve index file", followed by "telomerecat stopped unexpecedtly" and "ValueError: file does not contain alignment data".

I would also recommend fixing the typo: unexpecedtly > unexpectedly :) but that is minor
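A minimal sketch for ruling out a missing index on the input BAM before running the tool. It assumes the usual naming conventions for BAM indexes (`sample.bam.bai`, `sample.bai`, and the `.csi` equivalents); the helper names `candidate_index_paths` and `find_index` are hypothetical and are not part of telomerecat, parabam, or pysam.

```python
from pathlib import Path


def candidate_index_paths(bam_path):
    """Return the index file names commonly expected next to a BAM file."""
    bam = Path(bam_path)
    return [
        Path(str(bam) + ".bai"),  # sample.bam.bai
        bam.with_suffix(".bai"),  # sample.bai
        Path(str(bam) + ".csi"),  # sample.bam.csi
        bam.with_suffix(".csi"),  # sample.csi
    ]


def find_index(bam_path):
    """Return the first existing index file, or None if none is found."""
    for candidate in candidate_index_paths(bam_path):
        if candidate.is_file():
            return candidate
    return None
```

If `find_index("/path/to/sample.bam")` returns a path, the input itself is probably indexed, and the messages are more likely about files the tool creates under its temporary directory.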

vincent-hanlon commented 3 years ago

I'm getting a similar error. Many lines of variations on this error:

[E::idx_find_and_load] Could not retrieve index file for '/tmp/telomerecat_bam2length-dljprax2/37240_16_1_MMb8vb7_.bam'

And then eventually this one:

[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes

And finally this one:

Process Task-10:2:
Traceback (most recent call last):
  File "/home/vhanlon/miniconda2/envs/py3/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "parabam/core.pyx", line 291, in parabam.core.Task.run
  File "parabam/core.pyx", line 311, in parabam.core.Task.__generate_results__
  File "parabam/command/core.pyx", line 50, in parabam.command.core.Task.__process_task_set__
  File "pysam/libcalignmentfile.pyx", line 2187, in pysam.libcalignmentfile.IteratorRowAll.__next__
OSError: truncated file

I've tried running with BAM files from a variety of relative and absolute paths, and also tried setting a different tmp directory using --temp_dir. I've also run this on two different servers, with the same result.

duran72 commented 3 years ago

Hi, I am also seeing the same issue.

My files are lossless BAM files from a NovaSeq 6000.

My error file for the bam2telbam step shows about 25,000 lines like these:

[E::idx_find_and_load] Could not retrieve index file for '/scratch/wtccc/psb7/tmp_telbams_p3/telomerecat_bam2telbam-b306o1wu/chaser_31801_0_115302.bam'
[E::idx_find_and_load] Could not retrieve index file for '/scratch/wtccc/psb7/tmp_telbams_p3/telomerecat_bam2telbam-b306o1wu/31792_1_0XX0v0-0.bam'
[E::idx_find_and_load] Could not retrieve index file for '/scratch/wtccc/psb7/tmp_telbams_p3/telomerecat_bam2telbam-b306o1wu/31792_2_0MMb14vb7-0.bam'

It still created a telbam file of size ~6MB.

My error file for the telbam2length step shows these lines:

[E::idx_find_and_load] Could not retrieve index file for '/scratch/wtccc/psb7/telbams_output_p3/115425_telbam.bam'
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
Process Task-3:1:
Traceback (most recent call last):
  File "/home/p/psb7/miniconda3/envs/Telomere/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "parabam/core.pyx", line 291, in parabam.core.Task.run
  File "parabam/core.pyx", line 311, in parabam.core.Task.__generate_results__
  File "parabam/command/core.pyx", line 50, in parabam.command.core.Task.__process_task_set__
  File "pysam/libcalignmentfile.pyx", line 2187, in pysam.libcalignmentfile.IteratorRowAll.__next__
OSError: truncated file

I'm still waiting to see if I get a length estimate. Wall time so far is more than 4h, whereas with python2 it took less than 30 minutes.

keiranmraine commented 3 years ago

This is a warning from pysam. There is a workaround but it has no impact on the processing:

https://github.com/pysam-developers/pysam/issues/939

duran72 commented 3 years ago

Hi Keiran, I agree with you; I did google the error and saw that pysam gives this warning too. However, although I am able to generate a telbam file using bam2telbam, I'm unable to generate any telbam2length output, and I'm not sure what I'm doing wrong. I'm running this command:

/home/miniconda3/envs/Telomere/bin/telomerecat telbam2length -v2 /scratch/telbams_output_p3/115_telbam.bam --temp_dir /scratch/wtccc/psb7/tmp_length_p3 --output /scratch/telbam_to_length_output_p3/p3_length_est_115.csv

Many thanks

duran72 commented 3 years ago

Hi Keiran,

I re-sorted my BAM file using samtools sort -o 115234_sorted.bam 115234.bam and re-indexed the resulting *_sorted.bam, in case there was something wrong with my lossless BAM file.

I get a 115234_sorted_telbam.bam of size 11.2GB.

However, the telbam2length step seems to hang and does not generate a result, even after 12h. With the python2 version this took about 30 minutes:

/etc/profile.d/hpc-login.sh: line 37: export: `234': not a valid identifier
[E::idx_find_and_load] Could not retrieve index file for '/scratch/wtccc/psb7/telbams_output_p3/115234_sorted_telbam.bam'
[E::idx_find_and_load] Could not retrieve index file for '/scratch/wtccc/psb7/telbams_output_p3/115234_sorted_telbam.bam'
[E::idx_find_and_load] Could not retrieve index file for '/scratch/wtccc/psb7/telbams_output_p3/115234_sorted_telbam.bam'
[E::idx_find_and_load] Could not retrieve index file for '/scratch/wtccc/psb7/telbams_output_p3/115234_sorted_telbam.bam'
[E::idx_find_and_load] Could not retrieve index file for '/scratch/wtccc/psb7/telbams_output_p3/115234_sorted_telbam.bam'
[E::idx_find_and_load] Could not retrieve index file for '/scratch/wtccc/psb7/telbams_output_p3/115234_sorted_telbam.bam'
[E::idx_find_and_load] Could not retrieve index file for '/scratch/wtccc/psb7/telbams_output_p3/115234_sorted_telbam.bam'
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
[E::idx_find_and_load] Could not retrieve index file for '/scratch/wtccc/psb7/telbams_output_p3/115234_sorted_telbam.bam'
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
[E::idx_find_and_load] Could not retrieve index file for '/scratch/wtccc/psb7/telbams_output_p3/115234_sorted_telbam.bam'
[E::bgzf_read] Read block operation failed with error 2 after 0 of 4 bytes
Process Task-3:2:
Traceback (most recent call last):
  File "/home/p/psb7/miniconda3/envs/Telomere/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "parabam/core.pyx", line 291, in parabam.core.Task.run
  File "parabam/core.pyx", line 311, in parabam.core.Task.__generate_results__
  File "parabam/command/core.pyx", line 50, in parabam.command.core.Task.__process_task_set__
  File "pysam/libcalignmentfile.pyx", line 2187, in pysam.libcalignmentfile.IteratorRowAll.__next__
OSError: truncated file
=>> PBS: job killed: walltime 43206 exceeded limit 43200

keiranmraine commented 3 years ago

@duran72 I believe your problem is unrelated to the original intent of this issue. However, do you have sufficient available disk space? The error specifically states that the file being read has been truncated. It's not clear which file that is, but I suspect it is the intermediate file, which would point to the space in the --temp_dir area being exhausted.

This may not be the cause; it's difficult to determine when the process is killed. Please note that the original developer of this tool is no longer providing support. We are attempting to improve the code, and a small hotfix will be released very soon, but I don't think it will affect the issue above.
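A minimal sketch of the two checks suggested above, using only the Python standard library. It relies on the fact that a complete BAM file ends with a fixed 28-byte empty BGZF block (the EOF marker from the SAM/BAM specification); a file without it has most likely been cut short. The helper names `has_bgzf_eof` and `free_gib` are hypothetical, and the check says nothing about how telomerecat or parabam handle their intermediate files.

```python
import shutil

# 28-byte empty BGZF block that terminates a complete BAM file.
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff0600" "42430200"
    "1b000300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file ends with the BGZF end-of-file marker."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)  # jump to the end to learn the file size
        if handle.tell() < len(BGZF_EOF):
            return False
        handle.seek(-len(BGZF_EOF), 2)  # rewind by the marker length
        return handle.read() == BGZF_EOF


def free_gib(directory):
    """Return the free space, in GiB, on the filesystem holding directory."""
    return shutil.disk_usage(directory).free / 2**30
```

Running `has_bgzf_eof` on the telbam and `free_gib` on the directory passed to --temp_dir, before and during a run, would help confirm or rule out the truncation and disk-space hypotheses.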

duran72 commented 3 years ago

Hi Keiran,

Thanks for attempting to address my problem. Disc space shouldn't be a problem, as it is all being run on our University Scratch area, which has plenty of space. I managed to get results when I used python2, but when I re-ran the telbam2length script I got a different length result each time. Many of my samples also gave telomere lengths that were too short, which didn't compare favourably with what I found using telseq and computel, so I thought I'd use the newer python3 version to see what results that gave.

keiranmraine commented 3 years ago

As I understand it, to get the same answer each time you need to set --seed_randomness.

Additionally, specifying -t 75 is known to give more consistent results that are comparable to other tools such as telseq. This is knowledge I've picked up from our scientific staff this week.
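A minimal sketch of assembling a telbam2length command line with both suggestions applied. It assumes that --seed_randomness is a bare switch and that -t takes the value 75, exactly as written above; neither has been checked against the tool's help output. The function name `build_telbam2length_cmd` and the example file names are hypothetical.

```python
import shlex


def build_telbam2length_cmd(telbam, output_csv, temp_dir=None, trim=75, seed=True):
    """Build an argument list for a telbam2length run (nothing is executed)."""
    cmd = ["telomerecat", "telbam2length"]
    if seed:
        cmd.append("--seed_randomness")  # assumed to be a bare switch
    if trim is not None:
        cmd += ["-t", str(trim)]  # value taken from the comment above
    cmd.append(str(telbam))
    if temp_dir is not None:
        cmd += ["--temp_dir", str(temp_dir)]
    cmd += ["--output", str(output_csv)]
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command for inspection before running it by hand.
    print(shlex.join(build_telbam2length_cmd("sample_telbam.bam", "lengths.csv")))
```

Building the list first and printing it makes it easy to compare two runs and confirm that both used the same options.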

keiranmraine commented 3 years ago

The typo fixes and docs updates have been committed. The remaining part of this conversation has been moved to a new issue.