cancerit / telomerecat

Telomerecat: The telomere computational analysis tool
GNU General Public License v3.0

StopIteration - pysam_collate/bam2telbam etc #31

Open keiranmraine opened 3 years ago

keiranmraine commented 3 years ago

BAM/CRAM files in which read1 and read2 are not fully paired cause an error whose traceback ends with:

  File "/.../telomerecat/telbam.py", line 64, in pairs_to_telbam
    read_b = next(read_iter)
  File "pysam/libcalignmentfile.pyx", line 2189, in pysam.libcalignmentfile.IteratorRowAll.__next__
StopIteration

It is highly likely that your source data file is corrupt. Check for a mismatch between the r1 and r2 primary alignment counts; the two commands below should print the same number:

samtools view -c -@ 2 -F 2304 -f 64 input.bam
samtools view -c -@ 2 -F 2304 -f 128 input.bam
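
For anyone unsure what the two counts mean, here is a minimal Python sketch of the same flag logic. It assumes the SAM flag integers have already been extracted (for example from column 2 of `samtools view` output); `count_primary_mates` and the example flags are hypothetical and not part of telomerecat.

```python
# Flag bits as defined in the SAM specification.
READ1 = 0x40            # 64: first read of the pair
READ2 = 0x80            # 128: second read of the pair
SECONDARY = 0x100       # 256
SUPPLEMENTARY = 0x800   # 2048
NOT_PRIMARY = SECONDARY | SUPPLEMENTARY  # 2304, the -F value used above


def count_primary_mates(flags):
    """Return (read1_count, read2_count) over primary alignments only."""
    read1 = read2 = 0
    for flag in flags:
        if flag & NOT_PRIMARY:
            continue        # same effect as `-F 2304`
        if flag & READ1:
            read1 += 1      # same effect as `-f 64`
        if flag & READ2:
            read2 += 1      # same effect as `-f 128`
    return read1, read2


# Hypothetical flags: two complete pairs, one supplementary record,
# and one read1 whose mate is missing.
example_flags = [99, 147, 83, 163, 2147, 65]
r1, r2 = count_primary_mates(example_flags)
print(r1, r2, "balanced" if r1 == r2 else "mismatch")  # prints "3 2 mismatch"
```

Equal counts are necessary but not sufficient: the reads can still be out of step with each other even when the totals agree.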

I will try to catch this and raise a more informative error, but for now I am documenting it here.

keiranmraine commented 3 years ago

This may need extending with read-name verification between consecutive reads, so that processing stops when reads become unpaired as a result of how the fast collate works (with a switch to either raise an error or allow it).
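
A rough sketch of what such a check could look like, assuming a name-collated stream in which mates are adjacent. The `(name, flag)` tuples stand in for real alignment records, and `iter_pairs`, `UnpairedReadError` and the `strict` switch are hypothetical names, not telomerecat's actual code.

```python
class UnpairedReadError(Exception):
    """Raised when a collated stream stops yielding read1/read2 pairs."""


def iter_pairs(records, strict=True):
    """Yield (read_a, read_b) from a name-collated stream of (name, flag) tuples.

    strict=True raises on the first read without a mate ("error").
    strict=False skips that read and carries on ("allow").
    """
    read_iter = iter(records)
    pending = next(read_iter, None)
    index = 0  # zero-based position of `pending` in the stream
    while pending is not None:
        mate = next(read_iter, None)  # None instead of a bare StopIteration
        if mate is not None and mate[0] == pending[0]:
            yield pending, mate
            pending = next(read_iter, None)
            index += 2
            continue
        if strict:
            raise UnpairedReadError(
                f"read {pending[0]!r} at record {index} has no mate; "
                "the input may be truncated or not fully paired"
            )
        # Allow mode: drop the orphan and retry with the next record.
        pending = mate
        index += 1


# Hypothetical stream: q2 has lost its mate.
records = [("q1", 99), ("q1", 147), ("q2", 65), ("q3", 83), ("q3", 163)]
print([a[0] for a, _ in iter_pairs(records, strict=False)])  # prints "['q1', 'q3']"
```

With `strict=True` the same input raises `UnpairedReadError` naming `q2`, which is more useful than a `StopIteration` surfacing from deep inside the iterator.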

ndeimler99 commented 2 years ago

Hello, I am running into the same issue, but while using bam2length. When I run the samtools view commands to check for a mismatch in primary alignments, both return the same number. Have you made any further progress on this error (traceback below)? The command was simply:

/home/ndeimler/telomerecat/bin/telomerecat bam2length /mnt/mogon_scratch_nd/nd/TERC/MUTANT/raw_align.bam

[Error] telomerecat stopped unexpectedly, sorry!
Traceback (most recent call last):
  File "parabam/core.pyx", line 61, in parabam.core.CmdLineInterface.handle
  File "/home/ndeimler/telomerecat/lib/python3.7/site-packages/telomerecat/bam2length.py", line 49, in run_cmd
    seed_randomness=self.cmd_args.seed_randomness
  File "/home/ndeimler/telomerecat/lib/python3.7/site-packages/telomerecat/bam2length.py", line 78, in run
    out_files = telbam_interface.run(input_paths=input_paths, outbam_dir=outbam_dir)
  File "/home/ndeimler/telomerecat/lib/python3.7/site-packages/telomerecat/bam2telbam.py", line 101, in run
    telbam_paths = telbam.process_alignments(outbam_dir, self.total_procs, self.temp_dir, input_paths, reference=reference, verbose=self.verbose)
  File "/home/ndeimler/telomerecat/lib/python3.7/site-packages/telomerecat/telbam.py", line 125, in process_alignments
    telbam_path = to_telbam(xam_file, outbam_dir=outbam_dir, tmpdir=tmpdir, hts_processes=hts_processes, reference=reference)
  File "/home/ndeimler/telomerecat/lib/python3.7/site-packages/telomerecat/telbam.py", line 89, in to_telbam
    collate_proc = subprocess.Popen(collate_wrap)
  File "/home/ndeimler/miniconda3/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/home/ndeimler/miniconda3/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'pysam_collate': 'pysam_collate'
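
Note that this traceback ends in a FileNotFoundError raised by subprocess.Popen rather than in StopIteration: the `pysam_collate` executable could not be found when telomerecat tried to launch it. A minimal way to check for that before running, using only the standard library (`missing_executables` is a hypothetical helper, not something telomerecat provides):

```python
import shutil


def missing_executables(names, path=None):
    """Return the entries of `names` that cannot be found on the search path."""
    return [name for name in names if shutil.which(name, path=path) is None]


# `pysam_collate` is the executable named in the traceback.
missing = missing_executables(["pysam_collate"])
if missing:
    print("not found on PATH:", ", ".join(missing))
else:
    print("all required executables found")
```

If the name is reported as missing, the environment that holds telomerecat is probably not active or the install is incomplete.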

keiranmraine commented 2 years ago

Can you run the following? It checks whether the read names go out of sync:

cmp <(samtools view -@ 2 -F 2304 -f 64 input.bam | cut -f 1) <(samtools view -@ 2 -F 2304 -f 128 input.bam | cut -f 1)
echo $?

cmp exit code (echo $?) meanings:

  0. Read names are matched throughout file
  1. Read names go out of sync
  2. Other problem during processing

ndeimler99 commented 2 years ago

Due to memory issues and the size of my input bam I was unable to run it as one command. I ran each samtools view command independently and stored the results in a .txt file, then used cmp to compare the text files. It returns an exit status of 0.
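
For reference, the two-file comparison can also be done in Python, reading both name lists line by line so that neither is held in memory. This is only a sketch: `first_name_mismatch` and `compare_name_files` are hypothetical helpers, and the inputs are assumed to contain one read name per line, as produced by the `cut -f 1` step.

```python
from itertools import zip_longest


def first_name_mismatch(read1_names, read2_names):
    """Compare two iterables of read-name lines.

    Returns None when they agree throughout, otherwise a tuple
    (record_number, name1, name2) for the first difference. Record numbers
    are 1-based; a side that ran out of names is reported as None.
    """
    pairs = zip_longest(read1_names, read2_names)
    for number, (name1, name2) in enumerate(pairs, start=1):
        if name1 is not None:
            name1 = name1.rstrip("\n")
        if name2 is not None:
            name2 = name2.rstrip("\n")
        if name1 != name2:
            return number, name1, name2
    return None


def compare_name_files(path1, path2):
    """Same check on two text files; file objects are iterated lazily."""
    with open(path1) as handle1, open(path2) as handle2:
        return first_name_mismatch(handle1, handle2)


# Hypothetical name lists that diverge at the second record.
print(first_name_mismatch(["q1\n", "q2\n", "q4\n"], ["q1\n", "q3\n"]))
# prints "(2, 'q2', 'q3')"
```

Unlike the bare exit status from cmp, this reports which read names disagree, which can help locate the damaged region of the input.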

Edit: I am using raw reads here, not trimmed ones, so the BAM file is likely to contain pairs in which one read aligns and the other does not. Does bam2length account for this?

Edit: I uninstalled and reinstalled telomerecat and it seems to be working now. Sorry for the bother.