malonematt opened this issue 2 years ago
From what I can tell, this might be happening because on line 1521 `qual` is set to `read.qual`:

```python
for read in bam.fetch(cluster.chrom(), out_start, out_end):
    if not read.is_secondary and not read.is_supplementary:
        seq = read.seq
        qual = read.qual

        if read.is_reverse:
            seq = rc(seq)
            qual = qual[::-1]
```
The only other mention of `read.qual` is on line 1650, where `ins_read` is defined:

```python
ins_read = InsRead(bam.filename.decode(), read.reference_name, q_start, q_end, r_start, r_end, read.qname, read.seq, read.qual, read.mapq, is_ins, is_clip, clip_end, phase)
```
I'm still very new to Python; is this issue caused because `read.qual` hasn't been defined yet?
I checked my BAM with `samtools view file.bam`, and the reads have quality scores (or at least some do; I haven't checked every aligned read yet). Could it be that some of the reads don't have quality scores?
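One way to answer that question without scrolling through `samtools view` output by eye is to scan the SAM text for records whose SEQ (column 10) or QUAL (column 11) is `*`, which is how the SAM format marks an absent field. The sketch below is a hypothetical helper, not part of tldr; by default it skips secondary (flag `0x100`) and supplementary (flag `0x800`) records, mirroring the tldr loop quoted earlier in this thread, which only looks at primary alignments.

```python
from typing import Iterable, List


def reads_missing_seq_or_qual(sam_lines: Iterable[str], primary_only: bool = True) -> List[str]:
    """Return QNAMEs of SAM records whose SEQ or QUAL field is '*' (absent).

    Expects SAM text lines, e.g. the output of `samtools view file.bam`.
    Header lines (starting with '@') and blank lines are ignored.
    """
    missing = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        # SAM columns (1-based): 1=QNAME, 2=FLAG, 10=SEQ, 11=QUAL
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        # 0x100 = secondary, 0x800 = supplementary: skip them when only
        # primary alignments are of interest.
        if primary_only and flag & 0x900:
            continue
        if seq == "*" or qual == "*":
            missing.append(qname)
    return missing
```

To use it, a small script (name hypothetical, e.g. `check_qual.py`) could call `reads_missing_seq_or_qual(sys.stdin)` and be fed with `samtools view file.bam | python check_qual.py`.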
Hi, sorry for the delay. It's likely that one or more alignment records are missing quality scores (and sequences); I've seen this come up in other software with minimap2 BAMs.
I've pushed a fix that will skip the offending alignments at that point and complain about it a bit so you can track down the read if you like: ae3cdb8
It's possible you'll hit this elsewhere in the code though so let me know if it comes up again.
Regarding your question about `read.qual`: that's set by pysam when the read is parsed into its `AlignedSegment` class (if I have the name right).
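The kind of guard described above can be sketched as follows. This is an illustration, not the actual change in ae3cdb8: `FakeRead` is a hypothetical stand-in for `pysam.AlignedSegment` carrying only the attributes the quoted loop uses, and `rc` is a plain reverse-complement helper assumed to behave like tldr's. The point is to test for `None` before slicing, since `None[::-1]` is exactly what raises `TypeError: 'NoneType' object is not subscriptable`.

```python
import logging
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple

logger = logging.getLogger(__name__)

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def rc(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class FakeRead:
    """Hypothetical stand-in for pysam.AlignedSegment (only the fields used here)."""
    qname: str
    seq: Optional[str]
    qual: Optional[str]
    is_reverse: bool = False
    is_secondary: bool = False
    is_supplementary: bool = False


def oriented_seq_qual(reads: Iterable[FakeRead]) -> Iterator[Tuple[str, str, str]]:
    """Yield (qname, seq, qual) for primary reads, skipping records with no SEQ/QUAL."""
    for read in reads:
        if read.is_secondary or read.is_supplementary:
            continue
        seq, qual = read.seq, read.qual
        if seq is None or qual is None:
            # Without this check, qual[::-1] below raises
            # TypeError: 'NoneType' object is not subscriptable
            logger.warning("skipped a read without seq/qual: %s", read.qname)
            continue
        if read.is_reverse:
            seq = rc(seq)
            qual = qual[::-1]
        yield read.qname, seq, qual
```

Logging the read name rather than raising lets the run continue while still leaving a trail for tracking down the offending record.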
I just ran it after double-checking that the code was updated with your fix, and it still threw the same error:
```
2022-05-25 18:14:48,448 loaded 504 clusters from results/OF1.tldr/CM025008.1.pickle
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/spack/apps2/linux-centos7-x86_64/gcc-11.2.0/python-3.9.6-5amy32qig2nbj7ti7ehht3y2vbmdc2j7/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home1/malonema/.local/bin/tldr", line 1525, in process_cluster
    qual = qual[::-1]
TypeError: 'NoneType' object is not subscriptable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home1/malonema/.local/bin/tldr", line 2128, in
```
So I guess it wasn't that problem?
Although, now that I'm going through it, that `qual = qual[::-1]` line isn't on line 1525 in the updated code; it's on 1529.
So maybe it's just still running the old tldr.
Yup, the copy in my actual conda directory didn't update, which I find confusing since I did a fresh install. I'll just update it by hand and let you know if this fixes the problem.
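The tracebacks in this thread show the script running from `/home1/malonema/.local/bin/tldr`, so it can help to list every `tldr` on `PATH` in lookup order; the first hit is the one the shell runs, while `shutil.which()` reports only that first one. The helper below is a hypothetical sketch that assumes POSIX `PATH` semantics (no Windows `PATHEXT` handling).

```python
import os
from typing import List, Optional


def find_all_on_path(name: str, path: Optional[str] = None) -> List[str]:
    """Return every executable file called `name` on PATH, in lookup order.

    The first entry is the one the shell would run. An empty PATH entry
    means the current directory, as in POSIX shells.
    """
    path = os.environ.get("PATH", "") if path is None else path
    hits: List[str] = []
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory or ".", name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK) and candidate not in hits:
            hits.append(candidate)
    return hits
```

For example, `print(find_all_on_path("tldr"))` from the same environment that runs the pipeline shows whether an older copy shadows the freshly installed one.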
So that fix resolved that error, but now I get a different one:
```
2022-05-26 02:48:27,604 skipped a read without seq/qual: ed5106b4-53fa-4f23-85ed-f1720a965b0d
2022-05-26 02:48:27,604 skipped a read without seq/qual: dbfb5709-60e4-4cbd-abfc-0971901cbcaf
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/spack/apps2/linux-centos7-x86_64/gcc-11.2.0/python-3.9.6-5amy32qig2nbj7ti7ehht3y2vbmdc2j7/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home1/malonema/.local/bin/tldr", line 1536, in process_cluster
    cluster.spanning_non_supporting_reads(int(args.wiggle), int(args.min_te_len))
  File "/home1/malonema/.local/bin/tldr", line 399, in spanning_non_supporting_reads
    for r in bam.fetch(self.chrom(), te_ins_start, te_ins_end):
  File "pysam/libcalignmentfile.pyx", line 1091, in pysam.libcalignmentfile.AlignmentFile.fetch
  File "pysam/libchtslib.pyx", line 690, in pysam.libchtslib.HTSFile.parse_region
ValueError: start out of range (-271)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home1/malonema/.local/bin/tldr", line 2132, in
```
It looks like these are unrelated, so let me know if you'd like me to open a separate issue for it.
I found a somewhat similar issue here: https://github.com/ComputationalSystemsBiology/ExoProfiler/issues/6
In that one, a region near the start of a chromosome is extended beyond the start, resulting in a negative coordinate, which pysam rejects.
Do you think `--extend_consensus` could be causing the same problem?
It looks like you ran into this problem in https://github.com/adamewing/tldr/issues/8. I'll look at your fix there to see if it helps me figure things out.
I did try another run without `--extend_consensus` (but still with `--detail_output`), and it threw the same error.
I think what's happening is in this section of code:

```python
for bampath in set([read.bampath for read in self.reads if read.useable]):
    bamname = '.'.join(os.path.basename(bampath).split('.')[:-1])
    bam = pysam.AlignmentFile(bampath)

    te_ins_start = int(self.breakpoints[0])
    te_ins_end = int(self.breakpoints[1])

    for r in bam.fetch(self.chrom(), te_ins_start, te_ins_end):
        if r.is_secondary or r.is_supplementary:
            continue
```

I think `te_ins_start` is getting assigned that negative number. Just a guess.
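If that guess is right, one defensive option is to clamp the coordinates before calling `fetch()`, since pysam rejects a negative start with `ValueError: start out of range`. This is a hypothetical helper, not the fix from issue #8, and it does not explain where the negative breakpoint comes from; it only keeps the query inside the contig.

```python
from typing import Tuple


def clamp_region(start: int, end: int, chrom_len: int) -> Tuple[int, int]:
    """Clamp a 0-based, half-open [start, end) interval to [0, chrom_len].

    Negative coordinates (e.g. a breakpoint near the chromosome start after
    subtracting some padding) are pulled back to 0, and coordinates past
    the end of the contig are pulled back to chrom_len.
    """
    start = max(0, min(start, chrom_len))
    end = max(0, min(end, chrom_len))
    return start, end
```

In the loop above, that could look like `bam.fetch(self.chrom(), *clamp_region(te_ins_start, te_ins_end, bam.get_reference_length(self.chrom())))`, using pysam's `AlignmentFile.get_reference_length()` for the contig length. Whether silently clamping is preferable to skipping such clusters is a judgment call for tldr's logic.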
Hi Adam,
Thanks for all of the help you've given me using your software.
I ran into the following error:
```
2022-05-24 14:49:35,184 tldr started with command: /home1/malonema/.local/bin/tldr -b bams/OF1_sorted_mappings.bam -r resources/Masked_Genome_061021.fa -e none -p 20 -o results/OF1.tldr --detail_output --extend_consensus 2000
2022-05-24 14:49:35,184 output basename: results/OF1.tldr
2022-05-24 14:49:35,636 "None" passed to -e/--elts, running without TE reference
2022-05-24 14:49:36,409 writing clusters to results/OF1.tldr/JAAVVJ010000099.1.pickle
2022-05-24 14:49:37,158 writing clusters to results/OF1.tldr/JAAVVJ010009971.1.pickle
2022-05-24 14:49:39,252 writing clusters to results/OF1.tldr/CM025019.1.pickle
...
2022-05-24 14:52:07,881 writing clusters to results/OF1.tldr/JAAVVJ010009963.1.pickle
2022-05-24 14:52:08,399 loaded 504 clusters from results/OF1.tldr/CM025008.1.pickle
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/spack/apps2/linux-centos7-x86_64/gcc-11.2.0/python-3.9.6-5amy32qig2nbj7ti7ehht3y2vbmdc2j7/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home1/malonema/.local/bin/tldr", line 1525, in process_cluster
    qual = qual[::-1]
TypeError: 'NoneType' object is not subscriptable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home1/malonema/.local/bin/tldr", line 2128, in
    main(args)
  File "/home1/malonema/.local/bin/tldr", line 1907, in main
    processed_clusters.append(res.get())
  File "/spack/apps2/linux-centos7-x86_64/gcc-11.2.0/python-3.9.6-5amy32qig2nbj7ti7ehht3y2vbmdc2j7/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
TypeError: 'NoneType' object is not subscriptable
```
Any idea what might be causing this?
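The traceback comes in two parts because the error is raised inside a pool worker and then re-raised in the parent process when `main` calls `res.get()`. The sketch below reproduces that propagation with `multiprocessing.pool.ThreadPool` to stay self-contained; `reverse_qual` and `reverse_all` are hypothetical names, not tldr functions. A process-based `Pool` re-raises the same way at `.get()`, and additionally attaches the worker's traceback as the `RemoteTraceback` shown above.

```python
from multiprocessing.pool import ThreadPool
from typing import List, Optional


def reverse_qual(qual: Optional[str]) -> str:
    # Mimics the failing line: slicing None raises TypeError inside the worker.
    return qual[::-1]


def reverse_all(quals: List[Optional[str]]) -> List[str]:
    """Run reverse_qual in a pool and report per-task failures instead of crashing."""
    results = []
    with ThreadPool(2) as pool:
        pending = [pool.apply_async(reverse_qual, (q,)) for q in quals]
        for res in pending:
            try:
                # The worker's exception is re-raised here, in the caller.
                results.append(res.get())
            except TypeError as exc:
                results.append(f"worker failed: {exc}")
    return results
```

So the line that actually fails is the one in the first (worker) traceback; the `res.get()` frame only marks where the parent noticed it.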