adamewing / tldr

Identify and annotate TE-mediated insertions in long-read sequence data
MIT License

TypeError #6

Closed gbdias closed 3 years ago

gbdias commented 3 years ago
(tldr) gd98309@d3-12 tldr$ python /home/gd98309/tldr/tldr/tldr -b /scratch/gd98309/S2/telr_307172/A4/intermediate_files/A4_sort.bam -e /home/gd98309/transposons/current/D_mel_transposon_sequence_set.fa -r /scratch/gd98309/dm6/dm6_chr.fa --color_consensus -p 20 -c chrs.txt
2020-11-18 00:04:44,242 te-ont started with command: /home/gd98309/tldr/tldr/tldr -b /scratch/gd98309/S2/telr_307172/A4/intermediate_files/A4_sort.bam -e /home/gd98309/transposons/current/D_mel_transposon_sequence_set.fa -r /scratch/gd98309/dm6/dm6_chr.fa --color_consensus -p 20 -c chrs.txt
2020-11-18 00:04:44,242 output basename: A4_sort
2020-11-18 00:14:22,545 writing clusters to A4_sort/chr2L.pickle
2020-11-18 00:14:26,812 loaded 51153 clusters from A4_sort/chr2L.pickle
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/gd98309/miniconda/envs/tldr/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/gd98309/tldr/tldr/tldr", line 1129, in process_cluster
    cluster.trim_reads(int(args.flanksize))
  File "/home/gd98309/tldr/tldr/tldr", line 351, in trim_reads
    read.trim(flanksize=flanksize)
  File "/home/gd98309/tldr/tldr/tldr", line 214, in trim
    self.r_qual_trimmed = self.r_qual[trim_start:trim_end]
TypeError: 'NoneType' object is not subscriptable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/gd98309/tldr/tldr/tldr", line 1805, in <module>
    main(args)
  File "/home/gd98309/tldr/tldr/tldr", line 1645, in main
    processed_clusters.append(res.get())
  File "/home/gd98309/miniconda/envs/tldr/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
TypeError: 'NoneType' object is not subscriptable
adamewing commented 3 years ago

Thanks for being an early adopter. Unfortunately that means you get to help get the bugs out, as it hasn't been tested with NGMLR .bams at all. Do your reads have quality scores present in the .bam file? If it's a public dataset you can point me at, I'd be happy to try it out. Also, a word of caution about PacBio + TLDR: TLDR requires at least one read to contain the fully embedded TE sequence, so TLDR + CCS/HiFi reads show reduced sensitivity for long TE insertions. Performance with non-CCS/HiFi PacBio reads (i.e. subreads) is not great. If you haven't had a look at PALMER as well, that might be a better fit: https://github.com/mills-lab/PALMER
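
For anyone hitting the same traceback, one way to answer the quality-score question is to count how many reads carry no base qualities. This is a minimal sketch, not part of TLDR: `fraction_missing_quals` is a hypothetical helper, and it assumes read objects that expose `query_qualities` the way pysam's `AlignedSegment` does (`None` when qualities are absent).

```python
from itertools import islice


def fraction_missing_quals(reads, limit=1000):
    """Return the fraction of the first `limit` reads that have no base qualities.

    `reads` is any iterable of objects with a `query_qualities` attribute
    (assumption: pysam-style reads, where the attribute is None if absent).
    """
    checked = 0
    missing = 0
    for read in islice(reads, limit):
        checked += 1
        if read.query_qualities is None:
            missing += 1
    # Avoid dividing by zero on an empty input.
    return missing / checked if checked else 0.0


# Possible usage, assuming pysam is installed and the path exists:
#   import pysam
#   with pysam.AlignmentFile("A4_sort.bam", "rb") as bam:
#       print(fraction_missing_quals(bam))
```

A result close to 1.0 would suggest the aligner or an upstream conversion step dropped the qualities, which matches the `'NoneType' object is not subscriptable` error on `self.r_qual`.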

gbdias commented 3 years ago

Hi @adamewing,

adamewing commented 3 years ago

OK, thanks. This hasn't been tested with CLR reads, so no advice either way yet. TLDR does expect quality scores at present; it might be possible to work around that if it ends up being a frequent use case.
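
The failing line in the traceback slices `self.r_qual` unconditionally, so a workaround would need to tolerate a missing quality string. The sketch below illustrates that general idea only; it is not TLDR's code. `trim_read` and the `placeholder` parameter are hypothetical, and filling with `"!"` (Phred 0 in Phred+33 encoding) is an assumption rather than a recommendation.

```python
def trim_read(seq, qual, trim_start, trim_end, placeholder="!"):
    """Trim a read's sequence and qualities to [trim_start, trim_end).

    If `qual` is None (no quality scores in the .bam), fill the trimmed
    qualities with a placeholder character instead of slicing None.
    """
    seq_trimmed = seq[trim_start:trim_end]
    if qual is None:
        # Hypothetical fallback: one placeholder per retained base.
        qual_trimmed = placeholder * len(seq_trimmed)
    else:
        qual_trimmed = qual[trim_start:trim_end]
    return seq_trimmed, qual_trimmed
```

Keeping the quality string the same length as the trimmed sequence means downstream code that expects paired sequence and quality values can stay unchanged.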

adamewing commented 3 years ago

.bams w/o quality scores supported in 9067882