TypeError - Githubissues

gbdias commented 3 years ago

Hi, I'm trying to use TLDR with a Drosophila PacBio dataset aligned to the reference genome using NGMLR.
I got the following error and I'm not sure how to proceed:

(tldr) gd98309@d3-12 tldr$ python /home/gd98309/tldr/tldr/tldr -b /scratch/gd98309/S2/telr_307172/A4/intermediate_files/A4_sort.bam -e /home/gd98309/transposons/current/D_mel_transposon_sequence_set.fa -r /scratch/gd98309/dm6/dm6_chr.fa --color_consensus -p 20 -c chrs.txt
2020-11-18 00:04:44,242 te-ont started with command: /home/gd98309/tldr/tldr/tldr -b /scratch/gd98309/S2/telr_307172/A4/intermediate_files/A4_sort.bam -e /home/gd98309/transposons/current/D_mel_transposon_sequence_set.fa -r /scratch/gd98309/dm6/dm6_chr.fa --color_consensus -p 20 -c chrs.txt
2020-11-18 00:04:44,242 output basename: A4_sort
2020-11-18 00:14:22,545 writing clusters to A4_sort/chr2L.pickle
2020-11-18 00:14:26,812 loaded 51153 clusters from A4_sort/chr2L.pickle
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/gd98309/miniconda/envs/tldr/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "/home/gd98309/tldr/tldr/tldr", line 1129, in process_cluster
    cluster.trim_reads(int(args.flanksize))
  File "/home/gd98309/tldr/tldr/tldr", line 351, in trim_reads
    read.trim(flanksize=flanksize)
  File "/home/gd98309/tldr/tldr/tldr", line 214, in trim
    self.r_qual_trimmed = self.r_qual[trim_start:trim_end]
TypeError: 'NoneType' object is not subscriptable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/gd98309/tldr/tldr/tldr", line 1805, in <module>
    main(args)
  File "/home/gd98309/tldr/tldr/tldr", line 1645, in main
    processed_clusters.append(res.get())
  File "/home/gd98309/miniconda/envs/tldr/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
TypeError: 'NoneType' object is not subscriptable

adamewing commented 3 years ago

Thanks for being an early adopter. Unfortunately that means you get to help get the bugs out, as it hasn't been tested with NGMLR .bams at all. Do your reads have quality scores present in the .bam file? If it's a public dataset you can point me at, I'd be happy to try it out. Also, a word of caution about PacBio + TLDR: TLDR requires at least one read to contain the fully embedded TE sequence. TLDR + CCS/HiFi reads show reduced sensitivity for long TE insertions as a result. Performance with non-CCS/HiFi PacBio reads (i.e. subreads) is not great. If you haven't had a look at PALMER yet, that might be a better fit: https://github.com/mills-lab/PALMER

gbdias commented 3 years ago

Hi @adamewing,

Thanks for the tips. The dataset I'm using for testing is publicly available at SRR7874275.
This is a PacBio CLR dataset, not CCS. So, do you advise against using this type of data with TLDR?
I've downloaded the reads in fasta format so the BAM file does not have quality scores. Are quality scores necessary for TLDR?
I'll take a look at PALMER too, thanks for sharing this!
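
For anyone hitting the same error, here is a minimal sketch of one way to confirm whether alignment records carry quality scores. It relies only on the SAM text format, where QUAL is the 11th tab-separated field and is `*` when qualities are not stored. The function names `has_quality_scores` and `count_missing` are made up for this illustration and are not part of TLDR.

```python
def has_quality_scores(sam_line: str) -> bool:
    """Return True if a SAM alignment line stores per-base qualities.

    In the SAM format, QUAL is the 11th tab-separated field and is '*'
    when qualities are not stored.
    """
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("expected at least 11 tab-separated SAM fields")
    return fields[10] != "*"


def count_missing(lines):
    """Return (total, missing) over alignment records, skipping '@' header lines."""
    total = 0
    missing = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank line or SAM header line
        total += 1
        if not has_quality_scores(line):
            missing += 1
    return total, missing
```

One possible use is to pipe the first records of `samtools view A4_sort.bam` into a small script that calls `count_missing(sys.stdin)`. If every record is reported as missing, the reads were most likely aligned from FASTA rather than FASTQ.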

adamewing commented 3 years ago

OK, thanks. This hasn't been tested with CLR reads, so no advice either way yet. TLDR does expect quality scores at present; it might be possible to get around that if it ends up being a frequent use case.
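
The traceback above fails because a `None` quality array is sliced during trimming. A minimal sketch of the kind of guard that avoids this is shown below. It is an illustration only: `trim_read` and its parameters are hypothetical and this is not TLDR's actual code or the fix that was later committed.

```python
def trim_read(seq, qual, trim_start, trim_end):
    """Trim a read's sequence and, if present, its per-base qualities.

    `qual` is None when the alignment record stores no qualities
    (for example, reads aligned from FASTA), so slicing it directly
    would raise "TypeError: 'NoneType' object is not subscriptable".
    """
    seq_trimmed = seq[trim_start:trim_end]
    # Only slice the qualities when they exist; otherwise pass None through.
    qual_trimmed = None if qual is None else qual[trim_start:trim_end]
    return seq_trimmed, qual_trimmed
```

Any downstream step that consumes the trimmed qualities would also need to tolerate `None`, which this sketch does not address.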

adamewing commented 3 years ago

.bams w/o quality scores supported in 9067882

adamewing / tldr

TypeError #6