deeptools / deepTools

Tools to process and analyze deep sequencing data.
Other
685 stars 213 forks source link

bamCoverage --extendReads, defaultFragmentLength can be 0 causing program to fail #1249

Open Alexwestbrook opened 1 year ago

Alexwestbrook commented 1 year ago

python version 3.8 deepTools version 3.5.1

I wanted to compute the coverage of a paired bam file, and got an error about fragment start being greater than fragment end. After some investigation, I realized the problem came from the defaultFragmentLength being 0, causing unproperly aligned reads to have identical fragment start and end. The problem can be easily solved by supplying a value to extendReads. However I believe it would be better if the program throws an error as soon as this problem is encountered, it would be easier to spot and it would save time, because in my case bamCoverage kept running for awhile despite showing the error and eventually crashed. I would also add that the reason the defaultFragmentLength was 0, is because the estimated median was 0. I scanned all the pairs in my file and had only 7% of templates lengths of 0. However the get_read_and_fragment_length function from deeptools showed up to the qtile60 as 0. This could be due to the reference genome I used (YPH499) which is not assembled to the chromosome level and has many very small scaffolds

Command that produces the issue: bamCoverage -p 16 -b NG-34039_167_4_2_lib713577_10294_1_101bp_max250.sorted.bam -o NG-34039_167_4_2_lib713577_10294_1_101bp_max250.bw --binSize 1 --extendReads

Output: bamFilesList: ['NG-34039_167_4_2_lib713577_10294_1_101bp_max250.sorted.bam'] binLength: 1 numberOfSamples: None blackListFileName: None skipZeroOverZero: False bed_and_bin: False genomeChunkSize: None defaultFragmentLength: 0 numberOfProcessors: 16 verbose: False region: 167_7_4kbrf:1:9169:1 bedFile: None minMappingQuality: None ignoreDuplicates: False chrsToSkip: [] stepSize: 1 center_read: False samFlag_include: None samFlag_exclude: None minFragmentLength: 0 maxFragmentLength: 0 zerosToNans: False smoothLength: None save_data: False out_file_for_raw_data: None maxPairedFragmentLength: 0 Traceback (most recent call last): File "/home/alex/anaconda3/envs/tf2.5/bin/bamCoverage", line 12, in main(args) File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/bamCoverage.py", line 256, in main wr.run(writeBedGraph.scaleCoverage, func_args, args.outFileName, File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/writeBedGraph.py", line 145, in run res = mapReduce.mapReduce([func_to_call, func_args], File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/mapReduce.py", line 146, in mapReduce res = list(map(func, TASKS)) File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/writeBedGraph.py", line 27, in writeBedGraph_wrapper return WriteBedGraph.writeBedGraph_worker(*args) File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/writeBedGraph.py", line 231, in writeBedGraphworker coverage, = self.count_reads_in_region(chrom, start, end) File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/countReadsPerBin.py", line 501, in count_reads_in_region tcov = self.get_coverage_of_region(bam, chrom, trans) File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/countReadsPerBin.py", line 674, in get_coverage_of_region position_blocks = fragmentFromRead_func(read) File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/countReadsPerBin.py", line 902, in get_fragment_from_read assert fragmentStart < fragmentEnd, "fragment start greater than fragment" \ AssertionError: fragment start greater than fragmentend for read A01619:208:H7K7CDSX5:1:2148:11731:2988

And also python code to compute proportion of template lengths of 0: bam_file = 'NG-34039_167_4_2_lib713577_10294_1_101bp_max250.sorted.bam' with pysam.AlignmentFile(bam_file, 'rb') as f: \t tlens = [] \t for read in f.fetch(): \t\t if read.is_read1: \t\t\t tlens.append(abs(read.template_length)) print(np.sum(np.array(tlens) <= 0) / len(tlens))

And python code to get fragment length from the deeptools function: from deeptools.getFragmentAndReadSize import get_read_and_fragment_length print(get_read_and_fragment_length(bam_file, numberOfProcessors=1))