I wanted to compute the coverage of a paired bam file, and got an error about fragment start being greater than fragment end. After some investigation, I realized the problem came from the defaultFragmentLength being 0, causing unproperly aligned reads to have identical fragment start and end. The problem can be easily solved by supplying a value to extendReads.
However I believe it would be better if the program throws an error as soon as this problem is encountered, it would be easier to spot and it would save time, because in my case bamCoverage kept running for awhile despite showing the error and eventually crashed.
I would also add that the reason the defaultFragmentLength was 0, is because the estimated median was 0. I scanned all the pairs in my file and had only 7% of templates lengths of 0. However the get_read_and_fragment_length function from deeptools showed up to the qtile60 as 0. This could be due to the reference genome I used (YPH499) which is not assembled to the chromosome level and has many very small scaffolds
Command that produces the issue:
bamCoverage -p 16 -b NG-34039_167_4_2_lib713577_10294_1_101bp_max250.sorted.bam -o NG-34039_167_4_2_lib713577_10294_1_101bp_max250.bw --binSize 1 --extendReads
Output:
bamFilesList: ['NG-34039_167_4_2_lib713577_10294_1_101bp_max250.sorted.bam']
binLength: 1
numberOfSamples: None
blackListFileName: None
skipZeroOverZero: False
bed_and_bin: False
genomeChunkSize: None
defaultFragmentLength: 0
numberOfProcessors: 16
verbose: False
region: 167_7_4kbrf:1:9169:1
bedFile: None
minMappingQuality: None
ignoreDuplicates: False
chrsToSkip: []
stepSize: 1
center_read: False
samFlag_include: None
samFlag_exclude: None
minFragmentLength: 0
maxFragmentLength: 0
zerosToNans: False
smoothLength: None
save_data: False
out_file_for_raw_data: None
maxPairedFragmentLength: 0
Traceback (most recent call last):
File "/home/alex/anaconda3/envs/tf2.5/bin/bamCoverage", line 12, in
main(args)
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/bamCoverage.py", line 256, in main
wr.run(writeBedGraph.scaleCoverage, func_args, args.outFileName,
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/writeBedGraph.py", line 145, in run
res = mapReduce.mapReduce([func_to_call, func_args],
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/mapReduce.py", line 146, in mapReduce
res = list(map(func, TASKS))
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/writeBedGraph.py", line 27, in writeBedGraph_wrapper
return WriteBedGraph.writeBedGraph_worker(*args)
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/writeBedGraph.py", line 231, in writeBedGraphworker
coverage, = self.count_reads_in_region(chrom, start, end)
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/countReadsPerBin.py", line 501, in count_reads_in_region
tcov = self.get_coverage_of_region(bam, chrom, trans)
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/countReadsPerBin.py", line 674, in get_coverage_of_region
position_blocks = fragmentFromRead_func(read)
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/countReadsPerBin.py", line 902, in get_fragment_from_read
assert fragmentStart < fragmentEnd, "fragment start greater than fragment" \
AssertionError: fragment start greater than fragmentend for read A01619:208:H7K7CDSX5:1:2148:11731:2988
And also python code to compute proportion of template lengths of 0:
bam_file = 'NG-34039_167_4_2_lib713577_10294_1_101bp_max250.sorted.bam'
with pysam.AlignmentFile(bam_file, 'rb') as f:
\t tlens = []
\t for read in f.fetch():
\t\t if read.is_read1:
\t\t\t tlens.append(abs(read.template_length))
print(np.sum(np.array(tlens) <= 0) / len(tlens))
And python code to get fragment length from the deeptools function:
from deeptools.getFragmentAndReadSize import get_read_and_fragment_length
print(get_read_and_fragment_length(bam_file, numberOfProcessors=1))
python version 3.8 deepTools version 3.5.1
I wanted to compute the coverage of a paired bam file, and got an error about fragment start being greater than fragment end. After some investigation, I realized the problem came from the defaultFragmentLength being 0, causing unproperly aligned reads to have identical fragment start and end. The problem can be easily solved by supplying a value to extendReads. However I believe it would be better if the program throws an error as soon as this problem is encountered, it would be easier to spot and it would save time, because in my case bamCoverage kept running for awhile despite showing the error and eventually crashed. I would also add that the reason the defaultFragmentLength was 0, is because the estimated median was 0. I scanned all the pairs in my file and had only 7% of templates lengths of 0. However the get_read_and_fragment_length function from deeptools showed up to the qtile60 as 0. This could be due to the reference genome I used (YPH499) which is not assembled to the chromosome level and has many very small scaffolds
Command that produces the issue:
bamCoverage -p 16 -b NG-34039_167_4_2_lib713577_10294_1_101bp_max250.sorted.bam -o NG-34039_167_4_2_lib713577_10294_1_101bp_max250.bw --binSize 1 --extendReads
Output: bamFilesList: ['NG-34039_167_4_2_lib713577_10294_1_101bp_max250.sorted.bam'] binLength: 1 numberOfSamples: None blackListFileName: None skipZeroOverZero: False bed_and_bin: False genomeChunkSize: None defaultFragmentLength: 0 numberOfProcessors: 16 verbose: False region: 167_7_4kbrf:1:9169:1 bedFile: None minMappingQuality: None ignoreDuplicates: False chrsToSkip: [] stepSize: 1 center_read: False samFlag_include: None samFlag_exclude: None minFragmentLength: 0 maxFragmentLength: 0 zerosToNans: False smoothLength: None save_data: False out_file_for_raw_data: None maxPairedFragmentLength: 0 Traceback (most recent call last): File "/home/alex/anaconda3/envs/tf2.5/bin/bamCoverage", line 12, in
main(args)
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/bamCoverage.py", line 256, in main
wr.run(writeBedGraph.scaleCoverage, func_args, args.outFileName,
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/writeBedGraph.py", line 145, in run
res = mapReduce.mapReduce([func_to_call, func_args],
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/mapReduce.py", line 146, in mapReduce
res = list(map(func, TASKS))
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/writeBedGraph.py", line 27, in writeBedGraph_wrapper
return WriteBedGraph.writeBedGraph_worker(*args)
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/writeBedGraph.py", line 231, in writeBedGraphworker
coverage, = self.count_reads_in_region(chrom, start, end)
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/countReadsPerBin.py", line 501, in count_reads_in_region
tcov = self.get_coverage_of_region(bam, chrom, trans)
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/countReadsPerBin.py", line 674, in get_coverage_of_region
position_blocks = fragmentFromRead_func(read)
File "/home/alex/anaconda3/envs/tf2.5/lib/python3.8/site-packages/deeptools/countReadsPerBin.py", line 902, in get_fragment_from_read
assert fragmentStart < fragmentEnd, "fragment start greater than fragment" \
AssertionError: fragment start greater than fragmentend for read A01619:208:H7K7CDSX5:1:2148:11731:2988
And also python code to compute proportion of template lengths of 0: bam_file = 'NG-34039_167_4_2_lib713577_10294_1_101bp_max250.sorted.bam' with pysam.AlignmentFile(bam_file, 'rb') as f: \t tlens = [] \t for read in f.fetch(): \t\t if read.is_read1: \t\t\t tlens.append(abs(read.template_length)) print(np.sum(np.array(tlens) <= 0) / len(tlens))
And python code to get fragment length from the deeptools function: from deeptools.getFragmentAndReadSize import get_read_and_fragment_length print(get_read_and_fragment_length(bam_file, numberOfProcessors=1))