Closed chtsai0105 closed 2 years ago
Hello @chtsai0105 thank you for reaching out! You are quite right, this is not the intended behavior.
It appears you have found a bug! π π π
I have removed the df.depth != 0
line. The lengths should now match.
This was recently merged into main. I have also created another release 2.0.1 which has this fix π
Great catch! π π
User checklist
Description
I was using the script
autometa-bedtools-genomecov
to calculate the coverage of contigs. But I found that the length of the contig in the output does not consistent with the one I calculated from the fasta. For example, I have a contig which is 516081 bp in length (calculated from fasta directly):But the output of
autometa-bedtools-genomecov
shows that it's only 511337 bp in length:I then checked the code and found that it seems like those uncovered regions did not count in
bases
in the coverage output: https://github.com/KwanLab/Autometa/blob/13591f069caf1d6a3c1f6dd91bfbadb3b8d356d0/autometa/common/external/bedtools.py#L105-L108 And from the bed I found there is 4744 bp covered 0 times516081-4744=511337. Exactly the number showed in the
autometa-bedtools-genomecov
output.I just wonder is that is intended to ignore those uncovered regions? Since I'm also try on the Metabat2 pipeline and the script they use to calculate the contig coverage (jgi_summarize_bam_contig_depths) did include those uncovered regions.
And I also want to know why to ignore those regions and what the difference between both calculation methods.
Expected Behavior
The contig length should be consistent between the output from
autometa-bedtools-genomecov
and fasta.