broadinstitute / picard

A set of command line tools (in Java) for manipulating high-throughput sequencing (HTS) data and formats such as SAM/BAM/CRAM and VCF.
https://broadinstitute.github.io/picard/
MIT License
967 stars 370 forks source link

gc content calculation off-by-one bug #1278

Open yangruialex opened 5 years ago

yangruialex commented 5 years ago

https://github.com/broadinstitute/picard/blob/b02e42e0ca82efae70ee572766cbe101b6a2305e/src/main/java/picard/analysis/GcBiasUtils.java#L97

When I was working with Picard.CollectGcBiasMetrics I noticed that when computing window counts for different gc content, the first and last window of the reference contig are not counted. Is this behavior expected?

yfarjoun commented 5 years ago

I think that you are right that there's a problem. but it isn't exactly what you are seeing. The reason 0 is skipped is that reference positions uses a 1-based coordinate system. But since the arrays in question do not, there seems to be a off-by-one bug here. Would appreciate your fix if you're up for it. but if not, someone will get around to it.

On Mon, Feb 11, 2019 at 8:51 PM Rui Yang notifications@github.com wrote:

https://github.com/broadinstitute/picard/blob/b02e42e0ca82efae70ee572766cbe101b6a2305e/src/main/java/picard/analysis/GcBiasUtils.java#L97

When I was working with Picard.CollectGcBiasMetrics I noticed that when computing window counts for different gc content, the first and last window of the reference contig are not counted. Is this behavior expected?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/broadinstitute/picard/issues/1278, or mute the thread https://github.com/notifications/unsubscribe-auth/ACnk0gDPy10t1mOmNfF4l0X9xHBP0ifeks5vMh4SgaJpZM4a1bjm .

yangruialex commented 5 years ago

Thanks for the clarification. Does that mean all the reads are assigned to the window 1 base downstream? Would this off-by-one error impact other processes that iterate through the ref?

I would love to fix it but unfortunately I don't know any java plus I am not familiar with picard code base. It'll be great if you can link the PR here when it's made. Thanks!