broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

[feature request and question] downsample to coverage #5075

Status: Open. Opened by hsiaoyi0504 6 years ago.

hsiaoyi0504 commented 6 years ago

Feature request and question

The original question was posted on the GATK forum (https://gatkforums.broadinstitute.org/gatk/discussion/12026/how-to-do-downsampling), but it seems to me that it should be posted here so the developer team can answer it.

Tool(s) or class(es) involved

PrintReads

Description

In GATK4, PrintReads no longer has an option to downsample to coverage. Is there a reason for that? Or is there a suggested way to do the same thing after migrating from GATK3 to GATK4? In the original discussion, the forum maintainer pointed me to Picard's DownsampleSam tool, but it downsamples to a fraction of the reads, so it can't be used to downsample to a target coverage directly.
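One possible workaround (an assumption, not something suggested in this thread) is to convert a target coverage into the retention probability that DownsampleSam takes through its PROBABILITY (P) argument, using a mean coverage measured beforehand, for example with Picard's CollectWgsMetrics. This only scales the average depth; unlike a per-locus cap, high-coverage regions stay proportionally high. A minimal sketch, where `downsample_probability` is a hypothetical helper:

```python
def downsample_probability(mean_coverage: float, target_coverage: float) -> float:
    """Fraction of reads to keep so the expected mean coverage drops to target_coverage.

    Hypothetical helper: assumes coverage scales linearly with the number of
    reads kept, which holds on average for uniform random downsampling.
    """
    if mean_coverage <= 0:
        raise ValueError("mean_coverage must be positive")
    if target_coverage <= 0:
        raise ValueError("target_coverage must be positive")
    # If the data is already at or below the target, keep every read.
    return min(1.0, target_coverage / mean_coverage)
```

For example, a sample with a mean coverage of 120x and a target of 30x gives a probability of 0.25.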

hsiaoyi0504 commented 6 years ago

I made a mistake and submitted the issue without filling in the details. I have edited it and reopened it.

hsiaoyi0504 commented 6 years ago

Is there any update on this?

davidbenjamin commented 5 years ago

@droazen The issue here is that ReadWalker doesn't have downsampling, right? It seems like it would be straightforward to downsample the streamed reads in its traverse method. Thoughts / am I missing something?
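To make the idea concrete, here is a language-neutral sketch in Python, not GATK code: `traverse`, `fraction_downsampler`, and the `Downsampler` type are hypothetical names. It shows a read-walker-style traversal that wraps the streamed reads in an optional downsampling step before each read reaches `apply`:

```python
import random
from typing import Any, Callable, Iterable, Iterator, Optional

# A downsampler maps a stream of reads to a (smaller) stream of reads.
Downsampler = Callable[[Iterator[Any]], Iterator[Any]]


def fraction_downsampler(fraction: float, seed: int = 0) -> Downsampler:
    """Hypothetical downsampler: keep each read independently with probability `fraction`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")

    def downsample(reads: Iterator[Any]) -> Iterator[Any]:
        rng = random.Random(seed)  # fresh seeded RNG, so each traversal keeps the same reads
        for read in reads:
            if rng.random() < fraction:
                yield read

    return downsample


def traverse(reads: Iterable[Any], apply: Callable[[Any], None],
             downsampler: Optional[Downsampler] = None) -> None:
    """Read-walker-style loop: stream reads, optionally downsample, call apply() on survivors."""
    stream: Iterator[Any] = iter(reads)
    if downsampler is not None:
        stream = downsampler(stream)  # wrap the stream; reads are still processed lazily
    for read in stream:
        apply(read)
```

Because the downsampler only wraps the iterator, the per-read logic in `apply` does not change, and a different downsampling strategy can be plugged into the same slot.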

droazen commented 5 years ago

@davidbenjamin Yes, we could easily add downsampling to ReadWalker. It would be simple to add alignment-start-based downsampling, like GATK3 ReadWalkers had (and the GATK4 HaplotypeCaller currently has), using a ReadsDownsamplingIterator plus a PositionalDownsampler.
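A standalone sketch of the alignment-start-based idea follows. It does not reproduce GATK's ReadsDownsamplingIterator or PositionalDownsampler. `Read`, `reservoir_sample`, and `downsample_by_start` are hypothetical, and the input is assumed to be coordinate-sorted. Reads sharing a (contig, start) are reservoir-sampled down to at most `max_per_start`:

```python
import itertools
import random
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    # Hypothetical minimal read record; real reads carry far more fields.
    name: str
    contig: str
    start: int  # alignment start


def reservoir_sample(items, k, rng):
    """Algorithm R: uniformly sample up to k items from an iterable in one pass."""
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def downsample_by_start(reads: Iterable[Read], max_per_start: int,
                        seed: int = 0) -> Iterator[Read]:
    """Keep at most max_per_start reads for each (contig, start).

    Assumes coordinate-sorted input, so reads with the same start are adjacent.
    """
    if max_per_start < 1:
        raise ValueError("max_per_start must be >= 1")
    rng = random.Random(seed)
    for _, group in itertools.groupby(reads, key=lambda r: (r.contig, r.start)):
        # Tag each read with its position in the group so input order can be restored.
        kept = reservoir_sample(enumerate(group), max_per_start, rng)
        for _, read in sorted(kept, key=lambda pair: pair[0]):
            yield read
```

Capping reads per start position, rather than per-base depth, bounds coverage only approximately: depth at a given base also includes reads that started at earlier positions.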

davidbenjamin commented 5 years ago

This one is for the engine team to decide on.