broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Will your workflow be ready for the Jan 9 release? #3769

Closed sooheelee closed 6 years ago

sooheelee commented 6 years ago

Hi folks. @chandrans and I have laid out some plans for updating the GATK4 docs for the January 9 release. Our approach is to prioritize documentation around stable Best Practice Workflows. Currently on the docket is the single stable workflow: germline SNP and indel calling from DNA data. We will of course also update tool docs (excluding Spark and BWA tools) and supporting tutorials. Even for tools we are unfamiliar with, we aim to have at least a basic description and an example command.

Thanks for the documentation you have already done and the help you give us in updating these.

If you are certain your workflow will be ready for the release, then please let us know immediately so we can plan accordingly. If your workflow will be ready later, then can you still give us an estimate for your release so we can plan ahead? Thanks.

It would be most helpful to users if we also have validation of our workflows as applied to real data. Are there plans to make benchmarking stats available for each of your workflows?

Sheila and I have 30 man-days between us to put towards updating documentation by December 14. Besides Geraldine, we will rely on some of you to review further refinements to the documentation between now and December 14. Thanks again.

davidbenjamin commented 6 years ago

@sooheelee Mutect has been ready for a while. We will make more big changes in the next few months but we will always maintain it in a production-ready state.

jonn-smith commented 6 years ago

@sooheelee A version of Funcotator will be ready for the Jan 9 release. I'm not yet 100% certain which functionalities will be included in it. Also, it will likely need to be marked as a beta tool.

sooheelee commented 6 years ago

@davidbenjamin For those workflows deemed ready, the forum should feature a (How to) tutorial to enable deeper understanding and an example use case. Do you feel the current hands-on Mutect2 tutorial I wrote for the Helsinki workshop is adequate as a BETA (How to) tutorial (to match the BETA release of the workflow)? The plan would then be to transcribe this tutorial (with some tweaking) into a forum document before December 14. Otherwise, we must plan for additional tutorial development.

sooheelee commented 6 years ago

Thanks @jonn-smith. Is Funcotator ready for us to document now or in the next month? That is, can users run it now? Otherwise, we can release an alpha tutorial initially to get folks to use it.

sooheelee commented 6 years ago

@davidbenjamin and @mbabadi Just wanted to remind you here that any Mutect2/FilterMutectCalls and gCNV resource files we provide (those you have given me at one point) need to have READMEs to accompany them to trace provenance, similar to what I've outlined in #3768 for GRCh38 resource files.

What this does is enable folks NOT using human data to make their own resources.

jonn-smith commented 6 years ago

@sooheelee It isn't ready yet - I'm still debugging some of the core functionality.

davidbenjamin commented 6 years ago

@sooheelee Your Helsinki M2 tutorial is definitely enough.

And I can have resource READMEs ready by January 9.

sooheelee commented 6 years ago

@jonn-smith In your absence, you've been volunteered to draft the alpha (How to) tutorial for Funcotator. I opened a new issue ticket for you regarding this work:

https://github.com/broadinstitute/gatk/issues/3774

samuelklee commented 6 years ago

@sooheelee I think we should be able to hit Jan 9 for what I've been calling the "ModelSegments" pipeline, in terms of getting the new code merged into master. It will be ready to go for WGS.

However, it's hard to say whether or not we'll have completed internal evaluations of this pipeline by then. These will be necessary to identify good default values for parameters that will affect sensitivity. @LeeTL1220 and @katevoss are helping out here. @MartonKN is also beginning work on an improved caller, which could potentially replace the current one before release.

As for gCNV, @asmirnov239 and I will be helping @mbabadi get the python version wrapped in Java. We should be able to get at least cohort-calling mode in by release. Case calling can come shortly after if we don't manage to get it in as well. Here, we are relying a bit more on external groups to run evaluations and provide feedback, but we will do what internal evaluations we can before release.

jonn-smith commented 6 years ago

@sooheelee That's reasonable. I saw the ticket. Sounds good.

cwhelan commented 6 years ago

@sooheelee We plan to have internal pilots running by then but haven't yet discussed releasing to the public as a beta or full release. We will have some unsupported WDL pipelines and light documentation available for external users, and we can provide benchmarking stats courtesy of our collaborators. It's probably safer to tag the tool as 'alpha' or 'beta' for the January release, though. I expect we will be ready for a full release sometime in the first half of next year.

sooheelee commented 6 years ago

Thanks for the update @cwhelan and thanks for prepping the initial documentation. If you are considering any (How to) tutorials, you might find useful the factors I've outlined in https://github.com/broadinstitute/gatk/issues/3774 (for John).

@mwalker174, I'm sorry to have overlooked Path-Seq. Geraldine mentioned she is working on some docs for you. Will your workflow be ready by January 9?

ldgauthier commented 6 years ago

@sooheelee Where would I find the proposed documentation for the germline SNP & INDEL joint calling best practices? I have a sneaking suspicion that it's not up to date with what we actually do, esp. hard filtering on ExcessHet. And if it is up to date, then the InbreedingCoeff docs should be updated.

sooheelee commented 6 years ago

There are none as of yet @ldgauthier. What we do have are WDL-ized pipelines that we will follow at https://github.com/gatk-workflows.

sooheelee commented 6 years ago

I just spot-checked the gatk-workflows scripts for "excesshet" and came up empty, so the scripts I'm thinking of are elsewhere and I have asked for their location. In the meantime, can I leave updating the InbreedingCoeff docs to you @ldgauthier?

AMENDED: I found excesshet in gatk-workflows/broad-prod-wgs-germline-snps-indels.
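
For anyone who wants to repeat this kind of spot-check on a local clone, here is a minimal sketch of a case-insensitive search. It is not the method used above. The function name `find_term` and the default file extensions (`.wdl`, `.json`) are assumptions for illustration only.

```python
from pathlib import Path


def find_term(root, term, suffixes=(".wdl", ".json")):
    """Return (path, line_number) pairs where `term` occurs, ignoring case.

    `suffixes` is an assumed default: it limits the search to workflow
    scripts and their input files.
    """
    needle = term.lower()
    hits = []
    # Walk the tree in sorted order so results are reproducible.
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line.lower():
                hits.append((str(path), number))
    return hits
```

Calling `find_term("path/to/clone", "excesshet")` returns an empty list when nothing matches, which is the "came up empty" case. The path here is a placeholder for wherever the repository was cloned.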

sooheelee commented 6 years ago

GATK 4.0 was released on Jan 9. We can close this ticket.