broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Engine should create output writers/streams on behalf of tools that require them #121

Open droazen opened 9 years ago

akiezun commented 9 years ago

yes, please

akiezun commented 9 years ago

to clarify: the requirement is to implement a new annotation for outputs, mark output fields with that annotation, then add code to the engine (probably in GATKTool) that opens the writers and closes them on exit (regardless of failure or success).
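
A minimal sketch of that requirement, under stated assumptions: `ManagedOutput`, `ManagedOutputTool`, `TrackingWriter`, and `GreetingTool` are hypothetical names made up for illustration and are not GATK classes, and an in-memory writer stands in for a real file writer. The base class plays the role the comment assigns to the engine: it finds annotated fields by reflection, opens a writer for each, and closes them in a `finally` block so cleanup happens whether the tool succeeds or fails.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker for fields the engine should open and close.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ManagedOutput {}

// In-memory writer that records whether close() was called (illustration only).
class TrackingWriter extends StringWriter {
    boolean closed = false;

    @Override
    public void close() throws IOException {
        closed = true;
        super.close();
    }
}

// Stand-in for an engine base class such as GATKTool (an assumption, not the real class).
abstract class ManagedOutputTool {
    private final List<Writer> opened = new ArrayList<>();

    // Tool-specific work; may throw.
    protected abstract void traverse() throws IOException;

    // Engine-side lifecycle: open outputs, run the tool, always close.
    public final void runTool() throws IOException, IllegalAccessException {
        try {
            openManagedOutputs();
            traverse();
        } finally {
            closeManagedOutputs();
        }
    }

    private void openManagedOutputs() throws IllegalAccessException {
        // Only fields declared on the concrete class are scanned in this sketch.
        for (Field field : getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(ManagedOutput.class) && field.getType() == Writer.class) {
                Writer writer = new TrackingWriter();
                opened.add(writer);
                field.setAccessible(true);
                field.set(this, writer);
            }
        }
    }

    private void closeManagedOutputs() {
        for (Writer writer : opened) {
            try {
                writer.close();
            } catch (IOException e) {
                // Keep closing the remaining writers; a real engine would likely report this.
            }
        }
    }
}

// Example tool: it declares an output but never opens or closes it itself.
class GreetingTool extends ManagedOutputTool {
    @ManagedOutput Writer out; // populated by the engine, not by the tool
    private final boolean fail;

    GreetingTool(boolean fail) {
        this.fail = fail;
    }

    @Override
    protected void traverse() throws IOException {
        out.write("hello");
        if (fail) {
            throw new IOException("simulated tool failure");
        }
    }
}
```

Calling `new GreetingTool(true).runTool()` propagates the simulated failure, but the writer has still been closed by the time the exception reaches the caller. Whether an annotation is the right mechanism at all is exactly what the rest of this thread debates; the sketch only shows that the lifecycle can live in one place.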

lbergelson commented 9 years ago

I'm not sure how much of this we want to tackle before dataflowing. We are probably going to want to provide a standardized output format that will map nicely to Dataflow. Currently this means a single output PCollection, although some tools obviously will need multiple output streams.

droazen commented 9 years ago

Yes, we shouldn't assume this will necessarily be done any particular way (e.g., through annotations) unless we have more working examples of dataflow tools from which to generalize.

akiezun commented 9 years ago

we need this independently of dataflow (it needs to work on file walkers as well as dataflow), i.e. it has nothing to do with dataflow

akiezun commented 8 years ago

alpha candidate?

droazen commented 8 years ago

yes!

droazen commented 8 years ago

This applies to Spark too.

droazen commented 8 years ago

In https://github.com/broadinstitute/gatk/pull/1146 we've at least centralized SAM/BAM writer creation methods in the superclasses -- this is good enough for alpha. In beta we can add the fancy auto-management of outputs we are dreaming about.
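
For readers unfamiliar with the pattern, here is a rough sketch of what centralizing writer creation in a superclass can look like. It is not the code from the pull request: `WriterFactoryTool`, `createOutputWriter`, `createOutputIndex`, `indexPathFor`, and `RecordCopyTool` are hypothetical names, plain text files stand in for SAM/BAM output, and an empty sidecar file stands in for real index creation.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical superclass: the only place where output writers are created.
abstract class WriterFactoryTool {
    // Assumed engine-level setting shared by every tool.
    protected boolean createOutputIndex = true;

    // Tools call this instead of constructing writers themselves,
    // so shared policy (encoding, indexing) is applied uniformly.
    protected final BufferedWriter createOutputWriter(Path output) throws IOException {
        if (createOutputIndex) {
            // Placeholder for real index creation: an empty sidecar file.
            Files.write(indexPathFor(output), new byte[0]);
        }
        return Files.newBufferedWriter(output, StandardCharsets.UTF_8);
    }

    // Sidecar path next to the output, e.g. out.txt -> out.txt.idx (made-up convention).
    static Path indexPathFor(Path output) {
        return output.resolveSibling(output.getFileName() + ".idx");
    }
}

// Example tool: obtains its writer from the superclass factory method.
class RecordCopyTool extends WriterFactoryTool {
    void run(Path output, Iterable<String> records) throws IOException {
        // The tool still manages the writer's lifetime here; automatic
        // management of outputs would move that into the engine too.
        try (BufferedWriter writer = createOutputWriter(output)) {
            for (String record : records) {
                writer.write(record);
                writer.newLine();
            }
        }
    }
}
```

The design point is that a change to output policy touches one method rather than every tool. The tool still owns the `try`-with-resources block, which is the part the automatic output management proposed in this issue would take over.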

cmnbroad commented 8 years ago

There is a branch/commit here with changes to centralize creation of the output VCFWriter for all (non-Spark) tools in order to centralize index creation, which is an incremental step in this direction.

droazen commented 8 years ago

@cmnbroad If you have any ideas about how this should be done, feel free to suggest an approach.

lbergelson commented 8 years ago

What happened with that branch?

droazen commented 8 years ago

It is blocked by another branch currently under review, I believe.