droazen opened this issue 9 years ago
to clarify: the requirement is to implement a new annotation for outputs, mark output fields with that annotation, then add code to open the writers to the engine (probably in GATKTool) and close the writers on exit (regardless of fail/success).
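The requirement above could be sketched roughly as follows. This is a hypothetical illustration, not the actual GATK API: the annotation name `ToolOutput`, the `ExampleTool` class, and the `closeOutputs` helper are all made up for the example. The idea is a runtime field annotation plus an engine-side reflection pass that closes every annotated output on exit, success or failure.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation for tool output fields (illustrative name).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ToolOutput {}

public class OutputSketch {
    // Stand-in for a tool whose output fields the engine would manage.
    static class ExampleTool {
        @ToolOutput
        java.io.StringWriter variantsOut = new java.io.StringWriter();

        String unrelatedField = "not an output";
    }

    // Engine-side helper (e.g., something GATKTool could call in a finally
    // block): find all @ToolOutput fields and close any closeable values.
    static List<String> closeOutputs(Object tool) {
        List<String> closed = new ArrayList<>();
        for (Field f : tool.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(ToolOutput.class)) {
                f.setAccessible(true);
                try {
                    Object value = f.get(tool);
                    if (value instanceof AutoCloseable) {
                        ((AutoCloseable) value).close();
                        closed.add(f.getName());
                    }
                } catch (Exception e) {
                    // A real engine would log and keep closing the rest.
                }
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        System.out.println(closeOutputs(new ExampleTool()));
    }
}
```

Opening the writers would work the same way in reverse: the engine walks the annotated fields before `onTraversalStart` and instantiates the writers, so individual tools never open or close outputs themselves.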
I'm not sure how much of this we want to tackle before dataflowing. We'll probably want to provide a standardized output format that maps nicely to Dataflow. Currently that means a single output PCollection, although some tools will obviously need multiple output streams.
Yes, we shouldn't assume this will necessarily be done any particular way (e.g., through annotations) until we have more working examples of Dataflow tools from which to generalize.
We need this independently of Dataflow (it needs to work for file walkers as well as Dataflow tools), i.e., it has nothing to do with Dataflow per se.
alpha candidate?
yes!
This applies to spark too.
In https://github.com/broadinstitute/gatk/pull/1146 we've at least centralized SAM/BAM writer creation methods in the superclasses -- this is good enough for alpha. In beta we can add the fancy auto-management of outputs we are dreaming about.
There is a branch/commit here with changes to centralize creation of the output VCFWriter for all (non-Spark) tools in order to centralize index creation, which is an incremental step in this direction.
@cmnbroad If you have any ideas about how this should be done, feel free to suggest an approach
What happened with that branch?
It is blocked by another branch currently under review, I believe.
yes, please