droazen opened this issue 9 years ago
to clarify: the requirement is to implement a new annotation for outputs, mark output fields with that annotation, then add code to open the writers to the engine (probably in GATKTool) and close the writers on exit (regardless of fail/success).
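The requirement above could be sketched roughly as follows. This is a hypothetical illustration, not the actual GATK API: the annotation name `ToolOutput`, the `ExampleTool` class, and the `closeOutputs` helper are all made up for the example. The idea is a runtime field annotation plus an engine-side reflection pass that closes every annotated output on exit, success or failure.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation for tool output fields (illustrative name).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface ToolOutput {}

public class OutputSketch {
    // Stand-in for a tool whose output fields the engine would manage.
    static class ExampleTool {
        @ToolOutput
        java.io.StringWriter variantsOut = new java.io.StringWriter();

        String unrelatedField = "not an output";
    }

    // Engine-side helper (e.g., something GATKTool could call in a finally
    // block): find all @ToolOutput fields and close any closeable values.
    static List<String> closeOutputs(Object tool) {
        List<String> closed = new ArrayList<>();
        for (Field f : tool.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(ToolOutput.class)) {
                f.setAccessible(true);
                try {
                    Object value = f.get(tool);
                    if (value instanceof AutoCloseable) {
                        ((AutoCloseable) value).close();
                        closed.add(f.getName());
                    }
                } catch (Exception e) {
                    // A real engine would log and keep closing the rest.
                }
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        System.out.println(closeOutputs(new ExampleTool()));
    }
}
```

Opening the writers would work the same way in reverse: the engine walks the annotated fields before `onTraversalStart` and instantiates the writers, so individual tools never open or close outputs themselves.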
I'm not sure how much of this we want to tackle before dataflowing. We'll probably want to provide a standardized output format that maps nicely to Dataflow. Currently that means a single output PCollection, although some tools will obviously need multiple output streams.
Yes, we shouldn't assume this will necessarily be done any particular way (e.g., through annotations) until we have more working examples of Dataflow tools from which to generalize.
We need this independently of Dataflow (it needs to work for file walkers as well as Dataflow tools), i.e., it has nothing to do with Dataflow per se.
alpha candidate?
yes!
This applies to spark too.
In https://github.com/broadinstitute/gatk/pull/1146 we've at least centralized SAM/BAM writer creation methods in the superclasses -- this is good enough for alpha. In beta we can add the fancy auto-management of outputs we are dreaming about.
There is a branch/commit here with changes to centralize creation of the output VCFWriter for all (non-Spark) tools in order to centralize index creation, which is an incremental step in this direction.
@cmnbroad If you have any ideas about how this should be done, feel free to suggest an approach
What happened with that branch?
It is blocked by another branch currently under review, I believe.
yes, please