cfe-lab / Kive

Archival and automation of bioinformatic pipelines and data
https://cfe-lab.github.io/Kive
BSD 3-Clause "New" or "Revised" License

Implement ability to discard intermediate data #413

Closed: rhliang closed 9 years ago

rhliang commented 9 years ago

If we're going to batch jobs, we had better be able to clean up the system as well! We're starting to feel the space crunch on the cluster right now.

We decided to simplify the strategy for this issue.

jamesnakagawa commented 9 years ago

Did part of this today. For the first point, it's currently being passed as dataset names. The backend rejects these, so functionality is temporarily broken; I created a new branch to keep that work separate.

ArtPoon commented 9 years ago

An interface for deleting data files could become extremely complicated (for example, selecting the data files associated with specific pipelines or versions). The most feasible approach may be to define global criteria instead, such as the creation date of intermediate data files across all pipelines, or the number of times a dataset has been accessed. Let's continue the discussion offline for now.
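
To make the idea of global criteria concrete, here is a minimal sketch of an age-or-access filter. Everything in it is an assumption for illustration: `IntermediateFile`, `select_for_purge`, the 30-day threshold, and the sample file names are hypothetical and are not Kive's actual models or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IntermediateFile:
    """Hypothetical record for one intermediate data file (not a Kive model)."""
    name: str
    created: datetime
    access_count: int


def select_for_purge(files, now, max_age=timedelta(days=30), min_accesses=1):
    """Return files older than max_age, or accessed fewer than min_accesses times."""
    cutoff = now - max_age
    return [f for f in files
            if f.created < cutoff or f.access_count < min_accesses]


# Made-up sample data, applied across all pipelines at once.
now = datetime(2015, 6, 1)
files = [
    IntermediateFile("step1_out.csv", datetime(2015, 1, 10), 5),  # old
    IntermediateFile("step2_out.csv", datetime(2015, 5, 25), 0),  # never accessed
    IntermediateFile("step3_out.csv", datetime(2015, 5, 28), 3),  # recent and used
]
purge_names = [f.name for f in select_for_purge(files, now)]
print(purge_names)  # ['step1_out.csv', 'step2_out.csv']
```

Whether the two criteria should be combined with "or" (as here) or "and" is a policy choice; "or" frees more space but is more aggressive.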

donkirkby commented 9 years ago

After some design discussion, this is the strategy:

Deleting a dataset is causing some problems when we search for exec records. I'll try looking at how we handle PipelineStep.outputs_to_delete.

jamesnakagawa commented 9 years ago

Just refreshed my memory on where I left the ImplementOutputsToDelete branch. I think I'm just waiting for the backend to accept the new form data. Is someone available to help bring that up to speed?

rhliang commented 9 years ago

Sure, I can help with that.