NLeSC / noodles

Computational workflow engine, making distributed computing in Python easy!
http://nlesc.github.io/noodles
Apache License 2.0

Provide a clean-up action for job results. #40

Closed: jhidding closed this issue 6 years ago

jhidding commented 7 years ago

Some jobs may need a clean-up step after their results have been passed on. This conflicts with storing results in the cache, so the "store" and "clean" features should be mutually exclusive. If we can't find a decent solution for this, it should be solved in user space.
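A minimal user-space sketch of the idea, assuming a hypothetical `JobHints` record and a hypothetical `run_with_cleanup` helper (neither is part of the noodles API). The job's result is handed to its consumer first, and only then is the clean-up action run. Combining `store` with `clean` is rejected, because a cleaned-up result can no longer be served from the cache.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class JobHints:
    """Hypothetical per-job hints; NOT the noodles API, just an illustration."""
    store: bool = False                              # keep the result in the cache
    clean: Optional[Callable[[Any], None]] = None    # action run after the result is used

    def __post_init__(self) -> None:
        # A result that gets cleaned up cannot safely be reused from the cache,
        # so the two features are treated as mutually exclusive.
        if self.store and self.clean is not None:
            raise ValueError("'store' and 'clean' are mutually exclusive")


def run_with_cleanup(job: Callable[[], Any], hints: JobHints,
                     consumer: Callable[[Any], Any]) -> Any:
    """Run `job`, pass its result on to `consumer`, then run the clean-up hint."""
    result = job()
    try:
        return consumer(result)
    finally:
        # Clean up even if the consumer raises.
        if hints.clean is not None:
            hints.clean(result)


if __name__ == "__main__":
    log = []
    hints = JobHints(clean=lambda r: log.append(("cleaned", r)))
    print(run_with_cleanup(lambda: 21, hints, lambda r: r * 2))
    print(log)
```

Running clean-up in a `finally` block is a design choice of this sketch: it ensures temporary resources are released even if the downstream step fails.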

Data from certain packages can be huge, and parsing all of it into a manageable format is too expensive. So we need selective parsing based on what the user actually needs, and then we cache that result. After this is done, we clean up the mess the package leaves behind. One problem is that we'd like to associate the user's data request with the input of the computation. In other words, the computation's input together with the request should form its provenance, irrespective of the output. This differs from other cases, where the path towards a certain function call doesn't affect the outcome.
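The workflow described above (run the package, parse only the requested fields, cache the parsed subset, delete the raw output) can be sketched as follows. Everything here is hypothetical: `run_package` stands in for the external package, `CACHE` is an in-memory dict standing in for a persistent cache, and `provenance_key` hashes the computation's inputs *together with* the user's request. This makes the request part of the provenance, as the issue asks.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path
from typing import Dict, List

# Hypothetical in-memory stand-in for a persistent result cache.
CACHE: Dict[str, dict] = {}


def provenance_key(inputs: dict, request: List[str]) -> str:
    """Key on both the inputs and the user's request (order of fields ignored)."""
    payload = json.dumps({"inputs": inputs, "request": sorted(request)},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def run_package(inputs: dict, workdir: Path) -> Path:
    """Hypothetical external package that writes a large output file."""
    out = workdir / "output.json"
    data = {f"field_{i}": i * inputs["scale"] for i in range(1000)}
    out.write_text(json.dumps(data))
    return out


def selective_job(inputs: dict, request: List[str]) -> dict:
    key = provenance_key(inputs, request)
    if key in CACHE:
        return CACHE[key]
    workdir = Path(tempfile.mkdtemp())
    try:
        raw = json.loads(run_package(inputs, workdir).read_text())
        # Parse only the fields the user asked for.
        result = {name: raw[name] for name in request}
    finally:
        # Remove the bulky raw output; only the parsed subset survives.
        shutil.rmtree(workdir)
    CACHE[key] = result
    return result


if __name__ == "__main__":
    print(selective_job({"scale": 2}, ["field_3", "field_10"]))
```

Note that what gets cleaned up here is the package's raw output, not the cached result. So this pattern does not run into the store/clean conflict described earlier.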

A proper solution to this needs some careful thought.