Closed — jgolob closed this 3 years ago
The only other pending concern I have is that I think we might need to remove any `task.memory` or `task.cpus` throughout the entire codebase. Do you agree, @jgolob, or am I off base on that?
I think we can leave `task.memory` and `task.cpus` in there. It's in fact a nice feature that makes sure the software uses the intended resources, not the maximum available.
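To make that concrete, here is a minimal sketch of a process that passes the allocated resources through to the tool. The process name, inputs, and exact `diamond` flags are illustrative, not taken from this codebase:

```nextflow
process diamond_blastx {
    // resources come from the 'mem_veryhigh' label in nextflow.config
    label 'mem_veryhigh'

    input:
    path query_fasta
    path db

    output:
    path "hits.tsv"

    script:
    """
    diamond blastx \
        --query ${query_fasta} \
        --db ${db} \
        --threads ${task.cpus} \
        --memory-limit ${task.memory.toGiga()} \
        --out hits.tsv
    """
}
```

Because the tool is told `${task.cpus}` and `${task.memory}` explicitly, it uses what was requested rather than whatever the node happens to have available.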
Let's walk through the intended use case which is supported by this change.

A user wants to process their data using a modest amount of resources, and so they run the workflow with an initial config that requests a modest amount (e.g. 30GB RAM for `mem_veryhigh`). This works pretty well, but there are 1-2 samples which fail in the `diamond` step with that amount. Since we are now using `finish`, the workflow comes to a graceful halt, which is wonderful. Now the user gets to increase the memory (e.g. 60GB for `mem_veryhigh`) and they re-run with the `-resume` flag on.
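Sketched out, the second attempt in that walkthrough might look like this (the label name and the 30GB-to-60GB bump come from the example above; setting `errorStrategy` at the process scope is one way to get the `finish` behavior, and the exact config layout here is an assumption, not this repo's actual file):

```nextflow
// nextflow.config -- second attempt, after 1-2 samples failed
process {
    // halt gracefully once in-flight tasks complete, rather than retrying
    errorStrategy = 'finish'

    withLabel: 'mem_veryhigh' {
        memory = 60.GB   // bumped from the initial 30.GB
    }
}
```

followed by re-launching with `nextflow run main.nf -resume` so that previously successful tasks are served from the cache.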
My concern is this: if we keep `task.memory` in the `script` block, then all of the `diamond` tasks will be re-run using more memory. If we remove `task.memory`, then only the 1-2 failed samples will be re-run, which seems more like the intended behavior.

Thoughts, @jgolob?
I get the concern now, but interestingly, I don't think that is Nextflow's behavior. The `task.xxx` variables are handled differently than other variables.
Can we confirm this behavior with some manual testing, perhaps? Or is there documentation to give us confidence that this is a supported feature of Nextflow? I'm also happy to ping Paolo if that isn't easy to figure out.
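One way to check this manually is with a small toy pipeline (a sketch; the process and workflow here are hypothetical and not from this repo): run it once, raise the memory, and re-run with `-resume` to see whether the previously successful tasks are reported as cached.

```nextflow
// main.nf -- toy pipeline for probing -resume behavior
process report_mem {
    memory '2 GB'   // change to '4 GB' before the second run

    input:
    val sample

    output:
    stdout

    script:
    """
    echo "${sample}: running with ${task.memory}"
    """
}

workflow {
    Channel.of('a', 'b', 'c') | report_mem | view
}
```

If all three tasks show as cached on the second run, then referencing `task.memory` in the `script` block did not invalidate the cache when the memory directive changed.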
I've confirmed with my local tests that the `-resume` functionality works exactly as you said it would, @jgolob. I love being wrong about things like that!
Switches from 'retry' to 'finish' as an error recovery strategy. Also moved the `nextflow.config` to `nextflow.config.sample`.