Closed — jgolob closed this 3 years ago
The only other pending concern I have is that I think we might need to remove any `task.memory` or `task.cpus` throughout the entire codebase. Do you agree, @jgolob, or am I off base on that?
I think we can leave `task.memory` and `task.cpus` in there. It's in fact a nice feature that makes sure the software uses the intended resources, not the maximum available.
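To make that concrete, here is a minimal sketch of a process that passes the allocated resources through to the tool. The process name, inputs, and exact `diamond` flags are illustrative, not taken from this codebase:

```nextflow
process diamond_blastx {
    // resources come from the 'mem_veryhigh' label in nextflow.config
    label 'mem_veryhigh'

    input:
    path query_fasta
    path db

    output:
    path "hits.tsv"

    script:
    """
    diamond blastx \
        --query ${query_fasta} \
        --db ${db} \
        --threads ${task.cpus} \
        --memory-limit ${task.memory.toGiga()} \
        --out hits.tsv
    """
}
```

Because the tool is told `${task.cpus}` and `${task.memory}` explicitly, it uses what was requested rather than whatever the node happens to have available.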
Let's walk through the intended use case which is supported by this change.

A user wants to process their data using a modest amount of resources, and so they run the workflow with an initial config that requests a modest amount (e.g. 30GB RAM for `mem_veryhigh`). This works pretty well, but there are 1-2 samples which fail in the `diamond` step with that amount. Since we are now using `finish`, the workflow comes to a graceful halt, which is wonderful. Now the user gets to increase the memory (e.g. 60GB for `mem_veryhigh`) and they re-run with the `-resume` flag on.
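Sketched out, the second attempt in that walkthrough might look like this (the label name and the 30GB-to-60GB bump come from the example above; setting `errorStrategy` at the process scope is one way to get the `finish` behavior, and the exact config layout here is an assumption, not this repo's actual file):

```nextflow
// nextflow.config -- second attempt, after 1-2 samples failed
process {
    // halt gracefully once in-flight tasks complete, rather than retrying
    errorStrategy = 'finish'

    withLabel: 'mem_veryhigh' {
        memory = 60.GB   // bumped from the initial 30.GB
    }
}
```

followed by re-launching with `nextflow run main.nf -resume` so that previously successful tasks are served from the cache.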
My concern is this: if we keep `task.memory` in the `script` block, then all of the `diamond` tasks will be re-run using more memory. If we remove `task.memory`, then only the 1-2 failed samples will be re-run, which seems more like the intended behavior.

Thoughts, @jgolob?
I get the concern now, but interestingly, I don't think that is Nextflow's behavior. The `task.xxx` variables are handled differently than other variables.
Can we confirm this behavior with some manual testing, perhaps? Or is there documentation to give us confidence that this is a supported feature of Nextflow? I'm also happy to ping Paolo if that isn't easy to figure out.
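One way to check this manually is with a small toy pipeline (a sketch; the process and workflow here are hypothetical and not from this repo): run it once, raise the memory, and re-run with `-resume` to see whether the previously successful tasks are reported as cached.

```nextflow
// main.nf -- toy pipeline for probing -resume behavior
process report_mem {
    memory '2 GB'   // change to '4 GB' before the second run

    input:
    val sample

    output:
    stdout

    script:
    """
    echo "${sample}: running with ${task.memory}"
    """
}

workflow {
    Channel.of('a', 'b', 'c') | report_mem | view
}
```

If all three tasks show as cached on the second run, then referencing `task.memory` in the `script` block did not invalidate the cache when the memory directive changed.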
I've confirmed with my local tests that the `-resume` functionality works exactly as you said it would, @jgolob. I love being wrong about things like that!
Switches from 'retry' to 'finish' as an error recovery strategy. Also moved the `nextflow.config` to `nextflow.config.sample`.