Closed aawdeh closed 1 week ago
Looks good to me! Thanks.
One other thing to mention is that, per Janet Gainer-Dewar, the use of preemptible machines on Azure might not make sense currently. It sounds like there is no support for "checkpointFile" on Azure yet.
So if you run with hardware_preemptible_tries > 0, then you risk never actually finishing. CellBender will not pick up from its last checkpoint (since Terra on Azure doesn't support "checkpointFile"), and Janet said that they do not retry on a non-preemptible machine for the last run. So it might just fail if it continues to get preempted. I just worry about it a little because the run is bound to take a long time on CPU.
That makes sense. I could set hardware_preemptible_tries to 0 in the Azure run. What do you think? I can go ahead and make that change.
I was also wondering if it would make sense to remove maxRetries from the runtime variables?
Setting hardware_preemptible_tries to 0 by default sounds like a good idea to be conservative. Yeah, maxRetries > 0 is usually used on GCP to overcome its PAPI error code 2 when it fails (inexplicably) to install GPU drivers. Retrying makes sense in that one limited case, since a retry can succeed. But most of the time, a retry does not make sense: if it was a CellBender failure, it will fail again on retry. But I guess it makes sense to leave that argument with a 0 default.
Looks good, I will merge
Thank you!
We don't have GPU access in ToA and we want to run Multiome (with CellBender) on Azure.
cellbender_remove_background_azure.wdl is a copy of cellbender_remove_background.wdl without the CUDA options for CellBender.
Azurized CellBender so that it runs in Multiome by (1) removing the --cuda option as a CellBender input parameter and (2) removing the runtime variables for GPU access when on Azure.
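For reference, here is a minimal sketch of what a CPU-only task along these lines could look like. It is not the actual contents of cellbender_remove_background_azure.wdl. The task name, the hardware_max_retries input name, and the docker_image input are assumptions made for illustration. Only hardware_preemptible_tries, maxRetries, checkpointFile, and --cuda come from this thread.

```wdl
version 1.0

# Hypothetical, trimmed-down task for illustration only.
task run_cellbender_cpu {
    input {
        File input_h5
        String output_prefix = "cellbender_out"
        # Conservative defaults discussed in the review: no preemptible
        # attempts and no automatic retries.
        Int hardware_preemptible_tries = 0
        Int hardware_max_retries = 0  # assumed input name feeding maxRetries
        String docker_image           # no default: supply the image you use
    }

    command <<<
        set -e
        # No --cuda flag here, so CellBender runs on CPU.
        cellbender remove-background \
            --input "~{input_h5}" \
            --output "~{output_prefix}.h5"
    >>>

    output {
        File output_h5 = "~{output_prefix}.h5"
    }

    runtime {
        docker: docker_image
        preemptible: hardware_preemptible_tries
        maxRetries: hardware_max_retries
        # GPU attributes (e.g. gpuType, gpuCount) and checkpointFile are
        # intentionally left out of this CPU-only sketch.
    }
}
```

Keeping both inputs with a default of 0 means a caller can still opt in to retries on a backend where they help, without the Azure run depending on them.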