DataBiosphere / dsub

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.
Apache License 2.0

Add taskCount and Parallelism options when using dsub with Google Cloud Batch #256

Status: Open. croninjoseph opened this issue 1 year ago.

croninjoseph commented 1 year ago

Google Cloud Batch lets you set a taskCount and a parallelism for a job.
For example, if taskCount were set to 10 and parallelism to 2, Batch would run all 10 tasks, at most 2 at a time (for example, on 2 VMs). It would be helpful if this feature were available through dsub.
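For reference, in the Batch REST API both settings live on a task group of the Job resource. The sketch below assumes Python and only builds the JSON request body; it does not call the API. `build_batch_job` and `waves` are hypothetical helpers, and the image and command are placeholders. Each task can read its own index from the `BATCH_TASK_INDEX` environment variable that Batch sets.

```python
import math


def build_batch_job(task_count: int, parallelism: int, image: str, command: list) -> dict:
    """Return a Batch Job body with a single task group (hypothetical helper)."""
    return {
        "taskGroups": [
            {
                # Total number of tasks in the group.
                "taskCount": task_count,
                # Maximum number of tasks allowed to run at the same time.
                "parallelism": parallelism,
                "taskSpec": {
                    "runnables": [
                        {"container": {"imageUri": image, "commands": command}}
                    ]
                },
            }
        ],
        "logsPolicy": {"destination": "CLOUD_LOGGING"},
    }


def waves(task_count: int, parallelism: int) -> int:
    """Rounds of execution needed if every task takes the same time (illustrative only)."""
    return math.ceil(task_count / parallelism)


job = build_batch_job(10, 2, "ubuntu", ["bash", "-c", "echo task $BATCH_TASK_INDEX"])
print(waves(10, 2))  # → 5
```

Note that parallelism only caps how many tasks run concurrently. Whether two concurrent tasks land on two VMs or share one also depends on other settings, such as taskCountPerNode and the machine type.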

wnojopra commented 1 year ago

The next dsub update focuses on bringing the google-batch provider closer to feature parity with the other providers. Support for setting taskCount and parallelism would be a reasonable feature to add on top of that.

The Google Cloud Batch documentation has a few examples of setting both fields through the API. It should be fairly straightforward to add --taskCount and --parallelism options to dsub for the google-batch provider only. The main thing to be careful about is that they don't interfere with the existing --tasks parameter. Technically it should just work, but we'll need to document carefully the difference between parallelism across multiple VMs and parallelism on a single VM.
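One way to keep the new options scoped to google-batch is to validate them at argument-parsing time. The sketch below assumes Python's argparse; `--provider` and `--tasks` exist in dsub today, but `--taskCount` and `--parallelism` are only the options proposed in this issue, and `parse_args` is a hypothetical function, not dsub's actual parser. The `--tasks` handling is simplified to a single file path.

```python
import argparse


def parse_args(argv: list) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="dsub-sketch")
    # Arbitrary default chosen for this sketch only.
    parser.add_argument("--provider", default="google-batch")
    # Existing dsub option, simplified here to a path to a TSV of per-task parameters.
    parser.add_argument("--tasks")
    # Hypothetical options proposed in this issue; not part of dsub.
    parser.add_argument("--taskCount", dest="task_count", type=int)
    parser.add_argument("--parallelism", type=int)
    args = parser.parse_args(argv)

    uses_batch_options = args.task_count is not None or args.parallelism is not None
    if uses_batch_options and args.provider != "google-batch":
        # parser.error prints the message and exits with status 2.
        parser.error("--taskCount and --parallelism require --provider google-batch")
    for name in ("task_count", "parallelism"):
        value = getattr(args, name)
        if value is not None and value < 1:
            parser.error(f"{name} must be a positive integer")
    return args


args = parse_args(["--taskCount", "10", "--parallelism", "2"])
print(args.task_count, args.parallelism)  # → 10 2
```

How --taskCount should combine with a --tasks file (for example, whether each row becomes its own Batch task) is the open design question raised above; this sketch deliberately leaves it unresolved and only keeps the options provider-specific.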