Here's the incremental plan I have in mind (open to change). Meanwhile, we can use the current code (albeit inconsistent between wcEcoli and the Gaia Python client) unless/until there are other clients besides wcEcoli.
1. I'll add a `requested-worker-count` property to https://github.com/CovertLab/wcEcoli/pull/755. (The workflow builder's client and user are in a good position to decide how many workers to allocate.)
2. We add Gaia code to launch workers via the GCE API. This will be better than pushing so hard on shell scripts. It'll need some parameters sent from the client that are currently in wcEcoli's `runscripts/cloud/launch-workers.sh`, some parameters that Gaia can get from gcloud (I further configured it on gaia-prime), and some added to its config file. This is easy.
3. As an interim step, maybe add a Gaia endpoint to launch workers, change the Gaia Python client to use it, call that from the workflow builder, and drop both shell scripts. Or skip this step.
4. Put the Gaia server in charge of when to launch workers, which is whenever it starts or resumes running a workflow. With the `requested-worker-count` it doesn't have to decide how many.
   - The main advantage of this step is that resuming a workflow no longer requires the user to know to launch workers.
   - This step might require changing the way workers shut down, or making Gaia monitor them, because the timeouts won't fit every situation. E.g. if most workers time out while waiting for one worker to finish a long task, the workflow might need more workers afterwards.
5. Another day, we make the Gaia server able to decide how many workers to launch, perhaps with more hints from the workflow.
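
To make the "launch on start or resume" step concrete, here is a rough sketch of the top-up arithmetic the server could use. It is only an illustration: `workers_to_launch` and its parameter names are hypothetical, not existing Gaia or wcEcoli code, and it assumes the server can count live workers and unfinished tasks.

```python
def workers_to_launch(requested_count: int, live_workers: int,
                      unfinished_tasks: int) -> int:
    """Return how many additional workers to launch for a workflow.

    Hypothetical helper: tops the pool back up to the workflow's
    requested-worker-count whenever work remains, and launches
    nothing once the workflow has no unfinished tasks.
    """
    if min(requested_count, live_workers, unfinished_tasks) < 0:
        raise ValueError("counts must be non-negative")
    if unfinished_tasks == 0:
        return 0  # nothing left to run, so no workers are needed
    # Never return a negative number if more workers are alive than requested.
    return max(0, requested_count - live_workers)


# Start or resume: no workers yet, so launch the full requested count.
print(workers_to_launch(requested_count=4, live_workers=0, unfinished_tasks=10))

# Timeout scenario: 3 of 4 workers timed out during one long task,
# and more tasks became runnable afterwards.
print(workers_to_launch(requested_count=4, live_workers=1, unfinished_tasks=6))
```

Calling this on start, on resume, and whenever Gaia notices that workers have exited would cover the timeout case without the server having to choose the count itself.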
Planning on this, was going to do it next I think.
Might be nice to have a Gaia endpoint for launching workers in general? However it is implemented, the client wouldn't have to care really. But that sounds better than having multiple repos responsible for launching workers.
Agreed, the more burden we can take off the end user the better. Ultimately having Gaia do this step is the best outcome.
Yes, leading into having workers with different resource requirements eventually.