deploy-runner-on-gcp - does the cloud region matter in relation to overall region?

gregoryfoster commented 12 months ago

Describe the Bug

In the deploy-runner-on-gcp job called from event-gather-pipeline.yml, the cml runner option for cloud-region is hardcoded to us-central1-f. In my (very brief!) experience, this resulted in failures when attempting to create the machine due to ZONE_RESOURCE_POOL_EXHAUSTED - which may be transient, but I saw it repeatedly enough to try a different cloud region that supports T4 GPUs.

Also, I specified a region of us-west1 for my GCP project as a whole, different from the default us-central1 region in CDP. That distinction, and the fact that the us-west1-b cloud region worked for me, made me wonder whether this setting needs to track the overall GCP region to ensure access to associated resources. I don't know enough about any of this to know whether that's true or if this machine is standalone.

Expected Behavior

I expected the Event Gather action deploy-runner-on-gcp job to complete successfully.

Reproduction

Stand up a CDP instance situated in a region other than us-central1 and execute the Event Gather action.

evamaxfield commented 11 months ago

Ya this is an interesting one. You are the second person to report that us-central might be overloaded now. In general, I tested a bunch of different regions for GCP compute way back when we added that process and found that us-central was generally available. It was sometimes overloaded, but not nearly as often as the other regions I tested. If you want to change the region, feel free.

To my knowledge, there is no downside to using a different region for compute than the region for the project. The only "big difference" is maybe data download + upload from storage, which may cost a fraction more, but weighing that against "stability of compute" I went with central at the time.

All of this is to say... do what you would like? And maybe we should document this somewhere?

gregoryfoster commented 11 months ago

Based on your feedback, I suggest we change this issue to a feature request to make cloud-region a template variable that can be edited on project generation.
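
A rough sketch of what "cloud-region as a variable" could look like, written in Python only for illustration. The helper name `build_runner_args` and the default value are assumptions; the actual workflow in the template passes this option in its own YAML, and only the `cloud-region` option itself comes from this issue.

```python
from typing import List

# Assumed default, matching the value that was originally hardcoded.
DEFAULT_CLOUD_REGION = "us-central1-f"


def build_runner_args(cloud_region: str = DEFAULT_CLOUD_REGION) -> List[str]:
    """Build a cml runner argument list with the region supplied by the caller.

    Hypothetical helper: it only shows the region moving from a literal
    to a parameter. Other runner options are intentionally left out.
    """
    if not cloud_region:
        raise ValueError("cloud_region must be a non-empty string")
    return ["cml", "runner", "--cloud=gcp", f"--cloud-region={cloud_region}"]


default_args = build_runner_args()
custom_args = build_runner_args("us-west1-b")
print(" ".join(custom_args))
```

With something like this, a project generated with a different value would end up with a different `--cloud-region` without editing the workflow by hand.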

evamaxfield commented 11 months ago

Seems fair to me!

evamaxfield commented 11 months ago

I have switched to us-west1-b for now as I am also running into a lot of issues.

evamaxfield commented 11 months ago

woops. reopening as I think we still want this to be parametrizable

dvdokkum commented 7 months ago

I'm also running into this issue on a new cookie cutter install... event gather runs are failing when trying to set up the runner: us-west1-b does not have enough resources available to fulfill the request. The instance is set up on the default central1 gcp region. Is there a workaround to get this working? ~It isn't clear to me how I would specify a different region...~

If helpful, I haven't customized anything... I just followed the directions in the youtube tutorial using all default cookie cutter values.

Update: I ended up just changing the specified region in the GH workflow back to us-central1-f and it worked!
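
Since both us-central1-f and us-west1-b have been reported exhausted at different points in this thread, one possible approach is to try a list of zones in order instead of pinning a single one. The following is a minimal sketch of that idea, not something the template does today. `ZoneExhausted`, `launch_in_first_available_zone`, and `fake_launch` are hypothetical names; a real launcher would have to map the provider's capacity error onto the exception.

```python
from typing import Callable, Iterable, Optional


class ZoneExhausted(Exception):
    """Hypothetical error raised when a zone has no capacity for the machine."""


def launch_in_first_available_zone(
    zones: Iterable[str],
    launch: Callable[[str], None],
) -> Optional[str]:
    """Try each zone in order and return the first one where launch succeeds.

    Returns None if every zone reported exhausted capacity.
    """
    for zone in zones:
        try:
            launch(zone)
        except ZoneExhausted:
            continue  # capacity errors may be transient, so try the next zone
        return zone
    return None


# Illustration only: pretend us-west1-b is out of capacity right now.
def fake_launch(zone: str) -> None:
    if zone == "us-west1-b":
        raise ZoneExhausted(zone)


chosen = launch_in_first_available_zone(["us-west1-b", "us-central1-f"], fake_launch)
print(chosen)
```

The zone order is a judgment call; the cheapest or closest zone would normally go first, with the others as fallbacks.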

evamaxfield commented 7 months ago

Ah yea sorry. All of the region stuff is entirely parameterizable. Whichever works best for you is great!

CouncilDataProject / cookiecutter-cdp-deployment