Open gregoryfoster opened 11 months ago
Ya this is an interesting one. You are the second person to report that `us-central` might be overloaded now. In general, I tested a bunch of different regions for GCP compute way back when we added that process and found that `us-central` was generally available but sometimes overloaded, though not nearly as much as all the other regions I tested. If you want to change the region, feel free.
To my knowledge, there is no downside / drawback of using a different region for compute vs the region for the project. The only "big difference" is maybe data download + upload from storage which may cost a fraction more but in comparison to "stability of compute" I went with central at the time.
All of this is to say... do what you would like? And maybe we should document this somewhere?
Based on your feedback, I suggest we change this issue to a feature request to make `cloud-region` a template variable that can be edited on project generation.
Seems fair to me!
I have switched to us-west1-b for now as I am also running into a lot of issues.
woops. reopening as I think we still want this to be parametrizable
I'm also running into this issue on a new cookie cutter install... event gather runs are failing when trying to set up the runner: `us-west1-b does not have enough resources available to fulfill the request`. The instance is set up on the default central1 gcp region. Is there a workaround to get this working? ~It isn't clear to me how I would specify a different region...~
If helpful, I haven't customized anything... I just followed the directions in the youtube tutorial using all default cookie cutter values.
Update: I ended up just changing the specified region in the GH workflow back to `us-central1-f` and it worked!
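
For anyone wanting to script that workaround, here is a minimal sketch of swapping the zone in the workflow text. The `SAMPLE_STEP` excerpt and the `set_cloud_region` helper are hypothetical and only for illustration; the real `event-gather-pipeline.yml` may format the `cml runner` call differently, so check the file before relying on this.

```python
import re

# Hypothetical excerpt of the runner step; the actual workflow file may differ.
SAMPLE_STEP = """\
cml runner \\
  --cloud=gcp \\
  --cloud-region=us-west1-b \\
  --cloud-gpu=nvidia-tesla-t4
"""


def set_cloud_region(workflow_text: str, zone: str) -> str:
    """Replace the value of every --cloud-region option with `zone`."""
    # Accept both "--cloud-region=zone" and "--cloud-region zone".
    pattern = re.compile(r"(--cloud-region[=\s]+)\S+")
    return pattern.sub(lambda m: m.group(1) + zone, workflow_text)


updated = set_cloud_region(SAMPLE_STEP, "us-central1-f")
print(updated)
```

Editing the file by hand works just as well; the point is only that the zone is a single value in the workflow and nothing else needs to change.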
Ah yea sorry. All of the region stuff is entirely parameterizable. Whichever works best for you is great!
Describe the Bug

In the `deploy-runner-on-gcp` job called from `event-gather-pipeline.yml`, the `cml runner` option for `cloud-region` is hardcoded to `us-central1-f`. In my (very brief!) experience, this resulted in failures when attempting to create the machine due to `ZONE_RESOURCE_POOL_EXHAUSTED`, which may be transient, but I saw it repeatedly enough to try a different cloud region that supports T4 GPUs.

As well, I specified a region of `us-west1` for my GCP project as a whole, different from the default `us-central1` region in CDP. That distinction, and the fact that the `us-west1-b` cloud region worked for me, made me wonder whether this is a setting which needs to track the overall GCP region to ensure access to associated resources. I don't know enough about any of this to know whether that's true or if this machine is standalone.

Expected Behavior

I expected the Event Gather action `deploy-runner-on-gcp` job to complete successfully.

Reproduction

Stand up a CDP instance situated in a region other than `us-central1` and execute the Event Gather action.

Environment
Any additional information about your environment.