Hi! Thanks for building this amazing project. Recently I've been running a script with xmanager + Vertex AI on TPU v2 and v3, but I keep getting this error:
google.api_core.exceptions.ResourceExhausted: 429 The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_tpu_v2
I've enabled the three APIs mentioned in the readme (IAM, Cloud AI Platform, Container Registry); additionally, the Vertex API and Cloud Resource Manager API were enabled. I also checked the Quota page on the console, which looks fine as well. It doesn't look like I'm overusing the resources, despite the "exceed quota limits" in the error message.
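To make sure I'm looking at the right row on the Quota page, the metric name can be pulled out of the error message. This is only a minimal sketch: the helper name `extract_quota_metrics` is hypothetical (it is not part of xmanager or google-api-core), and the regex assumes the message format shown in the error above.

```python
import re


def extract_quota_metrics(message: str) -> list[str]:
    """Return quota metric names found in a 429 ResourceExhausted message.

    Hypothetical helper; assumes the message looks like
    "... exceed quota limits: <service>.googleapis.com/<metric>".
    """
    # Only look at the text after the "exceed quota limits:" marker.
    _, sep, tail = message.partition("exceed quota limits:")
    if not sep:
        return []
    # Match names such as aiplatform.googleapis.com/custom_model_training_tpu_v2.
    return re.findall(r"[\w.-]+\.googleapis\.com/[\w./-]+", tail)


msg = (
    "429 The following quota metrics exceed quota limits: "
    "aiplatform.googleapis.com/custom_model_training_tpu_v2"
)
print(extract_quota_metrics(msg))
```

The extracted name is what I searched for on the Quota page, filtered to the region the job runs in.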
This has been bugging me for quite a few days, and I would really appreciate it if anyone could suggest what might be going on. Thanks in advance!
The error is thrown at this line - https://github.com/deepmind/xmanager/blob/v0.2.0/xmanager/cloud/vertex.py#L181.
Below are the sanity checks that I've done:
tensorboard is set to an empty string.
self.location, self.project, pools and auth.get_bucket() all look good, where the location is us-central1, and pools showing --