google-deepmind / xmanager

A platform for managing machine learning experiments
Apache License 2.0

ResourceExhausted: 429 The following quota metrics exceed quota limits #29

Status: Open (opened by crystina-z 2 years ago)

crystina-z commented 2 years ago

Hi! Thanks for building this amazing project. I've recently been running scripts with XManager on Vertex AI using TPU v2 and v3, but I keep getting this error:

google.api_core.exceptions.ResourceExhausted: 429 The following quota metrics exceed quota limits: aiplatform.googleapis.com/custom_model_training_tpu_v2

The error is raised at this line: https://github.com/deepmind/xmanager/blob/v0.2.0/xmanager/cloud/vertex.py#L181
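The exception surfaces at the job-submission call. One way to tell a transient quota shortage (for example, other jobs temporarily holding the TPU quota) from a hard limit is to wrap that call in a bounded retry. The sketch below is a generic, hypothetical helper: `call_with_backoff` is not part of XManager or the Google Cloud client libraries. It assumes the submission can be passed as a zero-argument callable and that `google.api_core.exceptions.ResourceExhausted` is the exception to retry on. If every attempt still fails, the limit for that metric is probably lower than what the job requests, and no amount of retrying will fix that.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 30.0,
    max_delay: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on `retry_on` errors with capped exponential backoff.

    Hypothetical helper, not an XManager API. Exceptions not listed in
    `retry_on` propagate immediately; the last retryable error is re-raised
    once `max_attempts` calls have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            attempt += 1
            if attempt >= max_attempts:
                raise  # Persistent failure: likely a hard quota limit, not contention.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter in [0.5, 1.0] keeps concurrent clients from retrying in lockstep.
            sleep(delay * random.uniform(0.5, 1.0))


# Assumed usage (requires google-api-core; `submit_job` is a hypothetical
# stand-in for the call that raises at xmanager/cloud/vertex.py#L181):
#   from google.api_core.exceptions import ResourceExhausted
#   call_with_backoff(submit_job, retry_on=(ResourceExhausted,))
```

With the defaults, the waits are roughly 15 to 30 s, 30 to 60 s, 60 to 120 s, and so on, capped at 600 s before jitter.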

Below are the sanity checks that I've done:

I've enabled the three APIs mentioned in the README (IAM, Cloud AI Platform, Container Registry), and I've also enabled the Vertex AI API and the Cloud Resource Manager API. I also checked the Quotas page in the console, and it looks fine: I don't appear to be using more than my limits, so I don't understand why the error says these metrics "exceed quota limits".
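The Quotas page should be checked against the metric named in the error (here `aiplatform.googleapis.com/custom_model_training_tpu_v2`). Vertex AI quotas are generally enforced per region, so the relevant limit is the one for the region the job was submitted to. When a message lists several metrics, a small hypothetical helper like the one below can pull them out. It assumes the message keeps the `exceed quota limits: <metric>, ...` shape shown above, which is not a documented format and may change.

```python
import re
from typing import List

# Matches fully qualified quota metric names such as
# "aiplatform.googleapis.com/custom_model_training_tpu_v2".
_METRIC_RE = re.compile(r"[a-z0-9.-]+\.googleapis\.com/[A-Za-z0-9_]+")


def exceeded_quota_metrics(message: str) -> List[str]:
    """Return the quota metric names listed after 'exceed quota limits:'.

    Hypothetical helper; returns an empty list if the marker is absent.
    """
    marker = "exceed quota limits:"
    if marker not in message:
        return []
    tail = message.split(marker, 1)[1]
    return _METRIC_RE.findall(tail)
```

For example, applied to the `str()` of the exception above, it should return `["aiplatform.googleapis.com/custom_model_training_tpu_v2"]`.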

This has been bugging me for quite a few days, and I'd really appreciate it if anyone could suggest what might be going on. Thanks in advance!

crystina-z commented 2 years ago

Sorry, I forgot to mention: I'm using Python 3.9 and xmanager==0.2.0. Let me know if you need any more information from me.

saramirabi commented 5 months ago

I have the same issue. Can anyone help?