GoogleCloudPlatform / pubsec-declarative-toolkit

The GCP PubSec Declarative Toolkit is a collection of declarative solutions to help you on your Journey to Google Cloud. Solutions are designed using Config Connector and deployed using Config Controller.
Apache License 2.0
31 stars 28 forks

GCE support for G2 g2-standard-48 VM as wrapper of four L4 GPUs (CUDA CC 8.9 80G vram) via the LZ as raw headless GCE in addition to marketplace nvidia-rtx-virtual-workstation #655

Open obriensystems opened 1 year ago

obriensystems commented 1 year ago

Add an example workload config for G2 in general, covering CUDA/TensorFlow/Keras/LLM training and inference: https://cloud.google.com/blog/products/compute/introducing-g2-vms-with-nvidia-l4-gpus

An alternate NVIDIA workstation deployment is already working via the marketplace: https://console.cloud.google.com/marketplace/product/nvidia/nvidia-rtx-virtual-workstation-windows-server-2022

L4 GPUs per G2 VM

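+------------------+------------+--------+-----------------+----------------+
| NAME             | DIMENSIONS | REGION | REQUESTED LIMIT | APPROVED LIMIT |
+------------------+------------+--------+-----------------+----------------+
| CPUS_ALL_REGIONS |            | GLOBAL |              96 |             96 |
+------------------+------------+--------+-----------------+----------------+
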
gcloud compute instances create l4-4b --project=cuda-old --zone=us-east4-c --machine-type=g2-standard-48 --network-interface=network-tier=PREMIUM,stack-type=IPV4_ONLY,subnet=default --maintenance-policy=TERMINATE --provisioning-model=STANDARD --service-account=196717963363-compute@developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/cloud-platform --accelerator=count=4,type=nvidia-l4 --tags=http-server,https-server --create-disk=auto-delete=yes,boot=yes,device-name=l4-4b,image=projects/ml-images/global/images/c0-deeplearning-common-cu121-v20231105-debian-11,mode=rw,size=50,type=projects/cuda-old/zones/us-east4-c/diskTypes/pd-balanced --no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --labels=goog-ec-src=vm_add-gcloud --reservation-affinity=any
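
The `--machine-type` and `--accelerator=count=...` values in the command above have to agree, because each G2 shape comes with a fixed number of L4s. A minimal Python sketch of that pairing follows. The GPU counts per shape are an assumption taken from the published G2 machine types and should be checked against the current Compute Engine docs; the helper name `gcloud_create_args` is made up for illustration and only a subset of the flags above is generated.

```python
# Assumed L4 count per G2 machine type (verify against the Compute Engine docs).
G2_L4_COUNT = {
    "g2-standard-4": 1,
    "g2-standard-8": 1,
    "g2-standard-12": 1,
    "g2-standard-16": 1,
    "g2-standard-24": 2,
    "g2-standard-32": 1,
    "g2-standard-48": 4,
    "g2-standard-96": 8,
}


def gcloud_create_args(name, project, zone, machine_type):
    """Hypothetical helper: build a gcloud argument list with a matching L4 count."""
    gpus = G2_L4_COUNT[machine_type]
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--project={project}",
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        f"--accelerator=count={gpus},type=nvidia-l4",
        # GPU VMs cannot live-migrate, so host maintenance terminates them.
        "--maintenance-policy=TERMINATE",
    ]


args = gcloud_create_args("l4-4b", "cuda-old", "us-east4-c", "g2-standard-48")
print(" ".join(args))
```

Keeping the count in one lookup table avoids requesting, say, `count=8` on a `g2-standard-48`.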

(base) michael@l4-4b:~$ nvidia-smi
Fri Dec  1 01:42:34 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.10              Driver Version: 535.86.10    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L4                      Off | 00000000:00:03.0 Off |                    0 |
| N/A   58C    P0              29W /  72W |      4MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L4                      Off | 00000000:00:04.0 Off |                    0 |
| N/A   60C    P0              31W /  72W |      4MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA L4                      Off | 00000000:00:05.0 Off |                    0 |
| N/A   58C    P0              31W /  72W |      4MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA L4                      Off | 00000000:00:06.0 Off |                    0 |
| N/A   58C    P0              29W /  72W |      4MiB / 23034MiB |      4%      Default |
obriensystems commented 11 months ago

Dual L4 on g2-standard-24 (24 vCPUs / 96G RAM) - running the Deep Learning VM image

Created [https://www.googleapis.com/compute/v1/projects/cuda-old/zones/us-east4-c/instances/l4-4-2].
NAME: l4-4-2
ZONE: us-east4-c
MACHINE_TYPE: g2-standard-24
PREEMPTIBLE:
INTERNAL_IP: 10.150.0.10
EXTERNAL_IP: 34.
STATUS: RUNNING


ssh

======================================
Welcome to the Google Deep Learning VM

Version: common-gpu.m113
Resources:

To reinstall Nvidia driver (if needed) run:
sudo /opt/deeplearning/install-driver.sh
Linux l4-4-2 5.10.0-26-cloud-amd64 #1 SMP Debian 5.10.197-1 (2023-09-29) x86_64

The programs included with the Debian GNU/Linux system are free software; the exact distribution terms for each program are described in the individual files in /usr/share/doc/*/copyright.

Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law.

This VM requires Nvidia drivers to function correctly. Installation takes ~1 minute. Would you like to install the Nvidia driver? [y/n]

Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 525.105.17......
WARNING: The nvidia-drm module will not be installed. As a result, DRM-KMS will not function with this installation of the NVIDIA driver.


ok

running a python ve
(base) michael@l4-4-2:~$ nvidia-smi
Thu Nov 30 19:51:56 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA L4           Off  | 00000000:00:03.0 Off |                    0 |
| N/A   60C    P0    32W /  72W |      0MiB / 23034MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA L4           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   57C    P0    31W /  72W |      0MiB / 23034MiB |      7%      Default |
|                               |                      |                  N/A |



<img width="894" alt="Screenshot 2023-11-30 at 15 00 28" src="https://github.com/GoogleCloudPlatform/pubsec-declarative-toolkit/assets/24765473/dacbd70e-f94a-43f3-a00d-c35505636399">
obriensystems commented 11 months ago

TensorFlow / Keras ML training run

Run a standard concurrent saturation TensorFlow/Keras ML job (CIFAR-100, hosted by U of Toronto) to find the batch size optimum that gets close to 1.0 fitness in under 30 epochs; stopping at 25 epochs avoids overfit.

https://github.com/ObrienlabsDev/machine-learning

(base) michael@l4-4-2:~$ git clone https://github.com/ObrienlabsDev/machine-learning.git
(base) michael@l4-4-2:~/machine-learning$ vi environments/windows/src/tflow.py

import tensorflow as tf

# Mirror the model across both L4s; gradients are reduced across replicas every step.
strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])

# CIFAR-100: 32x32 RGB images with 100 integer class labels.
cifar = tf.keras.datasets.cifar100
(x_train, y_train), (x_test, y_test) = cifar.load_data()

with strategy.scope():
  # https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet50/ResNet50
  # https://keras.io/api/models/model/
  # ResNet50 trained from scratch (weights=None) on 32x32 inputs.
  parallel_model = tf.keras.applications.ResNet50(
    include_top=True,
    weights=None,
    input_shape=(32, 32, 3),
    classes=100,)
  # The default ResNet50 head ends in softmax, hence from_logits=False.
  loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
  # https://keras.io/api/models/model_training_apis/
  parallel_model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])

# batch_size is the global batch, split across the replicas.
# 5120 and 7168 appear in the file as commented-out alternatives.
parallel_model.fit(x_train, y_train, epochs=30, batch_size=2048)

(base) michael@l4-4-2:~/machine-learning$ cat environments/windows/Dockerfile 
FROM tensorflow/tensorflow:latest-gpu
WORKDIR /src
COPY /src/tflow.py .
CMD ["python", "tflow.py"]

(base) michael@l4-4-2:~/machine-learning$ ./build.sh
Sending build context to Docker daemon  6.656kB
Step 1/4 : FROM tensorflow/tensorflow:latest-gpu
latest-gpu: Pulling from tensorflow/tensorflow

Successfully tagged ml-tensorflow-win:latest
2023-11-30 20:29:26.443809: I tensorflow/core/util/port.cc:113] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2023-11-30 20:29:26.497571: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-11-30 20:29:26.497614: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-11-30 20:29:26.499104: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-11-30 20:29:26.506731: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-11-30 20:29:31.435829: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20795 MB memory:  -> device: 0, name: NVIDIA L4, pci bus id: 0000:00:03.0, compute capability: 8.9
2023-11-30 20:29:31.437782: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1929] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 20795 MB memory:  -> device: 1, name: NVIDIA L4, pci bus id: 0000:00:04.0, compute capability: 8.9
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz
169001437/169001437 [==============================] - 3s 0us/step
Epoch 1/30

2023-11-30 20:30:19.985861: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8906
2023-11-30 20:30:20.001134: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8906
2023-11-30 20:30:29.957119: I external/local_xla/xla/service/service.cc:168] XLA service 0x7f9c6bf3a4f0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-11-30 20:30:29.957184: I external/local_xla/xla/service/service.cc:176]   StreamExecutor device (0): NVIDIA L4, Compute Capability 8.9
2023-11-30 20:30:29.957192: I external/local_xla/xla/service/service.cc:176]   StreamExecutor device (1): NVIDIA L4, Compute Capability 8.9
2023-11-30 20:30:29.965061: I tensorflow/compiler/mlir/tensorflow/utils/dump_mlir_util.cc:269] disabling MLIR crash reproducer, set env var `MLIR_CRASH_REPRODUCER_DIRECTORY` to enable.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1701376230.063893      80 device_compiler.h:186] Compiled cluster using XLA!  This line is logged at most once for the lifetime of the process.

25/25 [==============================] - 71s 317ms/step - loss: 4.9465 - accuracy: 0.0418
Epoch 2/30
25/25 [==============================] - 4s 142ms/step - loss: 3.8430 - accuracy: 0.1214
Epoch 3/30
25/25 [==============================] - 4s 142ms/step - loss: 3.3694 - accuracy: 0.1967
Epoch 4/30
25/25 [==============================] - 4s 143ms/step - loss: 3.0832 - accuracy: 0.2544
Epoch 5/30
25/25 [==============================] - 4s 143ms/step - loss: 2.7049 - accuracy: 0.3326
Epoch 6/30
25/25 [==============================] - 4s 143ms/step - loss: 2.3329 - accuracy: 0.4119
Epoch 7/30
25/25 [==============================] - 4s 143ms/step - loss: 1.9781 - accuracy: 0.4824
Epoch 8/30
25/25 [==============================] - 4s 143ms/step - loss: 1.9177 - accuracy: 0.4948
Epoch 9/30
25/25 [==============================] - 4s 142ms/step - loss: 1.4980 - accuracy: 0.5937
Epoch 10/30
25/25 [==============================] - 4s 144ms/step - loss: 1.3247 - accuracy: 0.6322
Epoch 11/30
25/25 [==============================] - 4s 142ms/step - loss: 1.0408 - accuracy: 0.7063
Epoch 12/30
25/25 [==============================] - 4s 142ms/step - loss: 0.9150 - accuracy: 0.7439
Epoch 13/30
25/25 [==============================] - 4s 143ms/step - loss: 0.8210 - accuracy: 0.7648
Epoch 14/30
25/25 [==============================] - 4s 142ms/step - loss: 0.5581 - accuracy: 0.8424
Epoch 15/30
25/25 [==============================] - 4s 141ms/step - loss: 0.4635 - accuracy: 0.8709
Epoch 16/30
25/25 [==============================] - 4s 142ms/step - loss: 0.4771 - accuracy: 0.8610
Epoch 17/30
25/25 [==============================] - 4s 142ms/step - loss: 0.9404 - accuracy: 0.7228
Epoch 18/30
25/25 [==============================] - 4s 143ms/step - loss: 0.5478 - accuracy: 0.8385
Epoch 19/30
25/25 [==============================] - 4s 143ms/step - loss: 0.4107 - accuracy: 0.8867
Epoch 20/30
25/25 [==============================] - 4s 143ms/step - loss: 0.2424 - accuracy: 0.9345
Epoch 21/30
25/25 [==============================] - 4s 146ms/step - loss: 0.1677 - accuracy: 0.9587
Epoch 22/30
25/25 [==============================] - 4s 142ms/step - loss: 0.1419 - accuracy: 0.9659
Epoch 23/30
25/25 [==============================] - 4s 141ms/step - loss: 0.1861 - accuracy: 0.9510
Epoch 24/30
25/25 [==============================] - 4s 141ms/step - loss: 0.2771 - accuracy: 0.9264
Epoch 25/30
25/25 [==============================] - 4s 142ms/step - loss: 0.2663 - accuracy: 0.9326
Epoch 26/30
25/25 [==============================] - 4s 141ms/step - loss: 0.1710 - accuracy: 0.9600
Epoch 27/30
25/25 [==============================] - 4s 141ms/step - loss: 0.4977 - accuracy: 0.8626
Epoch 28/30
25/25 [==============================] - 4s 141ms/step - loss: 0.6559 - accuracy: 0.8100
Epoch 29/30
25/25 [==============================] - 4s 143ms/step - loss: 0.3074 - accuracy: 0.9105
Epoch 30/30
25/25 [==============================] - 4s 143ms/step - loss: 0.1834 - accuracy: 0.9515
(base) michael@l4-4-2:~/machine-learning$

Second run: batch = 2048, epochs = 25
Epoch 24/25
25/25 [==============================] - 4s 144ms/step - loss: 0.2537 - accuracy: 0.9221
Epoch 25/25
25/25 [==============================] - 4s 145ms/step - loss: 0.2258 - accuracy: 0.9300
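
The `25/25` step counter in the logs above follows from the dataset size and the global batch size. A small arithmetic sketch, assuming the standard CIFAR-100 training split of 50,000 images and assuming Keras splits the global `batch_size` evenly across the MirroredStrategy replicas (the helper names are illustrative only):

```python
import math

CIFAR100_TRAIN_IMAGES = 50_000  # size of the standard CIFAR-100 training split


def steps_per_epoch(num_examples, global_batch_size):
    """Number of optimizer steps per epoch; the last batch may be partial."""
    return math.ceil(num_examples / global_batch_size)


def per_replica_batch(global_batch_size, num_replicas):
    """Examples each GPU sees per step when the global batch is split evenly."""
    return global_batch_size // num_replicas


print(steps_per_epoch(CIFAR100_TRAIN_IMAGES, 2048))  # 25, matching the 25/25 in the log
print(per_replica_batch(2048, 2))  # 1024 per L4 on the dual-GPU VM
print(per_replica_batch(2048, 4))  # 512 per L4 on the four-GPU VM
```

With a fixed global batch of 2048, adding GPUs halves the work per replica rather than shortening the epoch's step count, which is worth keeping in mind when reading the ms/step figures.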

On g2-standard-48 with 4 L4s

Using 2 of the 4 GPUs:
strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1"])  # "/gpu:2" and "/gpu:3" left out
parallel_model.fit(x_train, y_train, epochs=25, batch_size=2048)

|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L4                      Off | 00000000:00:03.0 Off |                    0 |
| N/A   78C    P0              66W /  72W |  21070MiB / 23034MiB |     82%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L4                      Off | 00000000:00:04.0 Off |                    0 |
| N/A   77C    P0              69W /  72W |  21070MiB / 23034MiB |     78%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA L4                      Off | 00000000:00:05.0 Off |                    0 |
| N/A   64C    P0              33W /  72W |    196MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA L4                      Off | 00000000:00:06.0 Off |                    0 |
| N/A   64C    P0              31W /  72W |    196MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     15778      C   python                                    21058MiB |
|    1   N/A  N/A     15778      C   python                                    21058MiB |
|    2   N/A  N/A     15778      C   python                                      184MiB |
|    3   N/A  N/A     15778      C   python                                      184MiB |
+---------------------------------------------------------------------------------------+
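
The same check (which GPUs are actually busy) can be scripted instead of read off the table. A minimal sketch using the `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` query mode; the helper names and the 5% idle threshold are assumptions for illustration, and the sample values are copied from the table above:

```python
import subprocess

QUERY = "index,memory.used,memory.total,utilization.gpu"


def parse_gpu_csv(text):
    """Parse CSV lines such as '0, 21070, 23034, 82' into dictionaries."""
    gpus = []
    for line in text.strip().splitlines():
        index, used, total, util = (int(v.strip()) for v in line.split(","))
        gpus.append({"index": index, "used_mib": used,
                     "total_mib": total, "util_pct": util})
    return gpus


def idle_gpus(gpus, util_threshold=5):
    """Indices of GPUs whose utilization is below the (assumed) threshold."""
    return [g["index"] for g in gpus if g["util_pct"] < util_threshold]


def query_gpus():
    """Run nvidia-smi on the VM and parse its output (needs the NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Memory and utilization values from the 2-of-4 run shown above.
sample = """0, 21070, 23034, 82
1, 21070, 23034, 78
2, 196, 23034, 0
3, 196, 23034, 0"""
print(idle_gpus(parse_gpu_csv(sample)))  # [2, 3]
```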
obriensystems commented 11 months ago

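4 L4s on g2-standard-48: about 80G of usable VRAM in aggregate (TensorFlow reports 20795 MB per L4), comparable to a single 80G A100 or H100 but with a much narrower memory bus.

With more than 2 GPUs the run allocates memory on every GPU but GPU-Util stays at 0% once Epoch 1/25 starts - same issue as https://github.com/tensorflow/tensorflow/issues/41724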

|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L4                      Off | 00000000:00:03.0 Off |                    0 |
| N/A   71C    P0              32W /  72W |  20958MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L4                      Off | 00000000:00:04.0 Off |                    0 |
| N/A   71C    P0              35W /  72W |  20956MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA L4                      Off | 00000000:00:05.0 Off |                    0 |
| N/A   66C    P0              34W /  72W |  20956MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA L4                      Off | 00000000:00:06.0 Off |                    0 |
| N/A   65C    P0              31W /  72W |  20956MiB / 23034MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     37338      C   python                                    20946MiB |
|    1   N/A  N/A     37338      C   python                                    20944MiB |
|    2   N/A  N/A     37338      C   python                                    20944MiB |
|    3   N/A  N/A     37338      C   python                                    20944MiB |
+---------------------------------------------------------------------------------------+

Epoch 1/25
2023-12-01 01:56:26.358086: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8906
2023-12-01 01:56:26.370835: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8906
2023-12-01 01:56:26.389974: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8906
2023-12-01 01:56:26.407626: I external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:454] Loaded cuDNN version 8906

strategy = tf.distribute.MirroredStrategy(devices=["/gpu:0", "/gpu:1", "/gpu:2", "/gpu:3"])
or, letting TensorFlow pick up every visible GPU:
strategy = tf.distribute.MirroredStrategy()

Issues with more than 2 GPUs, both on GCP and on an on-prem 3 GPU setup (two RTX-4500s and one RTX-4000).
Working fine with 2 GPUs.

Switching the strategy's cross_device_ops to ReductionToOneDevice works with more than 2 GPUs.

Works on the 4 L4s and on the 3 GPU on-prem setup (RTX-4500/4500/4000).

https://github.com/tensorflow/tensorflow/issues/41724#issuecomment-665996179

strategy = tf.distribute.MirroredStrategy(cross_device_ops=tf.distribute.ReductionToOneDevice())
parallel_model.fit(x_train, y_train, epochs=25, batch_size=2048)
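
The workaround can be made conditional on the GPU count so the same script runs on 2, 3, or 4 GPUs. This is only a sketch of the observation in this thread (the default reduction worked with 2 GPUs and stalled with more), not a general TensorFlow recommendation; `choose_reduction` and `build_strategy` are hypothetical helper names, and TensorFlow is imported lazily so the selection logic can be read and tested without it.

```python
def choose_reduction(num_gpus):
    """Name of the tf.distribute cross-device op to try, or None for the default.

    Assumption drawn from this thread only: NCCL-based all-reduce was fine on
    2 GPUs, while 3 or 4 GPUs needed ReductionToOneDevice.
    """
    if num_gpus <= 1:
        return None
    return "NcclAllReduce" if num_gpus == 2 else "ReductionToOneDevice"


def build_strategy():
    """Create a MirroredStrategy over all visible GPUs (requires TensorFlow)."""
    import tensorflow as tf  # lazy import: only needed when actually training

    gpus = tf.config.list_physical_devices("GPU")
    name = choose_reduction(len(gpus))
    ops = getattr(tf.distribute, name)() if name else None
    return tf.distribute.MirroredStrategy(cross_device_ops=ops)


for n in (1, 2, 3, 4):
    print(n, choose_reduction(n))
```

ReductionToOneDevice copies gradients to a single device for the reduction, so it trades some bandwidth efficiency for not depending on NCCL.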
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA L4                      Off | 00000000:00:03.0 Off |                    0 |
| N/A   80C    P0              62W /  72W |  21002MiB / 23034MiB |     58%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA L4                      Off | 00000000:00:04.0 Off |                    0 |
| N/A   78C    P0              67W /  72W |  20994MiB / 23034MiB |     46%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA L4                      Off | 00000000:00:05.0 Off |                    0 |
| N/A   76C    P0              67W /  72W |  20998MiB / 23034MiB |     55%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA L4                      Off | 00000000:00:06.0 Off |                    0 |
| N/A   75C    P0              51W /  72W |  21002MiB / 23034MiB |     55%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     40306      C   python                                    20990MiB |
|    1   N/A  N/A     40306      C   python                                    20982MiB |
|    2   N/A  N/A     40306      C   python                                    20986MiB |
|    3   N/A  N/A     40306      C   python                                    20990MiB |
+---------------------------------------------------------------------------------------+

Epoch 24/25
25/25 [==============================] - 3s 105ms/step - loss: 0.2089 - accuracy: 0.9445
Epoch 25/25
25/25 [==============================] - 3s 105ms/step - loss: 0.1559 - accuracy: 0.9592
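
To put the step times in perspective, the ms/step figures from the two 25-epoch runs (about 144 ms on 2 L4s, 105 ms on 4 L4s, both at a global batch of 2048) can be turned into throughput. A rough sketch that uses only those logged numbers and ignores the smaller final batch of each epoch:

```python
GLOBAL_BATCH = 2048  # global batch size used in both runs


def images_per_second(step_ms, batch=GLOBAL_BATCH):
    """Approximate training throughput from the logged ms/step."""
    return batch / (step_ms / 1000.0)


two_gpu = images_per_second(144)   # Epoch 24/25 on 2 L4s
four_gpu = images_per_second(105)  # Epoch 24/25 on 4 L4s with ReductionToOneDevice
speedup = four_gpu / two_gpu

print(f"2 GPUs: {two_gpu:.0f} img/s")
print(f"4 GPUs: {four_gpu:.0f} img/s")
print(f"speedup: {speedup:.2f}x for 2x the GPUs")
```

Doubling the GPUs gives roughly a 1.37x speedup here, which is consistent with the 46-58% GPU-Util in the table above: the reduction to one device leaves the GPUs partly waiting.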
obriensystems commented 11 months ago

Eight L4s = 160G VRAM - g2-standard-96, 5k/month or $12.0/h

gcloud compute instances create l4-8c --project=cuda-old --zone=us-east4-c --machine-type=g2-standard-96 --network-interface=network-tier=PREMIUM,stack-type=IPV4_ONLY,subnet=default --maintenance-policy=TERMINATE --provisioning-model=STANDARD --service-account=196717963363-compute@developer.gserviceaccount.com --scopes=https://www.googleapis.com/auth/cloud-platform --accelerator=count=8,type=nvidia-l4 --tags=http-server,https-server --create-disk=auto-delete=yes,boot=yes,device-name=l4-8c,image=projects/ml-images/global/images/c0-deeplearning-common-cu121-v20231105-debian-11,mode=rw,size=50,type=projects/cuda-old/zones/us-east4-c/diskTypes/pd-balanced --no-shielded-secure-boot --shielded-vtpm --shielded-integrity-monitoring --labels=goog-ec-src=vm_add-gcloud --reservation-affinity=any

 - Quota 'GPUS_ALL_REGIONS' exceeded.  Limit: 4.0 globally.
        metric name = compute.googleapis.com/gpus_all_regions
        limit name = GPUS-ALL-REGIONS-per-project
        limit = 4.0
        dimensions = global: global
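
The quota failure above can be caught before calling `gcloud compute instances create`. A minimal sketch that reads the `quotas` list from `gcloud compute project-info describe --format=json`; the helper names are hypothetical, and the `usage` value in the sample is an assumed figure (only the limit of 4 comes from the error message):

```python
import json
import subprocess


def gpu_quota_headroom(project_info, metric="GPUS_ALL_REGIONS"):
    """Remaining quota (limit minus usage) for the given metric."""
    for quota in project_info.get("quotas", []):
        if quota["metric"] == metric:
            return quota["limit"] - quota["usage"]
    raise KeyError(metric)


def fetch_project_info(project):
    """Fetch project quotas via gcloud (requires an authenticated gcloud CLI)."""
    out = subprocess.run(
        ["gcloud", "compute", "project-info", "describe",
         f"--project={project}", "--format=json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)


# Sample shaped like the error above: limit 4.0; usage 0.0 is assumed.
sample = {"quotas": [{"metric": "GPUS_ALL_REGIONS", "limit": 4.0, "usage": 0.0}]}
requested = 8
print(gpu_quota_headroom(sample) >= requested)  # False: 8 L4s do not fit in a limit of 4
```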

Thank you for submitting Case # (ID:f...28d) to Google Cloud Platform support for the following quota:
Change GPUs (all regions) from 4 to 8