I am trying to train NeuralGCM on some custom data, but the computational requirements are too high for my setup: training NeuralGCM-2.8deg uses 16 TPUs, which is beyond the resources I have available.
Is there a way to tweak the size of the model so that training fits on a single A100 GPU?
You can definitely train NeuralGCM at smaller scale; it will just take longer. Our 2.8-degree model fits on a single TPU/GPU; we only used multiple TPUs for data parallelism.
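To illustrate the point (this is not the NeuralGCM training code, just a minimal JAX sketch with stand-in names like `loss_fn`, `params`, and `batch`): data parallelism only changes how the training batch is split across devices, not the model size, so dropping the `pmap` axis gives a single-device step that trains the same model with a smaller per-step batch.

```python
# Sketch only, not the NeuralGCM trainer: shows that multi-device training
# here is plain data parallelism, so a single-GPU step is the same update
# with no cross-device communication and a smaller per-step batch.
import functools

import jax
import jax.numpy as jnp

LEARNING_RATE = 1e-4  # placeholder value


def loss_fn(params, batch):
    # Stand-in loss; the real model rolls out the dynamical core plus the
    # learned physics and scores the trajectory against reanalysis data.
    pred = batch["inputs"] @ params["w"]
    return jnp.mean((pred - batch["targets"]) ** 2)


def sgd_update(params, grads):
    # Plain SGD as a placeholder for whatever optimizer is actually used.
    return jax.tree_util.tree_map(lambda p, g: p - LEARNING_RATE * g, params, grads)


@jax.jit
def single_device_step(params, batch):
    # Single A100: one (smaller) batch per step, no collective ops needed.
    grads = jax.grad(loss_fn)(params, batch)
    return sgd_update(params, grads)


@functools.partial(jax.pmap, axis_name="devices")
def data_parallel_step(params, batch):
    # Multi-TPU setup: params are replicated, the batch is sharded across
    # devices, and per-device gradients are averaged before the update.
    grads = jax.grad(loss_fn)(params, batch)
    grads = jax.lax.pmean(grads, axis_name="devices")
    return sgd_update(params, grads)
```

On one GPU you would use `single_device_step` directly; the trade-off is simply that with a smaller effective batch (or more gradient-accumulation steps) training takes longer to reach the same number of examples seen.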