Closed by hekaijie123, 2 weeks ago
Have you offloaded the model's parameters and optimizer state to the CPU?
You need to offload the model parameters and the optimizer state to the CPU to further reduce GPU memory usage:

```json
"zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true }
}
```
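For context, the fragment above goes inside a full DeepSpeed config file. A minimal sketch of such a file is below — the batch size, accumulation steps, and bf16 settings are illustrative assumptions, not values from this thread, and should be tuned to your hardware:

```json
{
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "bf16": { "enabled": true },
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": { "device": "cpu", "pin_memory": true },
        "offload_param": { "device": "cpu", "pin_memory": true },
        "overlap_comm": true,
        "contiguous_gradients": true
    }
}
```

Assuming the config is saved as `ds_config.json` (a hypothetical filename) and your training script accepts a `--deepspeed` argument, it would typically be launched with the standard DeepSpeed launcher, e.g. `deepspeed --num_gpus 2 train.py --deepspeed ds_config.json`.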
Hello author, I am using two A100 GPUs (40 GB each) to run full-parameter fine-tuning. Regardless of whether I use the ZeRO-2 or ZeRO-3 configuration, it always runs out of GPU memory. However, according to the "Model Fine-tuning Memory Usage Statistics" table you provided below, the 2-GPU setup uses 16 GB of memory. How can this be resolved?