xesdiny opened 1 year ago
I think sharded model loading should work at the 6.7B-parameter scale, so why offload the parameters to the CPU?
"offload_param": { "device": "cpu", "pin_memory": true },
We want to train with a larger batch size.
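To make the trade-off concrete, here is a rough sketch, assuming the fragment above comes from a DeepSpeed config where `offload_param` sits under `zero_optimization` with ZeRO stage 3. The helper names (`param_memory_gb`, `zero3_config`) and the 8-GPU figure are hypothetical and only for illustration. It also assumes the parameters are held in fp16 at 2 bytes each, which puts a 6.7B model at about 13.4 GB of parameter memory before partitioning. Moving that to CPU leaves more GPU memory for activations, which is what grows with batch size.

```python
import json

# Assumption: parameters are stored in fp16/bf16, i.e. 2 bytes each.
BYTES_PER_FP16_PARAM = 2


def param_memory_gb(num_params: float, num_gpus: int = 1) -> float:
    """Per-GPU fp16 parameter memory in GB (10^9 bytes).

    Assumes the parameters are partitioned evenly across GPUs,
    as ZeRO stage 3 does. Gradients, optimizer state and
    activations are NOT included in this estimate.
    """
    return num_params * BYTES_PER_FP16_PARAM / num_gpus / 1e9


def zero3_config(offload_params_to_cpu: bool) -> dict:
    """Build a minimal ZeRO stage 3 section, with or without CPU parameter offload."""
    zero = {"stage": 3}
    if offload_params_to_cpu:
        # Same fragment as quoted in the question.
        zero["offload_param"] = {"device": "cpu", "pin_memory": True}
    return {"zero_optimization": zero}


if __name__ == "__main__":
    # Hypothetical setup: 6.7B parameters partitioned over 8 GPUs.
    print("fp16 params per GPU (GB):", param_memory_gb(6.7e9, num_gpus=8))
    print(json.dumps(zero3_config(offload_params_to_cpu=True), indent=2))
```

This only counts parameter memory. Whether the freed space is worth the extra CPU-GPU transfer time depends on the hardware and on how much larger the batch actually gets.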