Open josevalim opened 1 year ago
For CPU offloading during training: https://huggingface.co/docs/transformers/main_classes/deepspeed
Btw, we would implement this with infeed, but infeed is not currently supported in either the IREE or CUDA PJRT plugins. This may need to be implemented in a layer above (such as Axon).