Closed kfertakis closed 3 weeks ago
This is a bit complicated and I'm not sure DeepSpeed supports it directly, but I recommend using DeepSpeed's offload feature.
Thanks. DeepSpeed's offload feature only offloads the optimiser to the CPU, if I'm not mistaken. What I'm trying to achieve is to move both the model (with its parameters) and the optimiser from the GPU to the CPU and release the previously allocated GPU memory. Is there a recommended way to approach this? Thanks
DeepSpeed supports model weight offloading as well, not just optimiser offloading.
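For reference, parameter offloading in DeepSpeed is configured under ZeRO stage 3; a minimal config fragment might look like the following (a sketch using the `offload_param` and `offload_optimizer` fields of the ZeRO config, to be adapted to your setup):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true }
  }
}
```

Note that `offload_param` is only honoured at ZeRO stage 3, since that is the stage at which DeepSpeed partitions the parameters themselves.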
Hi,
I'm running PPO without Ray on a single device, and I'm experimenting with moving some models off the GPU. Calling
torch.nn.Module.cpu()
on either the actor or the critic model initialised with DeepSpeed does not work, because these models are wrapped together with an optimiser that holds references to the model parameters, which breaks torch's device-moving functionality. Is there any way to move the model without destroying and re-instantiating it? Thank you,
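For plain PyTorch (no DeepSpeed engine) the pattern below is a minimal sketch of what "move model and optimiser to CPU and free GPU memory" can look like; the model and optimiser here are illustrative stand-ins, not the actor/critic from the question, and a DeepSpeed-initialised model keeps its parameters in flattened partitioned buffers, so this does not transfer directly to that case.

```python
import torch

# Illustrative model/optimizer pair; 'cuda' is used only if available.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(4, 4).to(device)
opt = torch.optim.Adam(model.parameters())

# Run one step so the optimizer actually has state tensors (exp_avg, exp_avg_sq).
model(torch.randn(2, 4, device=device)).sum().backward()
opt.step()

# Module.cpu() moves parameter data in place, so the optimizer's references to
# the parameters remain valid; only the optimizer *state* must be moved by hand.
model.cpu()
for state in opt.state.values():
    for k, v in state.items():
        if torch.is_tensor(v):
            state[k] = v.cpu()

if torch.cuda.is_available():
    # Release cached allocator blocks back to the driver so other processes
    # (or later allocations) can reuse the GPU memory.
    torch.cuda.empty_cache()
```

After this, both forward passes and further `opt.step()` calls run on the CPU, and the process's GPU allocation shrinks to whatever the caching allocator still holds.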