I would like to try ZeRO stage 3 without Adam offloading, but I imagine that would require multiple nodes just to hold the actor's weights. I briefly tried setting the number of GPUs per actor node to 16, but to no avail, so I assume this is not yet supported? I also wonder whether this is technically feasible with the OpenRLHF Ray-based framework, so that I can spend some time looking into it.
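For context on why multiple nodes seem necessary, here is a rough back-of-the-envelope sketch. It assumes the per-parameter accounting for mixed-precision Adam from the ZeRO paper (2 bytes fp16 weights + 2 bytes fp16 gradients + 12 bytes fp32 optimizer state), which ZeRO-3 shards evenly across all GPUs. The 70B model size and 80 GB GPUs are hypothetical examples, and activations, communication buffers and fragmentation are ignored, so real usage would be higher.

```python
# Rough ZeRO-3 estimate of per-GPU memory for model states only (no offloading).
# Assumption: mixed-precision Adam, i.e. per parameter 2 B fp16 weights
# + 2 B fp16 gradients + 12 B fp32 optimizer state (master copy, momentum, variance).
# Activations, temporary buffers and fragmentation are NOT included.

BYTES_PER_PARAM = 2 + 2 + 12  # 16 bytes per parameter


def zero3_model_state_gb_per_gpu(num_params: float, num_gpus: int) -> float:
    """Model-state memory per GPU (in GB = 1e9 bytes) when fully sharded by ZeRO-3."""
    return num_params * BYTES_PER_PARAM / num_gpus / 1e9


if __name__ == "__main__":
    num_params = 70e9  # hypothetical 70B-parameter actor
    gpu_mem_gb = 80.0  # hypothetical 80 GB GPUs
    for num_gpus in (8, 16, 32):
        need = zero3_model_state_gb_per_gpu(num_params, num_gpus)
        print(f"{num_gpus:>2} GPUs: {need:6.1f} GB/GPU, fits in {gpu_mem_gb:.0f} GB: {need < gpu_mem_gb}")
```

Under these assumptions, one 8-GPU node would need about 140 GB per GPU for model states alone, while sharding across 16 GPUs (two such nodes) brings it down to about 70 GB, leaving little headroom for activations.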