vwxyzjn closed this 2 years ago
Yeah, I even had a comment about it in the code :) Maybe let's remove mixed precision too? Then we can get rid of the scaler. Anyway, it doesn't help. Do you want to apply this change before or after the horovod replacement?
It's probably better to merge this PR as is. We can then remove mixed precision in a separate PR, since that involves a docs change and could impact downstream projects such as isaacgymenvs.
Okay! I'll review all the other PRs after work in the evening.
Also pinging @markelsanz14, are you ok with me filing a PR removing mixed-precision support? I know you were thinking about using it in the future.
@Denys88 do we know how much removing mixed precision will affect different model sizes? I agree in most cases it won't help, mainly when we have 2-5 layer NNs, but it might speed up training with larger NN sizes.
Hey @Denys88, @ViktorM, @markelsanz14 and I had a quick chat about this, and maybe it's worth keeping the mixed-precision option? I did a quick benchmark and found virtually no performance difference.
However, as @markelsanz14 suggested, mixed precision could offer benefits for large NNs.
Yeah, I wouldn't put more work into it for now, but removing it altogether seems a bit drastic. I feel like it can be useful in the future with larger NN architectures.
More data on this: removing the scaler seems to make training less than 5% faster (178 seconds vs. 186 seconds), so the difference could just be a bit of randomness.
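For what it's worth, the claimed speedup can be checked from the two wall-clock numbers above:

```python
# Quick check of the speedup from removing the scaler:
# 186 s with the scaler vs. 178 s without it.
with_scaler = 186.0
without_scaler = 178.0
speedup_pct = (with_scaler - without_scaler) / with_scaler * 100
print(f"{speedup_pct:.1f}% faster")  # ~4.3%, comfortably under 5%
```

A ~4% delta from a single run of each configuration is well within normal run-to-run noise, so the "could just be randomness" read seems fair.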
Could you check it on some heavy networks, including transformers? I have my IsaacGymEnvs fork https://github.com/Denys88/IsaacGymEnvs with some fat neural networks, like the OpenAI embedding from the paper, or just a transformer: https://github.com/Denys88/IsaacGymEnvs/blob/main/isaacgymenvs/cfg/train/AntPPO_tr.yaml
The current implementation is problematic when gradient truncation is not used. The following file diff shows the lines that need to be added.
But this is pretty ugly... so I refactored it to
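(For reference, the interaction being discussed is the standard `torch.cuda.amp` recipe: when gradient clipping/truncation is on, the scaler has to unscale the gradients before `clip_grad_norm_`, otherwise the clip threshold is applied to the scaled gradients. The sketch below is a generic illustration of that pattern, not the actual rl_games diff or refactor; `model`, `optimizer`, and `max_grad_norm` are placeholder names.)

```python
import torch

# Placeholder model/optimizer for illustration only.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# enabled=False lets this sketch run on CPU; with CUDA you would enable it.
scaler = torch.cuda.amp.GradScaler(enabled=False)
max_grad_norm = 1.0

x, y = torch.randn(8, 4), torch.randn(8, 2)
loss = torch.nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
scaler.scale(loss).backward()
# Must unscale BEFORE clipping, so the threshold sees true gradient magnitudes.
scaler.unscale_(optimizer)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
scaler.step(optimizer)   # skips the step if grads contain inf/nan
scaler.update()
```

Without clipping, the `unscale_` call can be dropped and `scaler.step()` handles unscaling internally, which is why supporting both paths cleanly needs some refactoring.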