jkterry1 opened this issue 3 years ago
Hello, the feature request seems reasonable, however:
According to Low-Precision Reinforcement Learning, directly applying mixed precision training in RL can reduce performance. Have you tried using mixed precision training in RL? Is there a performance drop?
I'd like to work on a PR for this. In my past experience, FP16 for RL is fine and can significantly increase research iteration speed. Some questions about design choices:
@buoyancy99
Thanks for reviving this topic! I'll start with a disclaimer that I am sceptical about the benefits, but would be happy to be proven wrong :)
1) I do not know how the mechanics work underneath. To me, it sounds like this should be passed in the model-creation part. (Side thought: I realize not everything can be done in FP16 because of precision errors, but I wonder if it would be worth it to also store data in FP16 in off-policy models to have more data 🤔 — see the sketch after this list.)
2) Bit sceptical about this (usually batch/image sizes are small), but would be happy to be proven wrong! This should go into another PR, though :) (one thing at a time)
3) Such warnings are the core of SB3 (helping people know when things are silently breaking), so yes! Come to think of it, it might be worth warning about high losses in general (a thought for another time).
4) Incrementally, please. You may experiment with them, of course, and if there is some huge benefit to be gained then naturally we can think about adding it :)
Note that we prioritize code readability and cleanliness over every last bit of performance.
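To make the side thought in point 1 concrete, here is a minimal, SB3-independent sketch of a replay buffer that stores observations in float16 and casts them back to float32 at sample time. The class name and layout are illustrative only; this is not SB3's actual `ReplayBuffer` API.

```python
import numpy as np

class HalfPrecisionReplayBuffer:
    """Toy replay buffer that stores observations in float16 to halve memory use."""

    def __init__(self, capacity, obs_shape):
        # float16 storage: half the memory of float32 for the same capacity
        self.observations = np.zeros((capacity, *obs_shape), dtype=np.float16)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.capacity = capacity
        self.pos = 0
        self.full = False

    def add(self, obs, reward):
        self.observations[self.pos] = obs.astype(np.float16)
        self.rewards[self.pos] = reward
        self.pos = (self.pos + 1) % self.capacity
        self.full = self.full or self.pos == 0

    def sample(self, batch_size):
        upper = self.capacity if self.full else self.pos
        idx = np.random.randint(0, upper, size=batch_size)
        # Cast back to float32 before the forward pass to limit precision loss
        return self.observations[idx].astype(np.float32), self.rewards[idx]
```

Actions and other fields are omitted for brevity; the point is only that storage precision and compute precision can be chosen independently.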
I agree with you that training optimizations are not very important for state-based RL. However, I find them very useful for pixel-based RL, especially when you keep the batch size as large as the default parameters for many off-policy algorithms.
Feel free to whip up a PR including the comparative results (training speed-ups, impact on agent performance) :). Especially if not too many parts of the code change, it can be merged.
Hi @buoyancy99 (or anyone who might be working on it), have you made any progress on this? I may be interested in having a go myself, but don't want to re-do work already done.
I tried automatic mixed precision with SB3, but it severely hurt performance.
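For reference, "automatic mixed precision" here usually means wrapping the gradient step in PyTorch's AMP utilities. A generic sketch (not SB3's actual training loop) could look like this; `policy`, `optimizer`, `batch`, and `compute_loss` are placeholders:

```python
import torch

scaler = torch.cuda.amp.GradScaler()  # keeps FP16 gradients from underflowing

def train_step(policy, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    # Forward pass runs selected ops in FP16, the rest in FP32
    with torch.cuda.amp.autocast():
        loss = compute_loss(policy, batch)
    # Scale the loss, backprop, then step through the scaler
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()
```

Note that algorithms which clip gradient norms would also need `scaler.unscale_(optimizer)` before clipping; that interaction is one place where naively dropping AMP into an existing RL training loop can go wrong.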
🚀 Feature
Native support for using FP16 GPU computations, via a flag to .learn or something like that.
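As a purely hypothetical illustration of the requested interface (no such flag exists in SB3 today), the call site could look something like:

```python
from stable_baselines3 import PPO

model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
# `use_fp16` is a hypothetical flag, shown only to illustrate the request
model.learn(total_timesteps=100_000, use_fp16=True)
```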
Motivation / Pitch
Using half precision instead of single precision is common practice in deep learning because it dramatically speeds up computation and is typically perfectly acceptable. All major libraries, including PyTorch, support it with feature parity to single precision. Notably for reinforcement learning, observations encoded in FP16 will also be much smaller (so you can have bigger replay buffers).
While FP16 is problematic on consumer-grade cards, a ton of people rent a GPU from a cloud service like AWS/GCP/Azure or use Colab notebooks, which provide Tesla GPUs with full-fledged FP16 support.
Alternatives
Not applicable here
Checklist