DLR-RM / stable-baselines3

PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
https://stable-baselines3.readthedocs.io
MIT License

[Feature Request] Half Precision support #344

Open jkterry1 opened 3 years ago

jkterry1 commented 3 years ago

🚀 Feature

Native support for FP16 GPU computation, e.g. via a flag to .learn() or something similar.

Motivation / Pitch

Using half precision instead of single precision is common practice in deep learning because it dramatically speeds up computation and is typically perfectly acceptable. All major libraries, including PyTorch, support it with feature parity to single precision. Notably for reinforcement learning, observations encoded in FP16 are also much smaller (so you can have bigger replay buffers).

While FP16 is problematic on consumer-grade cards, a ton of people rent a GPU from a cloud service like AWS/GCP/Azure or use Colab notebooks, which provide Tesla GPUs with full-fledged FP16 support.
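
For reference, the standard way to do this in plain PyTorch is automatic mixed precision via torch.cuda.amp. Below is a minimal sketch of the generic pattern such a flag would presumably wrap; this is plain PyTorch, not an existing SB3 API, and the network is just a stand-in:

```python
import torch

# Stand-ins for a policy/value network and its optimizer.
model = torch.nn.Linear(64, 1).cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid FP16 gradient underflow

for _ in range(100):
    obs = torch.randn(256, 64, device="cuda")
    target = torch.randn(256, 1, device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # forward pass runs in FP16 where it is safe
        loss = torch.nn.functional.mse_loss(model(obs), target)

    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # unscales gradients; skips the step on inf/NaN
    scaler.update()
```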

Alternatives

Not applicable here

araffin commented 3 years ago

Hello, the feature request seems reasonable, however:

LSC527 commented 2 years ago

According to Low-Precision Reinforcement Learning, directly applying mixed precision training in RL reduces performance. Have you tried mixed precision training in RL? Did you see a performance drop?

buoyancy99 commented 2 years ago

I'd like to work on a PR for this. In my experience, FP16 for RL is okay and can significantly speed up research iteration. Some questions about design choices:

  1. Shall we pass the FP16 training option as an argument to learn() and train()?
  2. Shall we support multi-GPU training in the same PR (very useful for RL from pixels)?
  3. Should I add a warning upon FP16 training, telling people that performance may degrade, especially when the reward/value scale is very large or very small?
  4. Shall I implement the tricks in https://arxiv.org/abs/2102.13565 right away, or add them incrementally after vanilla mixed precision training is supported?

Miffyli commented 2 years ago

@buoyancy99

Thanks for reviving this topic! I'll start with a disclaimer: I am sceptical about the benefits, but would be happy to be proven wrong :)

1) I do not know how the mechanics work underneath. To me, it sounds like it should be passed at model creation. (Side-thought: I realize not everything can be done in FP16 because of precision errors, but I wonder if it would be worth it to also store data in FP16 in off-policy models to fit more data 🤔; a sketch of this idea follows this comment.)

2) A bit sceptical about this (usually batch/image sizes are small), but would be happy to be proven wrong! This should go into another PR, though :) (one thing at a time)

3) Such warnings are at the core of SB3 (helping people notice when things are silently breaking), so yes! Come to think of it, it might be worth warning about high losses in general (a thought for another time).

4) Incrementally, please. You may of course experiment with them, and if there is some huge benefit to be gained then naturally we can think about adding them :)

Note that we prioritize code readability and cleanliness over every last bit of performance.
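
To make the side-thought in point 1 concrete, here is a hypothetical sketch (not SB3's ReplayBuffer; names and structure are illustrative only) of storing observations in float16 and casting back to float32 at sampling time, roughly halving observation memory:

```python
import numpy as np

class FP16ObsBuffer:
    """Hypothetical buffer: store observations in float16, train in float32."""

    def __init__(self, capacity: int, obs_shape: tuple):
        self.obs = np.zeros((capacity,) + obs_shape, dtype=np.float16)
        self.capacity = capacity
        self.pos = 0
        self.full = False

    def add(self, obs: np.ndarray) -> None:
        # The float16 cast is lossy, but usually tolerable for e.g. pixel observations.
        self.obs[self.pos] = obs.astype(np.float16)
        self.pos = (self.pos + 1) % self.capacity
        self.full = self.full or self.pos == 0

    def sample(self, batch_size: int) -> np.ndarray:
        high = self.capacity if self.full else self.pos
        idx = np.random.randint(0, high, size=batch_size)
        return self.obs[idx].astype(np.float32)  # cast up before the forward pass
```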

buoyancy99 commented 2 years ago

I agree with you that training optimizations are not very important for state-based RL. However, I find them very useful for pixel-based RL, especially when you keep the batch size as large as the default parameters of many off-policy algorithms.

Miffyli commented 2 years ago

Feel free to whip up a PR including comparative results (training speed-ups, impact on agent performance) :). Especially if not too many parts of the code change, it can be merged.

kjabon commented 2 years ago

Hi @buoyancy99 (or anyone who might be working on this), have you made any progress? I may be interested in having a go myself, but don't want to redo work that has already been done.

buoyancy99 commented 2 years ago

I tried automatic mixed precision with SB3, but it severely hurt performance.
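
One generic PyTorch mitigation worth trying in such cases (an assumption on my part, not something verified in this thread) is to keep numerically sensitive computations, such as the loss on returns, in FP32 by locally disabling autocast:

```python
import torch

def train_step(policy, optimizer, scaler, obs, returns):
    """One AMP update where only the network forward pass runs in FP16."""
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        values = policy(obs)  # forward pass in FP16
        with torch.cuda.amp.autocast(enabled=False):
            # Compute the loss in FP32: very large or very small returns are a
            # common source of FP16 overflow/underflow in RL.
            loss = torch.nn.functional.mse_loss(values.float(), returns.float())
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```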