Open · digitalspecialists opened this issue 3 years ago

It would be useful to have native AMP mixed-precision support for those with limited GPU resources: https://pytorch.org/docs/stable/amp.html

Thank you for your suggestion! I haven't used it before. How effective is it? Do you have any ideas for implementation?
AMP is Automatic Mixed Precision. It became native to PyTorch in version 1.6 (mid-2020), replacing NVIDIA's Apex extension roughly two years after Apex was introduced. I try to use it wherever it's practical. I haven't tried to patch HandyRL with it.

PyTorch publishes some benchmarks here [0]. They report that the accuracy impact is generally <0.1% on standard benchmarks and the speed-up is generally 1.5-2x, and you can usually nearly double batch sizes. That accords with my experience on a 2080 Ti.
Adding AMP is straightforward. The minimal steps are:

Declare a scaler once for the life of the training session:

```python
# Create a GradScaler once at the beginning of training.
scaler = torch.cuda.amp.GradScaler()
```
Wrap forward calls in the autocast context and scale the loss before calling backward:

```python
# Runs the forward pass with autocasting.
with torch.cuda.amp.autocast():
    outputs = self.model(inputs)
    loss = self.criterion(outputs, targets)

# Scales the loss and calls backward() on the scaled loss to create scaled gradients.
scaler.scale(loss).backward()

# Unscales the gradients of the optimizer's params, then calls optimizer.step()
# (the step is skipped if the gradients contain infs or NaNs).
scaler.step(self.optimizer)

# Updates the scale for the next iteration.
scaler.update()
```
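For reference, here is how those pieces fit together in a single training step. This is a minimal, self-contained sketch with a toy model and random data, not HandyRL's actual training loop; the model, optimizer, and loss are placeholders:

```python
import torch
import torch.nn as nn

# Toy setup purely for illustration; a real project would use its own
# model, optimizer, and loss. Requires a CUDA-capable GPU.
model = nn.Linear(128, 4).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()

# Create a GradScaler once at the beginning of training.
scaler = torch.cuda.amp.GradScaler()

for _ in range(100):
    inputs = torch.randn(32, 128, device='cuda')
    targets = torch.randint(0, 4, (32,), device='cuda')

    optimizer.zero_grad()

    # Forward pass under autocast; eligible ops run in float16.
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)

    # Backward on the scaled loss, then unscale, step, and update the scale.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```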
PyTorch provides more examples here [1]. If you accumulate gradients, modify gradients between backward() and the optimizer step (e.g., for clipping), or work with DataParallel, there are a few additional minor steps.
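For example, if gradients are clipped with clip_grad_norm_ (mentioned later in this thread), they should be unscaled first. A sketch following the pattern in PyTorch's AMP examples page; the self.model / self.optimizer names just mirror the snippets above, and max_norm is a placeholder value:

```python
scaler.scale(loss).backward()

# Unscale the gradients in place so clipping operates on the true gradient values.
scaler.unscale_(self.optimizer)

# max_norm=1.0 is only an illustrative value.
torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_norm=1.0)

# scaler.step() detects that the gradients were already unscaled this iteration
# and does not unscale them again.
scaler.step(self.optimizer)
scaler.update()
```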
Implementations sometimes offer USE_PARALLEL and USE_AMP as configurable parameters for users.
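Conveniently, both GradScaler and autocast accept an enabled argument, so a USE_AMP option can gate AMP without a separate code path. A rough sketch, assuming a hypothetical config dict with a use_amp entry:

```python
# config is a hypothetical dict of user settings; enabled=False turns the
# scaler and autocast into no-ops, so the same code runs with AMP off.
use_amp = config.get('use_amp', False)

scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

with torch.cuda.amp.autocast(enabled=use_amp):
    outputs = self.model(inputs)
    loss = self.criterion(outputs, targets)

scaler.scale(loss).backward()
scaler.step(self.optimizer)
scaler.update()
```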
This is just a suggestion. With RL, most users would probably be CPU-bound rather than GPU-bound. For those with 32+ cores and one GPU, it may be useful.
[0] https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/
[1] https://pytorch.org/docs/stable/notes/amp_examples.html
Thanks for the details. That is awesome! From your comments, I feel it would be convenient to offer AMP as a config option. Although some verification is required, it doesn't seem difficult to introduce. I think we need to be careful with DataParallel and clip_grad_norm (though I'm a bit worried it could get complicated...). We, the HandyRL team, will consider AMP support.

Thanks