DeNA / HandyRL

HandyRL is a handy and simple framework based on Python and PyTorch for distributed reinforcement learning that is applicable to your own environments.
MIT License

Suggestion: AMP support #200

Open · digitalspecialists opened this issue 3 years ago

digitalspecialists commented 3 years ago

It would be useful to have native AMP (automatic mixed precision) support for those with limited GPU resources: https://pytorch.org/docs/stable/amp.html

ikki407 commented 3 years ago

Thank you for your suggestion! I haven't used it before. How effective is it? Do you have any ideas for implementation?

digitalspecialists commented 3 years ago

AMP is Automatic Mixed Precision. It became native in PyTorch 1.6 in mid-2020, replacing NVIDIA's Apex extension roughly two years after Apex was introduced. I try to use it wherever I can in practice, but I haven't tried to patch HandyRL with it yet.

PyTorch publishes some benchmarks here [0]. They report that the accuracy impact is generally <0.1% on standard benchmarks and the speedup is typically 1.5-2x. You can also usually almost double the batch size. That accords with my experience on a 2080 Ti.

Adding AMP is straightforward.

The minimal steps are:

Declare a scaler for the life of the session

# Create a GradScaler once at the beginning of training.
scaler = torch.cuda.amp.GradScaler()

Wrap all forward passes in the autocast context and scale the loss

# Runs the forward pass with autocasting.
with torch.cuda.amp.autocast():
    outputs = self.model(inputs)
    loss = self.criterion(outputs, targets)

# Scales loss.  Calls backward() on scaled loss to create scaled gradients.
scaler.scale(loss).backward()

# Unscales the gradients of the optimizer's params, then calls optimizer.step()
scaler.step(self.optimizer)

# Updates the scale for next iteration
scaler.update()

PyTorch provides some examples here [1]. If you accumulate gradients, modify gradients before the optimizer step (e.g., for clipping), or work with DataParallel, there are a few additional steps; a clipping sketch is shown below.
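
For example, gradient clipping has to operate on unscaled gradients, so scaler.unscale_() is called before clip_grad_norm_. A minimal sketch, with placeholder model/optimizer/clipping values rather than HandyRL's actual trainer objects:

import torch
import torch.nn as nn

# Placeholder setup for illustration; in HandyRL these would be the
# trainer's own model, optimizer, and loss.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(8, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
inputs = torch.randn(4, 8, device=device)
targets = torch.randint(0, 2, (4,), device=device)
max_grad_norm = 1.0  # placeholder clipping threshold

scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())

with torch.cuda.amp.autocast(enabled=torch.cuda.is_available()):
    outputs = model(inputs)
    loss = nn.functional.cross_entropy(outputs, targets)

scaler.scale(loss).backward()

# Unscale the gradients in place before clipping, so clip_grad_norm_
# sees the true gradient values rather than the scaled ones.
scaler.unscale_(optimizer)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)

# scaler.step() notices unscale_ was already called and does not unscale again.
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()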

Implementations sometimes offer USE_PARALLEL and USE_AMP as configurable parameters for users.
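
As a hypothetical sketch (the use_amp flag and the surrounding objects are assumptions, not existing HandyRL options), both GradScaler and autocast take an enabled argument, so one config switch can fall back to the ordinary FP32 path without duplicating the training code:

import torch
import torch.nn as nn

# Hypothetical config flag; HandyRL does not currently expose this option.
use_amp = torch.cuda.is_available()

device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(8, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
inputs = torch.randn(4, 8, device=device)
targets = torch.randint(0, 2, (4,), device=device)

# With enabled=False, GradScaler and autocast become no-ops, so the same
# code path runs plain FP32 training.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

with torch.cuda.amp.autocast(enabled=use_amp):
    outputs = model(inputs)
    loss = nn.functional.cross_entropy(outputs, targets)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()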

This is just a suggestion. With RL, most users are probably CPU-bound rather than GPU-bound, but for those with 32+ cores and a single GPU it may be useful.

[0] https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/
[1] https://pytorch.org/docs/stable/notes/amp_examples.html

ikki407 commented 3 years ago

Thanks for the details. That is awesome! From your comments, I feel it would be convenient to offer AMP as a config option. Although some verification is required, it doesn't seem difficult to introduce. I think we need to be careful with DataParallel and clip_grad_norm (though I'm a bit worried it will get complicated...). We, the HandyRL team, will consider AMP support.

Thanks