OpenNMT / OpenNMT-py

Open Source Neural Machine Translation and (Large) Language Models in PyTorch
https://opennmt.net/
MIT License

Mixed precision for ROCm #2402

Closed · cspink closed this issue 8 months ago

cspink commented 1 year ago

While working through this example, I soon ran into the following error message:

This fp16_optimizer is designed to only work with apex.contrib.optimizers.*
To update, use updated optimizers with AMP

I can see why this happens, as I am using AMD hardware with a ROCm build of PyTorch. Still, the training time I get from one node with 8 GPUs is nowhere near the 10 hours reported in the configuration for 50k steps, using the same yaml file (without fp16).
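For reference, this is how I checked that the ROCm build of PyTorch is in use (it still exposes the torch.cuda namespace):

```python
# Quick sanity check that this is the ROCm build of PyTorch and the AMD GPU is visible.
import torch

print(torch.__version__)          # ROCm wheels typically carry a "+rocm..." suffix
print(torch.version.hip)          # HIP version string on ROCm builds, None on CUDA builds
print(torch.cuda.is_available())  # ROCm devices are still reported through torch.cuda
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```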

This raises both broad and specific questions. To begin with the latter:

  1. How can I use mixed precision on ROCm, e.g. via the native AMP path sketched below? (And what kind of speedup should I expect?)
  2. Broadly speaking, are there ROCm-specific performance considerations that affect the choice of optimizer, batch size, or parallelization strategy?
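
For context, here is roughly the kind of loop I understand the error message is pointing me toward, i.e. PyTorch's native AMP rather than apex (a minimal sketch of my own, not what OpenNMT-py actually does internally):

```python
# Minimal sketch of PyTorch-native AMP, i.e. the "updated optimizers with AMP" path the
# error message mentions. A ROCm build of PyTorch exposes the same torch.cuda.amp API,
# so this runs on AMD GPUs as well; the speedup it gives there is what I am asking about.
import torch
import torch.nn as nn

device = "cuda"  # on a ROCm build this still addresses the AMD GPU
model = nn.Linear(1024, 1024).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling for fp16 gradients

for step in range(10):
    x = torch.randn(32, 1024, device=device)
    target = torch.randn(32, 1024, device=device)
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():        # forward pass in mixed precision
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()          # scale loss to avoid fp16 gradient underflow
    scaler.step(optimizer)                 # unscales grads, skips the step on inf/nan
    scaler.update()
```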
vince62s commented 1 year ago

I never had the chance to test with an AMD GPU, so I'm afraid I can't answer those questions. You may get some answers on AMD/Radeon communities or forums.