Open ladler0320 opened 4 years ago
Reason: the speed does not vary when generating on CPU (with and without the flag), hence the tensors were moved to CPU. But the slowdown went unnoticed because I might have tested only lower batch sizes. Is it due to the time taken to move tensors from GPU to CPU? If so, the hard_alignments extraction should be made GPU-friendly. @myleott
@gvskalyan, thanks for the reply. The drop in generation speed on GPU is noticeable on commits both before and after the tensors were moved to CPU. Even with smaller batches, like 32, generation is ~1.7x slower with the --print-alignment option. However, you are right: there is no difference in generation speed with the --cpu flag.
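The GPU-to-CPU transfer hypothesis above can be illustrated with a minimal PyTorch sketch. This is not fairseq's actual alignment-extraction code; the tensor names and shapes are hypothetical. The point is that taking the argmax on the device first means only a small index tensor crosses the bus, instead of the full attention matrix:

```python
# Hypothetical sketch of the two transfer patterns; not fairseq's
# actual extract_hard_alignment implementation.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Fake cross-attention weights: [batch, tgt_len, src_len] (made-up sizes)
attn = torch.rand(32, 20, 25, device=device)

# Slow pattern: move the full attention matrix to CPU, then argmax there.
hard_cpu = attn.cpu().argmax(dim=-1)

# GPU-friendly pattern: argmax on the device, then move only the small
# [batch, tgt_len] index tensor to CPU.
hard_gpu = attn.argmax(dim=-1).cpu()

# Both patterns yield the same hard alignments.
assert torch.equal(hard_cpu, hard_gpu)
print(hard_gpu.shape)
```

Either way the hard alignments are identical; only the amount of data copied off the GPU differs.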
🐛 Bug
Using the --print-alignment argument makes generation up to 3x slower (depending on the batch size). For example, generating translations for my test set took 47.9s (134.54 sentences/s, 2485.45 tokens/s) with the --print-alignment option and 20.5s (315.09 sentences/s, 5820.76 tokens/s) without it.
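For reference, the reported throughputs work out to roughly a 2.3x slowdown at this batch size:

```python
# Throughput numbers taken from the report above (tokens/s).
with_alignment = 2485.45     # with --print-alignment
without_alignment = 5820.76  # without

slowdown = without_alignment / with_alignment
print(f"slowdown: {slowdown:.2f}x")  # roughly 2.3x
```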
The issue does not occur on earlier fairseq versions (I use fs-0.8.0 from an October or November commit).
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
Expected behavior
The --print-alignment argument should not drastically slow down generation.
Environment
How you installed fairseq (pip, source): source