facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License
"--print-alignment" argument drastically slows down the generation #2234

Open ladler0320 opened 4 years ago

ladler0320 commented 4 years ago

🐛 Bug

Using the --print-alignment argument makes generation up to 3x slower, depending on the batch size. For example, generating translations for my test set took 47.9s (134.54 sentences/s, 2485.45 tokens/s) with the --print-alignment option and 20.5s (315.09 sentences/s, 5820.76 tokens/s) without it.
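
As a quick sanity check on the figures above, the slowdown for this particular test set can be computed directly from the reported times and throughputs. This is only arithmetic on the numbers quoted in the report; the function name `slowdown` is made up for illustration.

```python
def slowdown(seconds_with_flag: float, seconds_without_flag: float) -> float:
    """Return how many times slower the run with the flag is."""
    return seconds_with_flag / seconds_without_flag


# Figures reported above for the test set.
time_ratio = slowdown(47.9, 20.5)
sentence_ratio = 315.09 / 134.54  # sentences/s without the flag / with the flag
token_ratio = 5820.76 / 2485.45   # tokens/s without the flag / with the flag

print(f"wall-clock: {time_ratio:.2f}x")
print(f"sentences/s: {sentence_ratio:.2f}x")
print(f"tokens/s: {token_ratio:.2f}x")
```

All three ratios come out at roughly 2.3x for this run, which is consistent with the "up to 3x" figure being an upper bound that depends on batch size.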

The issue does not occur on earlier fairseq versions (I use fs-0.8.0, from an October or November commit).

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

  1. Generate the translation without the --print-alignment option.
  2. Generate the translation with the --print-alignment option.
  3. Compare the performance.
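
The steps above could be scripted roughly as follows. This is a minimal sketch, not the command used in the original report: the dataset directory, checkpoint path, and batch size are hypothetical placeholders, and the helper names `time_command` and `generate_command` are made up for illustration. It assumes `fairseq-generate` is on the PATH.

```python
import subprocess
import time
from typing import List


def time_command(cmd: List[str]) -> float:
    """Run cmd to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def generate_command(data_dir: str, checkpoint: str, batch_size: int,
                     print_alignment: bool) -> List[str]:
    """Build a fairseq-generate command line, optionally with --print-alignment."""
    cmd = ["fairseq-generate", data_dir,
           "--path", checkpoint,
           "--batch-size", str(batch_size)]
    if print_alignment:
        cmd.append("--print-alignment")
    return cmd


# Example usage (hypothetical paths; needs fairseq, binarized data, and a model):
#   base = time_command(generate_command("data-bin/my-data", "model.pt", 128, False))
#   flag = time_command(generate_command("data-bin/my-data", "model.pt", 128, True))
#   print(f"slowdown with --print-alignment: {flag / base:.2f}x")
```

Wall-clock time measured this way includes model loading, so the throughput lines that fairseq-generate prints at the end of a run are likely a better basis for comparison.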

Expected behavior

The --print-alignment argument should not drastically slow down generation.

Environment

kalyangvs commented 4 years ago

Reason. The speed did not vary when generating on CPU (with and without the flag), hence I moved the tensors to CPU. I did not notice the slowdown, possibly because I tried smaller batch sizes. Is this due to the time taken to move tensors from GPU to CPU? If so, should the extract hard_alignments computation be made GPU-friendly? @myleott
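
For readers unfamiliar with the term, a hard alignment picks, for each target token, the single source token with the highest attention weight. The sketch below illustrates that idea in plain Python. It is not fairseq's implementation: the function name `hard_alignment_sketch` is hypothetical, and it ignores padding and end-of-sentence handling. It assumes every row of the attention matrix is non-empty.

```python
from typing import List, Tuple


def hard_alignment_sketch(attn: List[List[float]]) -> List[Tuple[int, int]]:
    """Return (source_index, target_index) pairs.

    attn[t][s] is the attention weight of target token t on source token s.
    Each target token is aligned to its highest-weighted source token.
    """
    pairs = []
    for tgt_idx, weights in enumerate(attn):
        # argmax over the source positions for this target token
        src_idx = max(range(len(weights)), key=weights.__getitem__)
        pairs.append((src_idx, tgt_idx))
    return pairs


# Toy attention matrix: 3 target tokens (rows) x 3 source tokens (columns).
attn = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.6, 0.2],
]
print(hard_alignment_sketch(attn))  # [(0, 0), (2, 1), (1, 2)]
```

On a GPU, an argmax like this can generally be expressed as one batched tensor operation followed by a single transfer of the resulting indices, which is one possible reading of "GPU-friendly" in the question above.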

ladler0320 commented 4 years ago

@gvskalyan, thanks for the reply. The drop in generation speed on GPU is noticeable on commits both before and after the change that moved the tensors to CPU. Even with smaller batches, such as 32, generation is ~1.7x slower with the --print-alignment option. However, you are right: there is no difference in generation speed with the --cpu flag.