facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Wrong generation with Iterative Product Quantization #3163

Open nicolabertoldi opened 3 years ago

nicolabertoldi commented 3 years ago

🐛 Bug

I followed the instructions for Iterative Product Quantization provided here: https://github.com/pytorch/fairseq/tree/master/examples/quant_noise. I managed to train a transformer model with the commands below, although it seems that not all the intended parameters are actually quantized.

The main problem, however, arises at generation time: the model produces a dummy output.

To Reproduce

Steps to reproduce the behavior:

preprocessing

fairseq-preprocess --source-lang en --target-lang it --trainpref fairseq_bpefied/EPv8_en__it.train --validpref fairseq_bpefied/EPv8_en__it.valid --testpref fairseq_bpefied/EPv8_en__it.test --destdir fairseq_data-bin/EPv8.en__it --joined-dictionary

training

I ran these two commands sequentially (the first trains with quant-noise, the second continues training and applies iPQ via the quantization config):

fairseq-train fairseq_data-bin/EPv8.en__it ${train_options} --save-dir fairseq_model_PQ --tensorboard-logdir fairseq_tensorboard_logdir_PQ --quant-noise-pq 0.1 --quant-noise-pq-block-size 8 --max-update 30000

fairseq-train fairseq_data-bin/EPv8.en__it ${train_options} --save-dir fairseq_model_PQ --tensorboard-logdir fairseq_tensorboard_logdir_PQ --quant-noise-pq 0.1 --quant-noise-pq-block-size 8 --max-update 45000 --quantization-config-path transformer_quantization_config.yaml

Note: the transformer_quantization_config.yaml file is the default one provided in the instructions. Note: train_options is defined as follows:

train_options="--task translation --share-all-embeddings --no-progress-bar --dataset-impl mmap --warmup-updates 400 --max-tokens 1024 --arch transformer --clip-norm 0.0 --label-smoothing 0.1 --attention-dropout 0.1 --dropout 0.3 --weight-decay 0.0 --criterion label_smoothed_cross_entropy --optimizer adam --adam-betas '(0.9, 0.98)' --log-interval 100 --lr 0.0005 --lr-scheduler inverse_sqrt --min-lr 1e-09 --warmup-init-lr 1e-07 --update-freq 4 --save-interval-updates 1000 --keep-interval-updates 10 --keep-last-epochs 10"

interactive generation

fairseq-interactive fairseq_data-bin/EPv8.en__it/ --path fairseq_model_PQ/checkpoint_best.pt --beam 5 --source-lang en --target-lang it --bpe subword_nmt --bpe-codes fairseq_bpe.codes
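
For completeness, the same generation can also be scripted through the fairseq hub interface. This is only a rough sketch mirroring the command above (paths are the ones used in this report), and I have not verified whether this code path applies transformer_quantization_config.yaml when loading the checkpoint:

from fairseq.models.transformer import TransformerModel

# Sketch only: mirrors the fairseq-interactive call above.
model = TransformerModel.from_pretrained(
    'fairseq_model_PQ',
    checkpoint_file='checkpoint_best.pt',
    data_name_or_path='fairseq_data-bin/EPv8.en__it',
    bpe='subword_nmt',
    bpe_codes='fairseq_bpe.codes',
)
model.eval()
print(model.translate('My name is Nicola', beam=5))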

Errors:

Error 1

During training, only the following layers are reported as quantized:

quantized layers: ['decoder.layers.0.fc1', 'decoder.layers.0.fc2', 'decoder.layers.1.fc1', 'decoder.layers.1.fc2', 'decoder.layers.2.fc1', 'decoder.layers.2.fc2', 'decoder.layers.3.fc1', 'decoder.layers.3.fc2', 'decoder.layers.4.fc1', 'decoder.layers.4.fc2', 'decoder.layers.5.fc1', 'decoder.layers.5.fc2']
quantized layers: ['encoder.embed_tokens']

whereas, according to the config file, the self_attn projections should also be quantized, as you can see here (see the quick regex check after the config excerpt):

layers_to_quantize:
      - decoder\.layers\.\d+\.fc[12]
      - (encoder|decoder)\.embed_tokens
      - decoder\.layers\.\d+\.self_attn\.(k_proj|v_proj|q_proj|out_proj)
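
A quick way to check that the third pattern is well formed and should match the attention projections is to test the regexes against module names directly. A minimal sketch, using module names of the standard fairseq transformer and re.fullmatch as an approximation of how fairseq filters layers:

import re

# The three patterns from layers_to_quantize above.
patterns = [
    r"decoder\.layers\.\d+\.fc[12]",
    r"(encoder|decoder)\.embed_tokens",
    r"decoder\.layers\.\d+\.self_attn\.(k_proj|v_proj|q_proj|out_proj)",
]

# A few module names as they appear in the standard fairseq transformer;
# in a real check, iterate over model.named_modules() instead.
names = [
    "decoder.layers.0.fc1",
    "encoder.embed_tokens",
    "decoder.layers.0.self_attn.k_proj",
    "decoder.layers.5.self_attn.out_proj",
]

for name in names:
    hits = [p for p in patterns if re.fullmatch(p, name)]
    print(name, "->", hits if hits else "NOT MATCHED")

The self_attn names do match the pattern, so the question remains why those layers never show up in the "quantized layers" log above.
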
Error 2

During generation I get this dummy output when I enter the input "My name is Nicola" (a checkpoint inspection sketch follows the output):

My name is Nicola
S-0 My name is Nic@@ ola
W-0 0.062   seconds
H-0 -5.2093610763549805 in in in in in
D-0 -5.2093610763549805 in in in in in
P-0 -5.1419 -5.1419 -5.1419 -5.1419 -5.1419 -5.5468
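
To help narrow this down, here is a possible way to inspect which layers of checkpoint_best.pt actually carry quantized parameters. This is a sketch; the 'model' key and what quantized entries should look like are assumptions on my side:

import torch

# Sketch: list the stored tensors for the layers of interest. Assumes the
# usual fairseq checkpoint layout with the state dict under the 'model' key;
# if a layer was really quantized with iPQ, I would expect its entries to
# look different from a plain 'weight'/'bias' pair (e.g. small centroid tensors).
ckpt = torch.load('fairseq_model_PQ/checkpoint_best.pt', map_location='cpu')
state = ckpt['model']

for key, tensor in state.items():
    if 'self_attn' in key or '.fc1' in key or 'embed_tokens' in key:
        print(key, tuple(tensor.shape))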

Expected behavior

Translating with a non-quantized transformer model trained in a comparable way (same parameters, except the quantization-related ones), I get the expected output:

My name is Nicola
S-0 My name is Nic@@ ola
W-0 0.087   seconds
H-0 -0.2353324592113495 Il mio nome è Nic@@ ola .
D-0 -0.2353324592113495 Il mio nome è Nicola .
P-0 -0.4744 -0.0869 -0.0734 -0.2561 -0.4708 -0.0012 -0.4113 -0.1085

Environment

huihuifan commented 3 years ago

@pierrestock, any thoughts?