Open wells853 opened 3 years ago
Ah, I think I see the issue. In fairseq/tasks/sentence_prediction.py,
I changed build_model(self, args) to read as follows (similar to how it is in fairseq/tasks/fairseq_task.py):
```python
def build_model(self, args):
    from fairseq import models, quantization_utils

    model = models.build_model(args, self)
    model.register_classification_head(
        getattr(args, "classification_head_name", "sentence_classification_head"),
        num_classes=self.args.num_classes,
    )
    model = quantization_utils.quantize_model_scalar(model, args)
    return model
```
This seems to be working all right so far, but I'll keep you posted. Happy to open a PR later with this and some other changes (e.g. implementing 1- and 4-bit quantization).
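For anyone applying the same patch, a quick sanity check (a sketch using plain PyTorch only; `summarize_modules` is just a throwaway helper, not a fairseq API) is to list the distinct module classes in the built model and compare against a model built without `--quant-noise-scalar`; if `quantize_model_scalar` did its job, the two summaries should differ:

```python
# Rough sanity check (plain PyTorch, no fairseq-specific APIs assumed):
# after the patched build_model() returns, summarize the distinct module
# classes. If scalar quantization was applied, this summary should differ
# from that of a model built without --quant-noise-scalar.
from collections import Counter

def summarize_modules(model):
    counts = Counter(type(m).__name__ for m in model.modules())
    for name, n in sorted(counts.items()):
        print(f"{name}: {n}")

# Usage, e.g. in a debugger or a small script that sets up the task:
# model = task.build_model(args)
# summarize_modules(model)
```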
🐛 Bug
Scalar quantization does not seem to work on a pretrained RoBERTa model.
To Reproduce
Script to run without quantization
```bash
TOTAL_NUM_UPDATES=2036
WARMUP_UPDATES=122
LR=2e-05
NUM_CLASSES=2
MAX_SENTENCES=4
ROBERTA_PATH=roberta_base/model.pt
RTE_PATH=RTE-bin/
SAVE_DIR=checkpoint/roberta/rte-no-quant-noise
UPDATE_FREQ=4

python -m fairseq_cli.train $RTE_PATH \
    --restore-file $ROBERTA_PATH \
    --max-positions 512 \
    --batch-size $MAX_SENTENCES \
    --max-tokens 4400 \
    --task sentence_prediction \
    --reset-optimizer --reset-dataloader --reset-meters \
    --required-batch-size-multiple 1 \
    --init-token 0 --separator-token 2 \
    --arch roberta_base \
    --criterion sentence_prediction \
    --num-classes $NUM_CLASSES \
    --dropout 0.1 --attention-dropout 0.1 \
    --weight-decay 0.1 --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 \
    --clip-norm 0.0 \
    --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \
    --max-epoch 10 \
    --find-unused-parameters \
    --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \
    --ddp-backend legacy_ddp \
    --save-dir $SAVE_DIR \
    --update-freq $UPDATE_FREQ
```
Script to run with quantization
```bash
TOTAL_NUM_UPDATES=2036
WARMUP_UPDATES=122
LR=2e-05
NUM_CLASSES=2
MAX_SENTENCES=4
ROBERTA_PATH=roberta_base/model.pt
RTE_PATH=RTE-bin/
SAVE_DIR=checkpoint/roberta/rte-no-quant-noise
UPDATE_FREQ=4

python -m fairseq_cli.train $RTE_PATH \
    --restore-file $ROBERTA_PATH \
    --max-positions 512 \
    --batch-size $MAX_SENTENCES \
    --max-tokens 4400 \
    --task sentence_prediction \
    --reset-optimizer --reset-dataloader --reset-meters \
    --required-batch-size-multiple 1 \
    --init-token 0 --separator-token 2 \
    --arch roberta_base \
    --criterion sentence_prediction \
    --num-classes $NUM_CLASSES \
    --dropout 0.1 --attention-dropout 0.1 \
    --weight-decay 0.1 --optimizer adam --adam-betas "(0.9, 0.98)" --adam-eps 1e-06 \
    --clip-norm 0.0 \
    --lr-scheduler polynomial_decay --lr $LR --total-num-update $TOTAL_NUM_UPDATES --warmup-updates $WARMUP_UPDATES \
    --max-epoch 10 \
    --find-unused-parameters \
    --best-checkpoint-metric accuracy --maximize-best-checkpoint-metric \
    --ddp-backend legacy_ddp \
    --save-dir $SAVE_DIR \
    --update-freq $UPDATE_FREQ \
    --quant-noise-scalar 0.5
```
Note that these two scripts are identical except for the `--quant-noise-scalar` argument.
Code sample
Expected behavior
We would expect these two scripts to train differently. Instead, they train identically: the RoBERTa model in the second script is never actually quantized.
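To make the "train identically" claim concrete, one way to check (a sketch: the checkpoint paths below are placeholders for the two SAVE_DIRs, and it assumes the weights live under the usual `"model"` key of a fairseq checkpoint dict) is to compare the final weights of the two runs:

```python
# Sketch: compare the final weights of the two runs. The paths are
# placeholders for wherever the two SAVE_DIRs point; assumes the fairseq
# checkpoint dict stores its weights under the "model" key. torch.equal is
# an exact comparison; with nondeterministic kernels you may prefer a
# tolerance-based check instead.
import torch

ckpt_a = torch.load("checkpoint/roberta/rte-no-quant-noise/checkpoint_best.pt", map_location="cpu")
ckpt_b = torch.load("checkpoint/roberta/rte-quant-noise/checkpoint_best.pt", map_location="cpu")

identical = all(
    torch.equal(ckpt_a["model"][k], ckpt_b["model"][k]) for k in ckpt_a["model"]
)
print("identical weights:", identical)
```

Identical weights across the two runs would be consistent with the `--quant-noise-scalar` flag simply being ignored.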
Environment
How you installed fairseq (pip, source): source
Additional context
First time submitting an issue here so apologies if anything is incorrect. Thanks for any help!