Open · faniuy opened this issue 4 years ago
I'm training on another language and it looks similar for me. My prob perplexity does not converge, neither with the hyperparameters I tried nor with the ones the author used.
@faniuy The lowest value code_perplexity can reach is probably 2, given how it is computed:
result["code_perplexity"] = torch.exp( -torch.sum(hard_probs * torch.log(hard_probs + 1e-7), dim=-1) ).sum()
The minimum perplexity of each code group is 1, but there are 2 groups and the per-group values are summed, so the reported floor is 2.
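A quick way to see why 2.0 is the floor: a self-contained sketch that just mirrors the formula quoted above (the group and vocabulary sizes below are illustrative, not the exact fairseq config):

```python
import torch

# Mirror of the perplexity formula quoted above, for G groups of V codes each.
# A fully collapsed (one-hot) distribution gives exp(0) = 1 per group, so with
# 2 groups the summed "perplexity" bottoms out at 2.
G, V = 2, 320  # illustrative sizes only

def summed_perplexity(probs):
    # probs: (G, V) distribution over codes for each group
    return torch.exp(-torch.sum(probs * torch.log(probs + 1e-7), dim=-1)).sum()

collapsed = torch.zeros(G, V)
collapsed[:, 0] = 1.0                  # every group always picks code 0
uniform = torch.full((G, V), 1.0 / V)  # every code equally likely

print(summed_perplexity(collapsed))  # ~2.0  -> what the logs show when the codes collapse
print(summed_perplexity(uniform))    # ~640  -> G * V, the maximum (all codes used)
```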
@zelabean But it shouldn't be 2.0 just after a few thousand updates. Something's not right. And what puzzles me most is that wav2vec on master consumes half the VRAM compared to wav2vec at tag v0.9.0 with the same settings. That is weird. I don't know what happened yet; there have been too many code changes since 0.9.0.
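For what it's worth, one generic way to compare peak VRAM between the two checkouts is to run the same few updates in each and log the peak allocation (a standalone sketch, not fairseq's own logging):

```python
import torch

def log_peak_vram(tag=""):
    # Peak GPU memory allocated by PyTorch tensors since the last reset.
    torch.cuda.synchronize()
    peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"{tag} peak allocated: {peak_gib:.2f} GiB")

torch.cuda.reset_peak_memory_stats()
# ... run a handful of identical training updates in each checkout here ...
log_peak_vram("after N updates")
```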
It seems like the projection layers in the gumbel vector quantizer consume all the gradient and become over-parameterized. Setting vq-depth to 1 makes things a little better, since the prob perplexity no longer drops straight to 2.0, but I still can't get the loss down.
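To make the vq-depth point concrete, here is a rough, simplified sketch of a Gumbel quantizer front-end with a configurable projection depth. This is not fairseq's GumbelVectorQuantizer (the class, dimensions, and temperature below are made up for illustration); it only shows where the extra parameters come from when the projection depth is greater than 1:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGumbelQuantizer(nn.Module):
    """Simplified stand-in for a Gumbel vector quantizer (not fairseq's implementation).

    `depth` plays the role of vq-depth: the number of layers in the projection
    that maps features to code logits. depth > 1 puts extra trainable parameters
    in front of the softmax.
    """

    def __init__(self, in_dim=512, num_codes=320, depth=1, tau=2.0):
        super().__init__()
        layers = []
        for _ in range(depth - 1):
            layers += [nn.Linear(in_dim, in_dim), nn.ReLU()]
        layers.append(nn.Linear(in_dim, num_codes))
        self.proj = nn.Sequential(*layers)
        self.tau = tau

    def forward(self, x):
        logits = self.proj(x)                                    # (..., num_codes)
        one_hot = F.gumbel_softmax(logits, tau=self.tau, hard=True)
        return one_hot, logits

q1 = ToyGumbelQuantizer(depth=1)
q2 = ToyGumbelQuantizer(depth=2)
print(sum(p.numel() for p in q1.proj.parameters()))  # single projection layer
print(sum(p.numel() for p in q2.proj.parameters()))  # depth=2 adds a full extra layer
```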
@faniuy Thanks. I'm now training with vq-depth 1, a low learning rate, and no learning rate annealing. If the result is good, I'll let you know.
🐛 Bug
To Reproduce
Steps to reproduce the behavior (always include the command you ran):
Run train.py with the exact same arguments specified in examples/wav2vec/README.md. vq-wav2vec stops converging after a few thousand updates: the loss plateaus around 4.x, and prob_perplexity and code_perplexity get stuck at 2.0. The wav2vec loss also drops significantly slower than at the v0.9.0 tag in the same environment and on the same dataset, while using about half the VRAM compared to the v0.9.0 tag with the same settings.
Code sample
Expected behavior
Loss should converge to 0.x, as the code at the v0.9.0 tag does.
Environment
How you installed fairseq (pip, source): github clone & pip install -e .
Additional context