flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Cpc vox populi #965

Closed Molugan closed 3 years ago

Molugan commented 3 years ago

IMPORTANT: Please do not create a Pull Request without creating an issue first. Changes must be discussed.

Original Issue: https://github.com/facebookresearch/wav2letter/issues/957

Closes #957

Summary

Patched version of Chaitanya Talnikar's implementation of masked_cpc, extended to include the pre-training for the VoxPopuli dataset.

Test Plan (required)

Fine-tuning with Common Voice Latvian

After downloading Common Voice:

export COMMON_VOICE_DIR=[Path to the parent directory containing all Common Voice subsets]
export WAV2LETTERDIR=[Path to wav2letter root directory]
cd prepare_data
bash build_cc_data.sh lv

You should get the following output:

Building /private/home/mriviere/Common_voices/lv/lv_grapheme.tokens
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5098/5098 [00:12<00:00, 419.89it/s]
5098 files found out of 5098
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5098/5098 [01:20<00:00, 63.61it/s]
59 speakers found
Building /private/home/mriviere/Common_voices/lv/dev.lst
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1125/1125 [00:00<00:00, 1403.27it/s]
1125 files found out of 1125
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1125/1125 [00:17<00:00, 65.65it/s]
3 speakers found
Building /private/home/mriviere/Common_voices/lv/test.lst
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1629/1629 [00:01<00:00, 1563.26it/s]
1629 files found out of 1629
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1629/1629 [00:26<00:00, 62.50it/s]
54 speakers found
Building /private/home/mriviere/Common_voices/lv/train.lst
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2336/2336 [00:01<00:00, 1679.79it/s]
2336 files found out of 2336
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2336/2336 [00:34<00:00, 68.01it/s]
2 speakers found
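
The counts in the log above can be cross-checked against the generated list files. The following is a minimal sketch, not part of the PR: it assumes that each `.lst` file holds one utterance per line and that the files are written to `$COMMON_VOICE_DIR/lv/`, as the `Building ...` lines suggest. The helper names `count_entries` and `check_split` are hypothetical.

```shell
# Sketch: compare the number of entries in a list file with an expected count.
# Assumption: one utterance per line, each line terminated by a newline.
count_entries() {
    # Print the number of lines in the given file, without padding.
    wc -l < "$1" | tr -d '[:space:]'
}

check_split() {
    local file="$1" expected="$2" actual
    actual=$(count_entries "$file")
    if [ "$actual" -eq "$expected" ]; then
        echo "OK   $file: $actual entries"
    else
        echo "FAIL $file: expected $expected, found $actual" >&2
        return 1
    fi
}

# Usage, with the counts reported above for the Latvian subset:
# check_split "$COMMON_VOICE_DIR/lv/dev.lst"   1125
# check_split "$COMMON_VOICE_DIR/lv/test.lst"  1629
# check_split "$COMMON_VOICE_DIR/lv/train.lst" 2336
```

A mismatch would most likely point to an interrupted run or to a different Common Voice release than the one used for this test plan.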

Download and uncompress the checkpoint from https://dl.fbaipublicfiles.com/voxpopuli/wav2letter_100k_small.tar.gz
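
One possible way to do this from the command line is sketched below. It assumes `curl` and `tar` are available. The destination directory `./checkpoints` and the helper names are hypothetical. The layout inside the archive is not described here, so list its contents before choosing the directory to pass to the fine-tuning script.

```shell
# Sketch: fetch and unpack the pretrained checkpoint.
CKPT_URL="https://dl.fbaipublicfiles.com/voxpopuli/wav2letter_100k_small.tar.gz"

download_checkpoint() {
    # -f: fail on HTTP errors, -L: follow redirects, -o: write to the given file
    curl -fL -o "$1" "$CKPT_URL"
}

extract_checkpoint() {
    local archive="$1" dest="$2"
    # Create the destination if needed, then unpack the gzip-compressed tarball into it.
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Usage (destination directory is an assumption):
# download_checkpoint wav2letter_100k_small.tar.gz
# tar -tzf wav2letter_100k_small.tar.gz        # inspect the archive layout first
# extract_checkpoint wav2letter_100k_small.tar.gz ./checkpoints
```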

To fine-tune the model:

cd scripts_voxpopuli
bash train_lang.sh PATH_DIR_CHECKPOINT lv
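
A small guard around this call can catch a wrong checkpoint path before training starts. This is a sketch only: `run_finetune` is a hypothetical wrapper, and it assumes it is run from `scripts_voxpopuli` with the same argument order as the command above (checkpoint directory first, then language code).

```shell
# Sketch: refuse to launch fine-tuning if the checkpoint directory is missing.
run_finetune() {
    local ckpt_dir="$1" lang="$2"
    if [ ! -d "$ckpt_dir" ]; then
        echo "Checkpoint directory not found: $ckpt_dir" >&2
        return 1
    fi
    # Same invocation as in the test plan: checkpoint directory, then language.
    bash train_lang.sh "$ckpt_dir" "$lang"
}

# Usage:
# run_finetune PATH_DIR_CHECKPOINT lv
```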

Molugan commented 3 years ago

The current code depends on a specific flashlight commit that is not on the master branch. I suggest adding the missing classes (forwardSequentialModuleWithPadMaskForCPC and CPCSpecAugment) directly here to ease compatibility.

tlikhomanenko commented 3 years ago

Added fixes for the build with respect to the recent changes

Molugan commented 3 years ago

We have updated the code to make it compatible with the latest flashlight release.

The pretraining and the fine-tuning are working.

However, the older checkpoints are no longer compatible, so we are relaunching the training.

I suggest that we merge with the new pretrained base model and add the fine-tuned versions in another PR. What do you think?

tlikhomanenko commented 3 years ago

> We have updated the code to make it compatible with the latest flashlight release.
>
> The pretraining and the fine-tuning are working.
>
> However, the older checkpoints are no longer compatible, so we are relaunching the training:
>
> * the base pretrained model will be ready tomorrow (no fine-tuning, unsupervised training only)
>
> * the auxiliary fine-tuned models will be ready later this week or next week
>
> I suggest that we merge with the new pretrained base model and add the fine-tuned versions in another PR. What do you think?

Sounds good!

facebook-github-bot commented 3 years ago

@jacobkahn has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot commented 3 years ago

@Molugan has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot commented 3 years ago

@tlikhomanenko has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.