Speech-Lab-IITM / CCC-wav2vec-2.0

Code for the method proposed in the paper "ccc-wav2vec 2.0: Clustering aided Cross-Contrastive learning of Self-Supervised speech representations".
MIT License

ccc-wav2vec 2.0

Paper Title: CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-Supervised Learning of Speech Representations. Accepted at IEEE SLT 2022 (arXiv link).

ccc-wav2vec 2.0 is a pre-training mechanism that uses clustering and an augmentation-based cross-contrastive loss as its self-supervised objective. The clustering module scales down the influence of negative examples that are highly similar to the positive. The cross-contrastive loss is computed between the encoder output of the original sample and the quantizer output of its augmentation, and vice versa, making the pre-training strategy more robust.
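The objective described above can be sketched as follows. This is a minimal, simplified illustration in numpy, not the repository's implementation: shapes, the temperature value, and the per-negative `weights` argument (standing in for the clustering module's down-weighting) are all assumptions, and a real implementation samples a fixed number of negatives rather than using all other timesteps.

```python
import numpy as np

def cosine_sim(a, b):
    # Pairwise cosine similarity between rows of a and rows of b.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def contrastive_loss(context, quantized, temperature=0.1, weights=None):
    """InfoNCE-style loss: each timestep's context vector should match the
    quantized target at the same timestep; the other timesteps act as
    negatives. `weights` (hypothetical here) down-weights negatives that
    are highly similar to the positive, mimicking the clustering module."""
    sims = cosine_sim(context, quantized) / temperature  # (T, T)
    if weights is not None:
        # Scale the logits of negatives only; the positive sits on the diagonal.
        off_diag = ~np.eye(len(sims), dtype=bool)
        sims[off_diag] = sims[off_diag] * weights[off_diag]
    # Row-wise log-softmax; positives are the diagonal entries.
    logits = sims - sims.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def ccc_loss(c_orig, q_orig, c_aug, q_aug):
    # Standard contrastive terms on each view, plus the two cross terms:
    # encoder output of the original against the quantized augmentation,
    # and vice versa.
    return (contrastive_loss(c_orig, q_orig)
            + contrastive_loss(c_aug, q_aug)
            + contrastive_loss(c_orig, q_aug)
            + contrastive_loss(c_aug, q_orig))
```

The cross terms are what tie the two views together: the model is rewarded for producing context representations of the clean audio that can still identify the quantized targets of the augmented audio, and vice versa.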

Primary Contributions:

SUPERB Benchmark

The ccc-wav2vec 2.0 BASE model pre-trained on LibriSpeech-960h has been evaluated on multiple downstream tasks from the SUPERB benchmark. The proposed method comprehensively outperforms the baseline wav2vec 2.0 BASE model across the array of downstream tasks in SUPERB.

Models

All WERs reported below were obtained without the use of any language model.

| Model | Pre-training data | Fine-tuning data | Model Links | WER (test-clean / test-other) |
| --- | --- | --- | --- | --- |
| wav2vec 2.0 Base | LibriSpeech-360h | No fine-tuning | fairseq \| huggingface | --- |
| wav2vec 2.0 Base | LibriSpeech-360h | LibriSpeech-100h | fairseq \| huggingface | 12.8 / 31.7 |
| ccc-wav2vec 2.0 Base | LibriSpeech-360h | No fine-tuning | fairseq \| huggingface | --- |
| ccc-wav2vec 2.0 Base | LibriSpeech-360h | LibriSpeech-100h | fairseq \| huggingface | 10.8 / 27.7 |
| ccc-wav2vec 2.0 Base | LibriSpeech-960h | No fine-tuning | fairseq \| huggingface | --- |
| ccc-wav2vec 2.0 Base | LibriSpeech-960h | LibriSpeech-100h | fairseq \| huggingface | 5.5 / 12.4 |
| ccc-wav2vec 2.0 Base SUPERB | LibriSpeech-960h | No fine-tuning | fairseq SUPERB model \| huggingface SUPERB model | --- |
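The WER figures above follow the standard definition of word error rate: substitutions, deletions, and insertions divided by the number of reference words. A minimal, self-contained sketch of that metric (an illustration, not the project's evaluation script):

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / len(ref),
    computed with standard edit distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all of ref[:i]
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all of hyp[:j]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, against the reference "the cat sat on the mat", the hypothesis "the cat sit on mat" has one substitution and one deletion, giving a WER of 2/6.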

Requirements and Installation

```shell
git clone https://github.com/Speech-Lab-IITM/CCC-wav2vec-2.0
cd CCC-wav2vec-2.0
pip install --editable ./

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./
```

Parameters of interest

Reference Code

  1. Facebook AI Research Sequence-to-Sequence Toolkit written in Python: fairseq