facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

The normalization settings of input audio #4687

Closed Ther-nullptr closed 2 years ago

Ther-nullptr commented 2 years ago

❓ Questions and Help

Before asking:

  1. search the issues.
  2. search the docs.

What is your question?

In wav2vec 2.0 and HuBERT, the config option task.normalize is set to False (the input audio is not normalized), but in data2vec it is set to True, as the original paper also mentions. Will this have a big effect on experimental results?

Code

What have you tried?

What's your environment?

Ther-nullptr commented 2 years ago

@alexeib

alexeib commented 2 years ago

It won't have much of an effect, but you have to match the feature extractor to the normalization setting:

normalize in dataloader -> layer norm in feature extractor

no normalization in dataloader -> group norm in first block of feature extractor + feature_grad_mult = 0.1 (rescale feature extractor grads by 0.1)
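
For illustration, the pairing above could be sketched roughly as follows. This is not fairseq code. It assumes that "normalize" means scaling each utterance to zero mean and unit variance, and that the feature extractor is selected by an `extractor_mode` option taking the values `"layer_norm"` or `"default"` (group norm in the first block). Both helper functions, `normalize_waveform` and `check_pairing`, are hypothetical names made up for this example.

```python
import math


def normalize_waveform(samples, eps=1e-5):
    """Scale one utterance to zero mean and (approximately) unit variance.

    Assumption: this mirrors what dataloader-side normalization does,
    i.e. a layer norm over the whole waveform.
    """
    n = len(samples)
    mean = sum(samples) / n
    # Biased (population) variance, as layer norm uses.
    var = sum((x - mean) ** 2 for x in samples) / n
    return [(x - mean) / math.sqrt(var + eps) for x in samples]


def check_pairing(normalize, extractor_mode, feature_grad_mult):
    """Return warnings when settings deviate from the pairing described above.

    Hypothetical helper; the option names and values are assumptions.
    """
    warnings = []
    if normalize:
        if extractor_mode != "layer_norm":
            warnings.append("normalized input expects a layer-norm feature extractor")
    else:
        if extractor_mode != "default":
            warnings.append("unnormalized input expects group norm in the first block")
        if not math.isclose(feature_grad_mult, 0.1):
            warnings.append("unnormalized input expects feature_grad_mult = 0.1")
    return warnings


if __name__ == "__main__":
    print(normalize_waveform([1.0, 2.0, 3.0, 4.0]))
    print(check_pairing(normalize=True, extractor_mode="layer_norm", feature_grad_mult=1.0))
    print(check_pairing(normalize=False, extractor_mode="layer_norm", feature_grad_mult=1.0))
```

The first check reports no warnings because the settings match the first pairing. The second reports two, since unnormalized input is combined with a layer-norm extractor and an unscaled feature-extractor gradient.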