The normalization settings of input audio

Ther-nullptr commented 2 years ago

❓ Questions and Help

Before asking:

search the issues.
search the docs.

What is your question?

In wav2vec 2.0 and HuBERT, the config option task.normalize is set to False (meaning the input audio is not normalized), but in data2vec it is set to True, as the original paper also mentions. Will this have a big effect on experimental results?
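
For context, here is a minimal sketch of what normalizing the input audio means, assuming it amounts to scaling each utterance to zero mean and unit variance (a layer norm over the whole waveform). The function name `normalize_waveform`, the `eps` value, and the sample values are illustrative and not taken from fairseq.

```python
import math


def normalize_waveform(samples, eps=1e-5):
    """Scale one utterance to (approximately) zero mean and unit variance.

    Hypothetical helper for illustration; eps is an assumed stabilizer.
    """
    n = len(samples)
    mean = sum(samples) / n
    # Biased (population) variance, which is what a layer norm uses.
    var = sum((x - mean) ** 2 for x in samples) / n
    return [(x - mean) / math.sqrt(var + eps) for x in samples]


# Made-up raw samples standing in for a short waveform.
wave = [0.02, -0.01, 0.03, 0.00, -0.04]
normalized = normalize_waveform(wave)
```

With task.normalize set to False, the raw samples would instead be passed to the model unchanged.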


Ther-nullptr commented 2 years ago

@alexeib

alexeib commented 2 years ago

It won't have much of an effect, but you have to match the feature extractor to the normalization setting:

normalize in dataloader -> layer norm in feature extractor
no normalization in dataloader -> group norm in first block of feature extractor + feature_grad_mult = 0.1 (rescale feature extractor grads by 0.1)
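
The pairing above can be sketched as a small sanity check on a config. This is only an illustration: `check_pairing` is a hypothetical helper, not part of fairseq, and it assumes the feature extractor is selected by an `extractor_mode` setting whose value "layer_norm" means layer norm in the feature extractor and "default" means group norm in the first block.

```python
import math


def check_pairing(normalize, extractor_mode, feature_grad_mult):
    """Return a list of mismatches against the pairing described above.

    Hypothetical helper; the extractor_mode values are assumptions.
    """
    problems = []
    if normalize:
        # Normalized input goes with a layer-norm feature extractor.
        if extractor_mode != "layer_norm":
            problems.append("normalize=True expects extractor_mode='layer_norm'")
    else:
        # Raw input goes with group norm in the first block ...
        if extractor_mode != "default":
            problems.append("normalize=False expects extractor_mode='default'")
        # ... and feature extractor gradients rescaled by 0.1.
        if not math.isclose(feature_grad_mult, 0.1):
            problems.append("normalize=False expects feature_grad_mult=0.1")
    return problems


# A setup with normalized input and a layer-norm extractor passes.
matched = check_pairing(True, "layer_norm", 1.0)
# Raw input combined with a layer-norm extractor is flagged twice.
mismatched = check_pairing(False, "layer_norm", 1.0)
```

An empty list means the three settings agree with the rule; each string in a non-empty list names one setting to change.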

facebookresearch / fairseq