felixbur / nkululeko

Machine learning speaker characteristics
MIT License

BUG: demo module returns NaN for models other than wav2vec2 variants in finetuning #150

Open · opened by bagustris 1 month ago

bagustris commented 1 month ago

Although we can set pretrained_model to models other than the wav2vec2 variants and the training process succeeds, running the demo with the fine-tuned model produces NaN in the logits. I have experienced this in the past and thought it had been solved.
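For context, here is a minimal sketch of the kind of config used (section and option names follow the usual nkululeko layout; the actual test_bagus/exp_emodb_finetune_wavlm_base.ini may differ and is not shown here):

```ini
; hypothetical sketch -- not the actual test_bagus/exp_emodb_finetune_wavlm_base.ini
[EXP]
root = /tmp/results/
name = finetuned_wavlm_base_22
[DATA]
databases = ['emodb']
target = emotion
[MODEL]
type = finetune
; any HuggingFace checkpoint can be filled in here; wavlm-base is one that triggers the NaN
pretrained_model = microsoft/wavlm-base
```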

Example of data

Training stage

$ python3 -m nkululeko.nkululeko --config test_bagus/exp_emodb_finetune_wavlm_base.ini 
DEBUG: nkululeko: running finetuned_wavlm_base_7 from config test_bagus/exp_emodb_finetune_wavlm_base.ini, nkululeko version 0.88.11
DEBUG: experiment: value for type not found, using default: audformat
...
{'loss': 0.5462, 'grad_norm': 1.6567870378494263, 'learning_rate': 4.5454545454545455e-06, 'epoch': 12.0}                                                       
{'eval_loss': 1.0810825824737549, 'eval_UAR': 0.75, 'eval_ACC': 0.8014705882352942, 'eval_runtime': 0.5392, 'eval_samples_per_second': 252.217, 'eval_steps_per_second': 9.273, 'epoch': 12.57}                                                                                                                                 
{'loss': 0.5417, 'grad_norm': 1.396345615386963, 'learning_rate': 0.0, 'epoch': 12.57}                                                                          
{'train_runtime': 50.9932, 'train_samples_per_second': 87.58, 'train_steps_per_second': 0.431, 'train_loss': 0.8906121253967285, 'epoch': 12.57}                
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 22/22 [00:50<00:00,  2.32s/it]
DEBUG: model: saved best model to /tmp/results/finetuned_wavlm_base_22/models/run_0/torch
DEBUG: reporter: value for name is not found, using default: emodb_emotion_finetune                 
DEBUG: reporter: plotted epoch progression to /tmp/results/finetuned_wavlm_base_22/./images/run_0/emodb_emotion_finetune_epoch_progression.png
DEBUG: modelrunner: run: 0 epoch: 22: result: test: 0.769 UAR
DEBUG: modelrunner: plotting confusion matrix to emodb_emotion_finetune_0_022_cnf
DEBUG: reporter: Saved confusion plot to /tmp/results/finetuned_wavlm_base_22/./images/run_0/emodb_emotion_finetune_0_022_cnf.png
DEBUG: reporter: Best score at epoch: 0, UAR: .768, (+-.723/.818), ACC: .816
DEBUG: reporter: labels: ['anger', 'sadness', 'neutral', 'happiness']
DEBUG: reporter: result per class (F1 score): [0.833, 0.294, 0.941, 0.982] from epoch: 22
WARNING: experiment: Save experiment: Can't pickle the trained model so saving without it. (it should be stored anyway)
DEBUG: experiment: Done, used 194.498 seconds
DONE

Inference/demo

$ python3 -m nkululeko.demo --config test_bagus/exp_emodb_finetune_wavlm_base.ini --file data/test/audio/03a
...
Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.
/home/bagus/miniconda3/envs/nkululeko/lib/python3.9/site-packages/torch/nn/functional.py:5076: UserWarning: Support for mismatched key_padding_mask and attn_mask is deprecated. Use same type for both instead.
  warnings.warn(
ERROR: demo: NaN value in pipeline output for file: data/test/audio/03a01Nc.wav
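To narrow down where the NaN appears, one quick check outside nkululeko could be to load the saved checkpoint directly with transformers and inspect the raw logits. This is a hypothetical sketch: it assumes the directory from the training log above is a standard transformers checkpoint; if nkululeko saves a custom classification head, the Auto classes will not load it as-is.

```python
# Hypothetical repro sketch (not part of nkululeko): load the model saved by the
# training run above and check the raw logits for NaN, bypassing nkululeko.demo.
import torch
import torchaudio
from transformers import AutoFeatureExtractor, AutoModelForAudioClassification

model_dir = "/tmp/results/finetuned_wavlm_base_22/models/run_0/torch"  # path from the training log
wav_file = "data/test/audio/03a01Nc.wav"                               # file from the error message

extractor = AutoFeatureExtractor.from_pretrained(model_dir)
model = AutoModelForAudioClassification.from_pretrained(model_dir).eval()

signal, sr = torchaudio.load(wav_file)
signal = torchaudio.functional.resample(signal, sr, 16000).mean(dim=0)  # mono, 16 kHz

inputs = extractor(signal.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("any NaN in logits:", torch.isnan(logits).any().item())
```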

It seems that the demo module for fine-tuned models only works with wav2vec2 variants, including the audmodel. Previously it also worked with other models, e.g. the following model: wavlm_finetuned_emodb.

Test of the demo module per finetuned model type:

| model | works? |
|---|---|
| default (wav2vec2 robust) | yes |
| hubert-large-ll60k | no |
| wavlm-base | no |
| wavlm-base-plus | no |
| wavlm-large | no |
| audeering | yes |

So, although there are no errors during training (nkululeko.nkululeko) and the test-set performance score is obtained, perhaps some configuration for HuBERT and WavLM differs from the wav2vec2 variants (or a library update/upgrade changed the behavior).
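One way to probe the "configuration differs" hypothesis (again a diagnostic sketch, not nkululeko code) would be to compare the preprocessing settings that transformers ships with these checkpoints, e.g. whether they expect an attention mask, which may be related to the key_padding_mask warning in the demo log:

```python
# Diagnostic sketch: compare feature-extractor settings of the upstream checkpoints.
# Differences in return_attention_mask / do_normalize between the wav2vec2 and the
# WavLM/HuBERT models could point to why the logits become NaN at inference time.
from transformers import AutoFeatureExtractor

for model_id in [
    "facebook/wav2vec2-large-robust-ft-swbd-300h",  # assumed default; check nkululeko's actual default
    "facebook/hubert-large-ll60k",
    "microsoft/wavlm-base",
    "microsoft/wavlm-base-plus",
    "microsoft/wavlm-large",
]:
    fe = AutoFeatureExtractor.from_pretrained(model_id)
    print(model_id,
          "return_attention_mask =", fe.return_attention_mask,
          "do_normalize =", fe.do_normalize)
```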

felixbur commented 1 month ago

Thanks. I'm on vacation until August 18th and can try to look into it after that.