5% accuracy on expresso test set

Hello,

Thank you for your work; it is very nice.

I ran classify_audio.py (as described in https://github.com/facebookresearch/textlesslib/tree/main/examples/expresso) on the expresso test set and got Accuracy: 5.10 % (30/588 correct).
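A quick sanity check, assuming the reported accuracy is simply correctly classified files divided by total files (as the "(30/588 correct)" format suggests), confirms that each percentage matches its counts: 5.10 % for the test set and 4.02 % for the dev-set run in the log. `accuracy_percent` is a hypothetical helper, not part of textlesslib.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Hypothetical helper: percentage of files classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total

# Counts reported in this issue: test set, then dev set (from the log).
print(f"{accuracy_percent(30, 588):.2f} %")   # 5.10 %
print(f"{accuracy_percent(48, 1194):.2f} %")  # 4.02 %
```

Both results are far below what a trained classifier would be expected to reach, so the two runs show the same problem rather than a test/dev discrepancy.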

See the log below, from a run on the dev set:

```sh
Reading manifest file /home/ramon/data/proc/expresso/dev.tsv
1194 files found from the manifest file!
Some weights of the model checkpoint at /home/ramon/models/textless/expresso were not used when initializing Wav2Vec2ForSequenceClassification: ['wav2vec2.encoder.pos_conv_embed.conv.weight_g', 'wav2vec2.encoder.pos_conv_embed.conv.weight_v']
- This IS expected if you are initializing Wav2Vec2ForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing Wav2Vec2ForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of Wav2Vec2ForSequenceClassification were not initialized from the model checkpoint at /home/ramon/models/textless/expresso and are newly initialized: ['wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0', 'wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Classification model loaded from /home/ramon/models/textless/expresso !
Performing audio prediction...
100%|█████████████████████████████████████████████████████████████████| 1194/1194 [00:30<00:00, 38.64it/s]
...done!
Wrote the predictions to /home/ramon/data/proc/expresso/dev.predictions
Accuracy: 4.02 % (48/1194 correct)
```
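The two weight warnings in the log name the same convolution under two conventions: the checkpoint stores its weight-norm parameters as weight_g / weight_v (the naming used by the older torch.nn.utils.weight_norm), while the model as instantiated in this environment expects parametrizations.weight.original0 / original1 (the naming used by torch.nn.utils.parametrizations.weight_norm, where original0 is the magnitude g and original1 the direction v). The minimal sketch below, built around a hypothetical helper rename_weight_norm_key, only shows that the unused keys map one-to-one onto the newly initialized ones. Whether these weights actually end up randomly initialized, and whether that explains the low accuracy, depends on the installed PyTorch and transformers versions and is not established by the log alone.

```python
# Hypothetical helper (not part of textlesslib or transformers): renames the
# legacy weight-norm parameter names to the parametrization-based names.
LEGACY_TO_PARAMETRIZED = {
    "weight_g": "parametrizations.weight.original0",  # magnitude g
    "weight_v": "parametrizations.weight.original1",  # direction v
}

def rename_weight_norm_key(key: str) -> str:
    """Return `key` with a trailing weight_g/weight_v component renamed."""
    prefix, _, leaf = key.rpartition(".")
    if leaf in LEGACY_TO_PARAMETRIZED:
        new_leaf = LEGACY_TO_PARAMETRIZED[leaf]
        return f"{prefix}.{new_leaf}" if prefix else new_leaf
    return key  # any other key is left unchanged

# Key lists copied from the two warnings in the log.
unused = [
    "wav2vec2.encoder.pos_conv_embed.conv.weight_g",
    "wav2vec2.encoder.pos_conv_embed.conv.weight_v",
]
newly_initialized = [
    "wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original0",
    "wav2vec2.encoder.pos_conv_embed.conv.parametrizations.weight.original1",
]
renamed = [rename_weight_norm_key(k) for k in unused]
print(renamed == newly_initialized)  # True
```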

facebookresearch/textlesslib, issue #36