ASR-project / Multilingual-PR

Phoneme recognition using the pre-trained models Wav2vec2, HuBERT and WavLM. In this project, we compare three self-supervised models, Wav2vec (2019, 2020), HuBERT (2021) and WavLM (2022), each pretrained on a corpus of English speech. We use them in various ways to perform phoneme recognition for different languages, with a network trained using the Connectionist Temporal Classification (CTC) algorithm.

Could you provide requirements.txt with version numbers? #4

Open chrispreee opened 11 months ago

chrispreee commented 11 months ago

I am trying to get this working, but the pip dependency tree is proving elusive. Could you please provide a pip freeze?

Lukysoon commented 9 months ago

Hi, this worked for me:

wandb==0.13.0
pytest==7.1.1
transformers==4.17.0
datasets==2.0.0
simple-parsing==0.0.19.post0
torch==1.10.0
pytorch-lightning==1.5.10
torchaudio==0.10.0
phonemizer==3.2.1
rich==13.0.0
librosa==0.10.0
wget==3.2
lightning-bolts==0.5.0
torchmetrics==0.7.2
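
To check whether an existing environment matches the versions listed above, something like the following could help. It is only a sketch: the names `PINNED` and `check_pins` are made up for illustration and are not part of the repository, and it assumes Python 3.8+ for `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the list above.
PINNED = {
    "wandb": "0.13.0",
    "pytest": "7.1.1",
    "transformers": "4.17.0",
    "datasets": "2.0.0",
    "simple-parsing": "0.0.19.post0",
    "torch": "1.10.0",
    "pytorch-lightning": "1.5.10",
    "torchaudio": "0.10.0",
    "phonemizer": "3.2.1",
    "rich": "13.0.0",
    "librosa": "0.10.0",
    "wget": "3.2",
    "lightning-bolts": "0.5.0",
    "torchmetrics": "0.7.2",
}


def check_pins(pinned, get_version=version):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_pins(PINNED).items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

Note that local builds of torch may report a suffixed version such as `1.10.0+cu113`, which this exact comparison would flag as a mismatch.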

I had to use Ubuntu 20.

There is also a problem with downloading the dataset. To work around it, I had to replace the string "common_voice" with "mozilla-foundation/common_voice_13_0", because the original dataset no longer exists. https://github.com/ASR-project/Multilingual-PR/blob/e7c84948f7f65d62b9b1e085487557a44dc95564/config/hparams.py#L85

And I had to put my HuggingFace token here: https://github.com/ASR-project/Multilingual-PR/blob/e7c84948f7f65d62b9b1e085487557a44dc95564/Datasets/datamodule.py#L82
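
Taken together, the dataset rename and the token roughly amount to a call like the one below. This is a hedged sketch, not the repository's actual code: `build_load_args`, `load_common_voice`, and the `HF_TOKEN` environment variable are hypothetical names for illustration. It assumes the `datasets==2.0.0` API, where the token is passed as `use_auth_token`.

```python
import os

# Replacement for the old "common_voice" identifier.
DATASET_NAME = "mozilla-foundation/common_voice_13_0"


def build_load_args(language, split="train", token=None):
    """Collect keyword arguments for datasets.load_dataset.

    The token falls back to the HF_TOKEN environment variable
    (an assumed name, chosen here to avoid hard-coding the token).
    """
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        raise ValueError("A HuggingFace token is required for this dataset.")
    return {
        "path": DATASET_NAME,
        "name": language,         # language configuration of the dataset
        "split": split,
        "use_auth_token": token,  # parameter name in datasets 2.0.0
    }


def load_common_voice(language, split="train", token=None):
    # Imported lazily so the helper above works without datasets installed.
    from datasets import load_dataset

    return load_dataset(**build_load_args(language, split, token))
```

Reading the token from the environment keeps it out of the source file. Newer `datasets` releases take the token through a `token` argument instead, so the key would need to change there.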