Closed drscotthawley closed 2 years ago
Hello Scott, thanks for reaching out, and sorry for the delayed response.
Alright, it seems that TIMIT cannot be freely downloaded from HuggingFace Hub anymore and you have to download it manually (with payment!).
Unfortunately, I couldn't test this code with LibriSpeech, as it was too large for my machine and network bandwidth, but your problem seems to be a mismatch between the number of features in your model's output and the number of features passed to the loss function. Can you put a breakpoint in trainer.py's train_step(), right before the loss is calculated, and check the shapes of x and y at line 60?
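A minimal sketch of the kind of shape check described above, in case a breakpoint is inconvenient. The helper name check_loss_inputs is hypothetical and not part of the repository. It assumes that x and y expose a .shape attribute (as PyTorch tensors do) and that the loss expects their last dimensions to match. The stand-in objects below only mimic tensors so the example runs without a GPU or any dataset.

```python
from types import SimpleNamespace


def check_loss_inputs(x, y, feature_dim=-1):
    """Print the shapes of x and y and raise if their feature dimensions differ.

    Hypothetical helper: works with any object that has a .shape attribute.
    """
    x_shape, y_shape = tuple(x.shape), tuple(y.shape)
    print(f"x: {x_shape}  y: {y_shape}")
    if x_shape[feature_dim] != y_shape[feature_dim]:
        raise ValueError(
            f"feature mismatch: x has {x_shape[feature_dim]}, "
            f"y has {y_shape[feature_dim]}"
        )
    return x_shape, y_shape


# Stand-ins for the model output and the loss target (assumed sizes).
x = SimpleNamespace(shape=(8, 100, 768))
y = SimpleNamespace(shape=(8, 100, 768))
check_loss_inputs(x, y)
```

Calling the helper on the real x and y just before the loss is computed would raise a readable ValueError on the CPU side instead of a later CUDA error, if the mismatch is indeed the cause.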
@drscotthawley Hello Scott, I hope you're doing well. I'm just reaching out to ask whether your issue has been resolved.
Aryan, thank you very much for sharing your code with the world. I wonder if you could advise:
I am trying to train by following the instructions for audio, but I haven't been able to get TIMIT or LibriSpeech to work.
TIMIT
For TIMIT, I get the message from HuggingFace that it must be downloaded manually. From the URL provided in the message, I got to UPenn who apparently want $250? for the dataset?? ...So, ok, I obtained a copy from a friend and also from Kaggle. But in both cases the HF dataloader fails; it is looking for files that don't exist anywhere in the dataset: it is looking for files with lower-case letters like "*test" (all the filenames in both my copies are uppercase) and certain file extensions that exclude the .DOC which is provided in TIMIT:
Error message
The files look like
If I take away the 'clean' directive in the load_dataset call, then the dataset loads but fails later with a key error:
If I print out self.data just after it's loaded in your TIMIT class, there is no 'audio' part to the namespace:
Are you able to comment or advise about getting TIMIT to work?
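For anyone hitting the same filename mismatch, here is a rough sketch of how a local TIMIT copy could be surveyed to see which extensions and letter cases are actually on disk. The function names are hypothetical and this does not touch the HuggingFace loader; it only uses the Python standard library.

```python
from collections import Counter
from pathlib import Path


def survey_extensions(root):
    """Count file suffixes exactly as they appear on disk (case preserved)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix] += 1
    return counts


def has_uppercase_names(root):
    """Return True if any file name under root contains an uppercase letter."""
    return any(
        path.is_file() and path.name != path.name.lower()
        for path in Path(root).rglob("*")
    )
```

For example, survey_extensions("/path/to/TIMIT") (a placeholder path) would show whether the copy contains .WAV rather than .wav files, which would be consistent with a loader that only looks for lowercase names finding nothing.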
LibriSpeech
For LibriSpeech, I copied your TIMIT class in dataset.py and just hard-coded the name of the dataset:
And then in trainer.py I just wrote:
In that case the data is loaded without errors, and the training begins but aborts with a series of CUDA errors:
Do you have a suggestion about getting LibriSpeech working?
Thanks again, Scott