CoEDL / elpis

🙊 software for creating speech recognition models.
https://elpis.readthedocs.io/en/latest/
Apache License 2.0
add fix_data_dir command in kaldi st5 #223

benfoley closed 3 years ago

benfoley commented 3 years ago

While using the Gurindji Kriol corpus, I found that a mismatch between the number of lines in utt2spk and feats.scp caused training to fail. A simple remedy is to run Kaldi's fix_data_dir command on both the train and test data sets, which brings the files back in sync.
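For anyone wanting to confirm the mismatch before applying the fix, here is a minimal sketch of a check. It is not part of this PR: the helper names `read_utt_ids` and `find_mismatch` are hypothetical, and it assumes the usual Kaldi layout in which the first whitespace-separated field of each line in utt2spk and feats.scp is the utterance ID.

```python
from pathlib import Path


def read_utt_ids(path):
    """Return the set of utterance IDs (first field of each non-empty line)."""
    ids = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


def find_mismatch(data_dir):
    """Compare utt2spk and feats.scp inside one Kaldi data directory.

    Hypothetical helper for illustration; data_dir is assumed to contain
    both files.
    """
    data_dir = Path(data_dir)
    utt2spk_ids = read_utt_ids(data_dir / "utt2spk")
    feats_ids = read_utt_ids(data_dir / "feats.scp")
    return {
        "only_in_utt2spk": sorted(utt2spk_ids - feats_ids),
        "only_in_feats": sorted(feats_ids - utt2spk_ids),
    }
```

If either list comes back non-empty, running Kaldi's `utils/fix_data_dir.sh` on that directory should drop the utterances that are missing from one of the files. The directory names would typically be something like `data/train` and `data/test`, but that is an assumption about the recipe layout.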

IDK what the yarn lock file is doing here.. hmmm

mattchrlw commented 3 years ago

yarn.lock is a mythical being 👻

Is there a way I can test this? Or can I take your word for it? I'm happy to approve

nicklambourne commented 3 years ago

For the sake of the git history, do you want to unstage the yarn stuff? (i.e. from your branch, check out the version of the lock file from master. If it's truly critical and not just a weird bug, we can recreate the yarn changes later.) 🤷‍♀️

benfoley commented 3 years ago

I've removed the yarn file.

It's a difficult error to test because it arises only with particular data. I will try to replicate it with a smaller dataset that I could share.