SpeechColab / GigaSpeech

Large, modern dataset for speech recognition
Apache License 2.0

Can't locate utt2spk_to_spk2utt.pl file #121

Status: Closed (npovey closed this issue 2 years ago)

npovey commented 2 years ago

np@np-INTEL:/mnt/speech1/nadira/GigaSpeech$ ./utils/download_gigaspeech.sh /mnt/speech2/gigaspeech
np@np-INTEL:/mnt/speech1/nadira/GigaSpeech$ ./toolkits/kaldi/gigaspeech_data_prep.sh --train-subset XL /mnt/speech2/gigaspeech ../data

output:

./toolkits/kaldi/gigaspeech_data_prep.sh: Extract meta into ../data/gigaspeech_corpus/
./toolkits/kaldi/gigaspeech_data_prep.sh: line 112: utt2spk_to_spk2utt.pl: command not found
./toolkits/kaldi/gigaspeech_data_prep.sh: utt2spk to spk2utt

dophist commented 2 years ago

Hi @npovey

It is exactly the same file as in Kaldi; we should have copied it into this repo but forgot. As a quick workaround, you can copy it over from Kaldi yourself so it doesn't block your work.
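For readers without a Kaldi checkout at hand, the conversion that the missing script performs can be sketched in awk. This is a minimal, hypothetical stand-in written for illustration, not Kaldi's utt2spk_to_spk2utt.pl and not part of GigaSpeech. It assumes the standard Kaldi data-directory formats: utt2spk holds one "<utterance-id> <speaker-id>" pair per line, and spk2utt holds one line per speaker, listing the speaker ID followed by all of its utterance IDs. The function name utt2spk_to_spk2utt is our own choice. Copying the real script (it lives under egs/wsj/s5/utils/ in the Kaldi repo) onto your PATH remains the simpler fix, since gigaspeech_data_prep.sh calls it as a bare command.

```shell
# Hypothetical awk stand-in for Kaldi's utt2spk_to_spk2utt.pl (illustration only).
# Reads "<utt-id> <spk-id>" lines on stdin and writes "<spk-id> <utt-id> ..." lines
# on stdout, listing speakers in the order they first appear in the input.
utt2spk_to_spk2utt() {
  awk '
    # Reject lines that do not have at least an utterance ID and a speaker ID.
    NF < 2 { print "Invalid line in utt2spk file: " $0 > "/dev/stderr"; bad = 1; exit 1 }
    {
      utt = $1; spk = $2
      if (!(spk in utts)) order[++n] = spk   # remember first-seen speaker order
      utts[spk] = utts[spk] " " utt          # append this utterance to its speaker
    }
    END {
      if (bad) exit 1                         # END still runs after exit; skip output
      for (i = 1; i <= n; i++) print order[i] utts[order[i]]
    }
  '
}
```

Usage, with placeholder file names: utt2spk_to_spk2utt < utt2spk > spk2utt. Grouping by first-seen speaker order means a sorted utt2spk yields a spk2utt in the same speaker order, which is what downstream Kaldi tools expect.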

I haven't had a local Kaldi repo for quite a while, so I might be slow to fix this. I'd really appreciate it if you could open a PR that adds the file to this repo, but that's totally up to you.

Thanks for letting us know.

dophist commented 2 years ago

Fixed in https://github.com/SpeechColab/GigaSpeech/pull/122. Thanks, @npovey! Closing.