(Copying my comment from the other thread)
wav2letter/data/librispeech/create.lua
is the file used for preprocessing. You can take a look at how it works on LibriSpeech and create something similar for your dataset.
At a high level, let's say your test folder is abc; the first sample will be stored something like this (see the sketch after this list):
abc/000000001.flac
- the audio file in FLAC format. Note that it has 1 channel. If you are using a pretrained model, the sampling frequency should be 16 kHz, since the model was trained on LibriSpeech.
abc/000000001.ltr
- specifies the target for the sample. For example, if the transcription is "hello world", you would store "h e l l o | w o r l d" in the file.
abc/000000001.fid
- stores a unique ID for the sample
abc/000000001.spk
- speaker info
and for the second sample, the filename should be 000000002.zzz
....
PS: .fid and .spk are not used for training or testing. You can keep some dummy values in there if you want to do a quick test.
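To make the layout concrete, here is a minimal sketch of preparing one sample, assuming sox is installed; my-audio.wav and the target text "hello world" are placeholders for illustration, not files from this thread:

mkdir -p abc
# Convert to mono, 16 kHz FLAC (what a LibriSpeech-pretrained model expects)
sox my-audio.wav -r 16000 -c 1 abc/000000001.flac
# Target letters separated by spaces, with | marking word boundaries
echo "h e l l o | w o r l d" > abc/000000001.ltr
# Unique id and speaker info (dummy values are fine for a quick test)
echo "000000001" > abc/000000001.fid
echo "speaker01" > abc/000000001.spk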
I don't understand why I must create a file containing the transcription of the audio if I want to get a transcription of that same audio. It's like saying: "The sum of 1 and 2 is 3; please calculate the sum of 1 and 2."
This is what I don't understand:
abc/000000001.ltr - specifies the target for the sample. For example, if the transcription is "hello world", you would store "h e l l o | w o r l d" in the file.
I want the transcription of the file abc/000000001.flac, i.e. a result.txt containing "hello world". Using a pretrained model, I should get the transcription of that audio.
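(For inference-only runs, one workaround, which is my own assumption rather than something confirmed in this thread, is to put a placeholder target in the .ltr file; the decoder still prints its hypothesis, and only the reported WER/LER become meaningless:)

# Hypothetical placeholder target for inference-only use; WER/LER will be wrong
echo "d u m m y" > abc/000000001.ltr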
@vineelpratap Currently these are the steps I'm following
After installation is complete:
# Getting LibriSpeech dataset
wget http://www.openslr.org/resources/12/dev-clean.tar.gz
tar xfvz dev-clean.tar.gz
# Removing all audio files except the first folder
mv LibriSpeech/dev-clean/1272/128104/ LibriSpeech/
rm -r LibriSpeech/dev-clean/*
mkdir -p LibriSpeech/dev-clean/1272
mv LibriSpeech/128104 LibriSpeech/dev-clean/1272
# Adding my own audio file
cp my-own-audio.flac LibriSpeech/dev-clean/1272/128104/
mv LibriSpeech/dev-clean/1272/128104/my-own-audio.flac LibriSpeech/dev-clean/1272/128104/1272-128104-0015.flac
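## Optional sanity check (a sketch; assumes sox is installed): confirm the file is mono, 16 kHz
soxi LibriSpeech/dev-clean/1272/128104/1272-128104-0015.flac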
nano LibriSpeech/dev-clean/1272/128104/1272-128104.trans.txt
## Adding the following line at the end of the transcription file
1272-128104-0015 HI
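## A non-interactive alternative to the nano edit above (a sketch with the same effect):
echo "1272-128104-0015 HI" >> LibriSpeech/dev-clean/1272/128104/1272-128104.trans.txt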
luajit /wav2letter/data/librispeech/create.lua /LibriSpeech /librispeech-proc
luajit /wav2letter/data/utils/create-sz.lua /librispeech-proc/dev-clean
cat /librispeech-proc/letters.lst >> /librispeech-proc/letters-rep.lst && echo "1" >> /librispeech-proc/letters-rep.lst && echo "2" >> /librispeech-proc/letters-rep.lst
## Modifying the letters list and adding the z letter
nano /librispeech-proc/letters.lst
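## A scriptable equivalent, assuming "adding the z letter" means appending a "z" line if it is missing:
grep -qx "z" /librispeech-proc/letters.lst || echo "z" >> /librispeech-proc/letters.lst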
# Running test
luajit /wav2letter/test.lua /librispeech-glu-highdropout.bin -progress -show -test dev-clean -save -datadir /librispeech-proc/ -dictdir /librispeech-proc/ -gfsai
These are the results:
<|hi|>
[Sentence WER: 2200.00%, dataset WER: 016.07%]
[=================== 16/16 ===================>] Tot: 1s48ms | Step: 69ms
| dev-clean LER = 08.96%, WER = 16.07%
Currently I'm trying to understand the workflow to infer transcriptions of my personal files using a pre-trained model. However, I don't know where I must put my personal .flac files and whether they need to be renamed. Any clue will be appreciated.