flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

How to get transcriptions of personal audio files? #77

Closed ambigus9 closed 6 years ago

ambigus9 commented 6 years ago

Currently I'm trying to understand the workflow for inferring transcriptions of my personal audio files using a pre-trained model. However, I don't know where I must put my personal .flac files and whether they need to be renamed. Any clue will be appreciated.

vineelpratap commented 6 years ago

(Copying my comment from the other thread) wav2letter/data/librispeech/create.lua is the file used for preprocessing. You can take a look at how it works on librispeech and create something similar for your dataset. At a high level, say your test folder is abc; the first sample will be stored like this:

abc/000000001.flac - the audio file, in FLAC format. Note that it must have 1 channel. If you are using a pretrained model, the sampling frequency should be 16 kHz, since the model is trained on librispeech.
abc/000000001.ltr - specifies the target for the sample. For example, if the transcription is "hello world", you would store "h e l l o | w o r l d" in the file.
abc/000000001.fid - stores a unique id for the sample.
abc/000000001.spk - speaker info.

and for the second sample, the filenames should be 000000002.zzz, and so on.

PS: .fid and .spk are not used for training or testing. You can keep dummy values in them if you want to do a quick test.
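The layout described above can be sketched as a small preparation script. This is only a sketch: the helper names (`tokenize`, `make_sample`) and the dummy speaker id are illustrative assumptions, not part of wav2letter; producing the matching 16 kHz, single-channel .flac audio itself (e.g. with sox) is left out.

```python
import os

def tokenize(transcription):
    """Spell a transcription out letter by letter, with '|' marking word boundaries."""
    return " ".join("|" if ch == " " else ch for ch in transcription.lower())

def make_sample(folder, index, transcription, speaker="dummy"):
    """Write the .ltr/.fid/.spk side files for sample number `index`.

    The matching 16 kHz mono audio must be placed at <folder>/<index>.flac
    (zero-padded to 9 digits) separately.
    """
    base = os.path.join(folder, "%09d" % index)
    with open(base + ".ltr", "w") as f:
        f.write(tokenize(transcription))   # target, e.g. "h e l l o | w o r l d"
    with open(base + ".fid", "w") as f:
        f.write("%09d" % index)            # unique id; unused for train/test
    with open(base + ".spk", "w") as f:
        f.write(speaker)                   # speaker info; unused for train/test

# tokenize("hello world") -> "h e l l o | w o r l d"
```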

ambigus9 commented 6 years ago

I don't understand why I must create a file containing the transcription of the audio if I want to get a transcription of that same audio. It's like saying: "The sum of 1 and 2 is 3; please calculate the sum of 1 and 2."

This is what I don't understand: abc/000000001.ltr - specify the target for the sample. For example the transcription is "hello world", you would store "h e l l o | w o r l d" in the file.

I want the transcription of the file abc/000000001.flac, i.e. a result.txt containing "hello world". Using a pretrained model, I want to get the transcription of that audio.

ambigus9 commented 6 years ago

@vineelpratap Currently these are the steps I'm following.

After installation is complete:

# Getting LibriSpeech dataset
wget http://www.openslr.org/resources/12/dev-clean.tar.gz
tar xfvz dev-clean.tar.gz

# Remove all audio files except the first folder
mv LibriSpeech/dev-clean/1272/128104/ LibriSpeech/
rm -r LibriSpeech/dev-clean/*
mkdir -p LibriSpeech/dev-clean/1272
mv LibriSpeech/128104 LibriSpeech/dev-clean/1272

# Adding my own audio file
cp my-own-audio.flac LibriSpeech/dev-clean/1272/128104/
mv LibriSpeech/dev-clean/1272/128104/my-own-audio.flac LibriSpeech/dev-clean/1272/128104/1272-128104-0015.flac
nano LibriSpeech/dev-clean/1272/128104/1272-128104.trans.txt

## Append the following line to the end of the transcription file
1272-128104-0015 HI

luajit /wav2letter/data/librispeech/create.lua /LibriSpeech /librispeech-proc
luajit /wav2letter/data/utils/create-sz.lua librispeech-proc/dev-clean

cat /librispeech-proc/letters.lst >> /librispeech-proc/letters-rep.lst && echo "1" >> /librispeech-proc/letters-rep.lst && echo "2" >> /librispeech-proc/letters-rep.lst

## Modify letters list and add z letter
nano /librispeech-proc/letters.lst

# Running test
luajit /wav2letter/test.lua /librispeech-glu-highdropout.bin -progress -show -test dev-clean -save -datadir /librispeech-proc/ -dictdir /librispeech-proc/ -gfsai

These are the results:

 <|hi|>
 [Sentence WER: 2200.00%, dataset WER: 016.07%]
  [=================== 16/16 ===================>] Tot: 1s48ms | Step: 69ms      
 | dev-clean LER = 08.96%, WER = 16.07%
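A sentence WER of 2200% is plausible when the reference is a single dummy word like "HI": WER is the word-level edit distance divided by the number of reference words, so with a one-word reference it grows by 100% for every extra hypothesis word. A minimal sketch of that computation (standard Levenshtein distance over words; this is an illustration, not wav2letter's own scoring code):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# With a one-word reference, each extra hypothesis word adds 100%:
# wer("hi", "hi there you two") -> 3.0, i.e. 300%
```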