flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Sets of tokens, lexicon, and vocab #932

Open ML6634 opened 3 years ago

ML6634 commented 3 years ago

To have sounds made by the speaker, the background, or the channel (for example, {laugh}, {cough}, {breath}, [background noise], [channel noise]) and "words" in the transcriptions in the list files (such as %mm, uh-huh, and %um) taken into account, do I just need to append, for example,

{laugh} {cough} {breath} [background noise] [channel noise] %mm uh-huh %um

to the token set librispeech-train-all-unigram-10000.tokens? After this, do I also need to modify the lexicon and vocab, or not?
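The appending step described above can be sketched as follows. This is a minimal sketch, not an official wav2letter recipe; the file name comes from the question, and since tokens files list one token per line, multi-word labels like "[background noise]" may need a space-free form such as "[background_noise]" (an assumption made here).

```python
# Append special tokens to the tokens set, one token per line.
# Space-free variants of the multi-word labels are an assumption.
extra = ["{laugh}", "{cough}", "{breath}",
         "[background_noise]", "[channel_noise]",
         "%mm", "uh-huh", "%um"]

with open("librispeech-train-all-unigram-10000.tokens", "a") as f:
    for tok in extra:
        f.write(tok + "\n")
```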

In addition, in the transcriptions in the list files, names of people and places are prefixed with an &, for example, &Jack, &Chicago, etc. How do I handle this? Just add

&

to the token set? If so, when an audio is transcribed, will the model recognize a name and prefix it with &? Or is it better to remove the & prefix from the names in the transcriptions in the list files before training? Thank you!!
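If the prefix-removal option is chosen, it could be sketched like this; the helper name and regex are illustrative, not part of wav2letter.

```python
import re

def strip_name_prefix(text: str) -> str:
    """Strip a leading "&" from words, e.g. "&Jack" -> "Jack".

    The regex only matches "&" at the start of a word (after
    whitespace or at the beginning of the string), so an "&"
    inside a word is left alone.
    """
    return re.sub(r"(^|\s)&(\w)", r"\1\2", text)
```

For example, `strip_name_prefix("&Jack went to &Chicago")` yields `"Jack went to Chicago"`.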

tlikhomanenko commented 3 years ago

Yes, you need to add the mapping {laugh} {laugh} to the lexicon, so that the word is mapped to the token itself and not to anything else.

The list files can contain any text; what you need to get right is the lexicon, i.e. how you map words to the token set.
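The identity mapping described above can be sketched as follows. This is a hedged sketch, assuming the usual wav2letter lexicon layout of a word followed by its space-separated token sequence on one line; the file name is illustrative.

```python
# Append identity mappings for the special "words" to the lexicon,
# so each one maps to the single token that is the word itself.
special = ["{laugh}", "{cough}", "{breath}", "%mm", "uh-huh", "%um"]

with open("librispeech-train-all-unigram-10000.lexicon", "a") as f:
    for word in special:
        # Lexicon line: word, then its token sequence (here, itself).
        f.write(f"{word} {word}\n")
```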