NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

Making use of Language Model with CTC Decoder #545

Closed · abhishek-rawat18 closed this 3 years ago

abhishek-rawat18 commented 3 years ago

I've been trying to follow https://github.com/NVIDIA/OpenSeq2Seq/tree/master/external_lm_rescore to create a CTC decoder that uses a language model, but I don't understand what the "labels.csv" file referred to there is.

aayushkubb commented 3 years ago

Hey,

So external_lm_rescore is for rescoring with a bigger LM such as Transformer-XL. That link will help you re-evaluate the beams produced by the CTC beam search decoder.

The first step, if you want to use the CTC decoder, is to build a KenLM model with: https://github.com/NVIDIA/OpenSeq2Seq/blob/master/scripts/build_lm.py
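For illustration, here is a rough sketch of what that build boils down to, calling the standard KenLM command-line tools from Python. The corpus path, n-gram order, and output names are placeholders, and I'm assuming `lmplz` and `build_binary` are installed; build_lm.py automates something similar.

```python
# Rough sketch (assumptions: KenLM's lmplz and build_binary are on PATH,
# and corpus.txt holds one normalized transcript per line).
import subprocess

CORPUS = "corpus.txt"   # placeholder training text
ARPA = "lm.arpa"        # intermediate ARPA-format LM
BINARY = "lm.binary"    # final binary LM, faster to load when decoding

# Train a 4-gram LM; lmplz reads the corpus on stdin and writes ARPA to stdout.
with open(CORPUS, "rb") as src, open(ARPA, "wb") as dst:
    subprocess.run(["lmplz", "-o", "4"], stdin=src, stdout=dst, check=True)

# Convert the ARPA file to KenLM's binary format.
subprocess.run(["build_binary", ARPA, BINARY], check=True)
```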

Once you have the LM, you can use ctc_decode from: https://github.com/NVIDIA/OpenSeq2Seq/blob/master/scripts/decode.py
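If it helps to see the idea outside of OpenSeq2Seq, here is a minimal sketch of LM-fused CTC beam search using the pyctcdecode library. This illustrates the technique, not decode.py itself; the file names, label set, and blank-token handling are assumptions that must match your acoustic model.

```python
import pickle

import numpy as np
from pyctcdecode import build_ctcdecoder

# Vocabulary in the same order as the acoustic model's output units
# (assumption: a simple English character model; blank handling must
# match your model's convention).
labels = list(" abcdefghijklmnopqrstuvwxyz'")

decoder = build_ctcdecoder(
    labels,
    kenlm_model_path="lm.binary",  # the KenLM model built above
    alpha=0.5,  # language-model weight
    beta=1.0,   # word-insertion bonus
)

# Assumed: a pickle holding a (time_steps, vocab_size) array of per-frame
# log-probabilities dumped by the acoustic model for one utterance.
with open("model_output.pickle", "rb") as f:
    logits = pickle.load(f)

print(decoder.decode(np.asarray(logits)))
```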

For this, you need the KenLM model in ARPA or binary format (search online and you will find a very easy way to create one), the labels.csv file (which is nothing but a list of filenames), and the pickle file obtained from the acoustic model.
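Since labels.csv is just a list of filenames, something like this sketch could generate it (the directory, header, and exact column layout are assumptions; check what decode.py actually expects):

```python
# Sketch only: write labels.csv as a plain list of the audio files that were
# run through the acoustic model. Paths and the header row are assumptions.
import csv
import glob

wav_files = sorted(glob.glob("data/test/*.wav"))  # placeholder location

with open("labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["wav_filename"])  # header name is an assumption
    writer.writerows([path] for path in wav_files)
```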

Now decode.py itself will give you the CTC-decoded output. If you want more, you can try Transformer-XL, which will just re-evaluate the beam scores for you.
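The rescoring step itself is conceptually simple. Here is a toy sketch of the idea (not the actual external_lm_rescore code), where `lm_score` is a stand-in for whatever big LM you plug in:

```python
# Toy illustration of N-best rescoring. "beams" are (text, am_log_prob) pairs
# from the CTC beam search; lm_score is any function that scores a sentence,
# e.g. a Transformer-XL or KenLM log-probability (a stand-in here).
def rescore(beams, lm_score, alpha=0.5, beta=0.0):
    best_text, best_total = None, float("-inf")
    for text, am_log_prob in beams:
        # Combine acoustic and LM scores, plus a word-count bonus.
        total = am_log_prob + alpha * lm_score(text) + beta * len(text.split())
        if total > best_total:
            best_text, best_total = text, total
    return best_text
```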

Hope this helps, good luck!

abhishek-rawat18 commented 3 years ago

@aayushkubb Great, thanks a lot.