STT0019: Add KenLM to Wav2Vec2

OpenPecha / stt-wav2vec2

MIT License


Open spsither opened 7 months ago

spsither commented 7 months ago

Use an n-gram KenLM language model with Wav2Vec2 for transcription. Refer to this and read this.

Push the new Wav2Vec2+LM model to Hugging Face.
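Once an ARPA file exists, attaching it to the acoustic model could look roughly like the sketch below. It assumes the `pyctcdecode` and `transformers` packages are installed; `model_id`, `arpa_path`, and `output_dir` are hypothetical parameters standing in for the actual checkpoint, the trained LM file, and a local folder. This is one possible way to do it, not necessarily the approach used in this repo.

```python
from typing import Dict, List


def labels_by_id(vocab: Dict[str, int]) -> List[str]:
    """Order the CTC vocabulary by token id, which is the order the decoder expects."""
    return [token for token, _ in sorted(vocab.items(), key=lambda item: item[1])]


def build_processor_with_lm(model_id: str, arpa_path: str, output_dir: str) -> None:
    """Wrap an existing Wav2Vec2 processor with a KenLM-backed CTC decoder."""
    # Imported here so labels_by_id stays usable without these packages.
    from pyctcdecode import build_ctcdecoder
    from transformers import AutoProcessor, Wav2Vec2ProcessorWithLM

    # model_id is a placeholder for the fine-tuned Wav2Vec2 checkpoint.
    processor = AutoProcessor.from_pretrained(model_id)

    # The decoder combines the CTC labels with the n-gram LM scores.
    decoder = build_ctcdecoder(
        labels=labels_by_id(processor.tokenizer.get_vocab()),
        kenlm_model_path=arpa_path,
    )

    processor_with_lm = Wav2Vec2ProcessorWithLM(
        feature_extractor=processor.feature_extractor,
        tokenizer=processor.tokenizer,
        decoder=decoder,
    )
    # Writes the processor files plus the language model to output_dir.
    processor_with_lm.save_pretrained(output_dir)
```

The saved folder can then be uploaded alongside the acoustic model weights so that both load together from the same Hugging Face repository.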

spsither commented 7 months ago

Run the following from this to build the KenLM binaries:

mkdir -p build
cd build
cmake ..
make -j 4

Use the Google MADLAD tokenizer in preprocess.py.
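A minimal sketch of what such a preprocess.py could look like is given here. It reads text from stdin and writes one space-separated token sequence per line to stdout, which is the input format lmplz expects. The Hub id `google/madlad400-3b-mt`, the function names, and the choice to join tokens with spaces are all assumptions for illustration; the actual script may differ.

```python
import sys
from typing import Callable, Iterable, Iterator, List


def preprocess_lines(
    lines: Iterable[str], tokenize: Callable[[str], List[str]]
) -> Iterator[str]:
    """Yield one space-separated token sequence per non-empty input line."""
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines so they do not become empty sentences
        tokens = tokenize(text)
        if tokens:
            yield " ".join(tokens)


def load_madlad_tokenize() -> Callable[[str], List[str]]:
    """Return the tokenize method of a MADLAD tokenizer."""
    # Assumption: the tokenizer is fetched from the Hugging Face Hub under
    # this id; substitute whichever MADLAD checkpoint is actually used.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("google/madlad400-3b-mt")
    return tokenizer.tokenize


def main() -> None:
    """Stream stdin through the tokenizer to stdout."""
    tokenize = load_madlad_tokenize()
    for out in preprocess_lines(sys.stdin, tokenize):
        sys.stdout.write(out + "\n")
```

To use this as a pipeline stage, the script would end with a call to `main()` under the usual `if __name__ == "__main__":` guard. Keeping the tokenizer as a plain callable makes it easy to swap in a different one or to test the line handling without downloading anything.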

To train the 4-gram language model, use this script:

bzcat processed_texts.txt.bz2 | python preprocess.py | build/bin/lmplz -o 4 --discount_fallback > model4.arpa
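As a quick sanity check on the output, the header of an ARPA file lists how many n-grams were kept for each order. The sketch below parses that header with the standard library only; the function name `read_arpa_counts` is hypothetical and not part of KenLM.

```python
from typing import Dict


def read_arpa_counts(path: str) -> Dict[int, int]:
    """Return {n-gram order: count} parsed from the data header of an ARPA file."""
    counts: Dict[int, int] = {}
    in_header = False
    with open(path, "r", encoding="utf-8") as arpa:
        for line in arpa:
            line = line.strip()
            if line == "\\data\\":
                in_header = True  # header lines such as "ngram 1=1234" follow
            elif in_header and line.startswith("ngram "):
                order, count = line[len("ngram "):].split("=")
                counts[int(order)] = int(count)
            elif in_header and line.startswith("\\"):
                break  # reached the "\1-grams:" section, so the header is over
    return counts
```

For a model trained with `-o 4`, calling `read_arpa_counts("model4.arpa")` should return entries for orders 1 through 4. A missing order or an empty result suggests the training pipeline did not finish cleanly.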