OpenPecha / stt-wav2vec2

MIT License

STT0019: Add KenLM to Wav2Vec2 #2

Open spsither opened 7 months ago

spsither commented 7 months ago

Description

Use an n-gram KenLM language model with Wav2Vec2 for transcription. Refer to this and read this.

Completion Criteria

Push the new Wav2Vec2+LM model to HuggingFace
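One possible way to assemble the Wav2Vec2+LM processor before pushing is sketched below. It assumes the `transformers` and `pyctcdecode` packages are installed. The function names and the `acoustic_model_id`, `arpa_path`, and `output_dir` parameters are hypothetical placeholders, not names taken from this repository.

```python
from typing import Dict, List


def labels_in_id_order(vocab: Dict[str, int]) -> List[str]:
    """Return vocabulary tokens sorted by their integer id.

    The decoder labels need to follow the same order as the columns
    of the CTC logits, which is the tokenizer's id order.
    """
    return [token for token, _ in sorted(vocab.items(), key=lambda item: item[1])]


def build_processor_with_lm(acoustic_model_id: str, arpa_path: str, output_dir: str) -> None:
    """Wrap an existing Wav2Vec2 processor with a KenLM-backed CTC decoder.

    All three arguments are placeholders (assumptions) for illustration.
    """
    # Imported here so the helper above works without these packages installed.
    from pyctcdecode import build_ctcdecoder
    from transformers import AutoProcessor, Wav2Vec2ProcessorWithLM

    processor = AutoProcessor.from_pretrained(acoustic_model_id)
    labels = labels_in_id_order(processor.tokenizer.get_vocab())

    # Beam-search decoder that scores hypotheses with the n-gram model.
    decoder = build_ctcdecoder(labels=labels, kenlm_model_path=arpa_path)

    processor_with_lm = Wav2Vec2ProcessorWithLM(
        feature_extractor=processor.feature_extractor,
        tokenizer=processor.tokenizer,
        decoder=decoder,
    )
    # Writes the processor files and the language model to a local directory.
    processor_with_lm.save_pretrained(output_dir)
```

The saved directory can then be uploaded to the Hugging Face Hub alongside the acoustic model weights.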


Implementation Plan

Subtasks

spsither commented 7 months ago

Build KenLM by running the following from this:

mkdir -p build
cd build
cmake ..
make -j 4

Install using this reference

Use the Google MADLAD tokenizer in preprocess.py.
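The overall shape of such a preprocess.py could look like the sketch below: a stdin-to-stdout filter that emits one sentence per line with tokens separated by spaces, which is the input format lmplz reads. The `whitespace_tokenize` function is only a stand-in (an assumption for illustration), and the MADLAD tokenizer would be passed in its place. All function names here are hypothetical.

```python
import sys
from typing import Callable, Iterable, Iterator, List


def whitespace_tokenize(text: str) -> List[str]:
    """Placeholder tokenizer (assumption): NOT the MADLAD tokenizer."""
    return text.split()


def preprocess_lines(
    lines: Iterable[str],
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> Iterator[str]:
    """Yield one space-joined token sequence per non-empty input line."""
    for line in lines:
        tokens = tokenize(line.strip())
        if tokens:  # skip blank lines so they do not become empty sentences
            yield " ".join(tokens)


def main() -> None:
    """Read raw text from stdin and write tokenized lines to stdout."""
    for tokenized in preprocess_lines(sys.stdin):
        sys.stdout.write(tokenized + "\n")
```

Calling `main()` at the bottom of the file turns this into a filter that can sit in the middle of a shell pipeline.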

To train the 4-gram model, use this script:

bzcat processed_texts.txt.bz2 | python preprocess.py | build/bin/lmplz -o 4 --discount_fallback > model4.arpa
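After training, a quick sanity check of model4.arpa may be useful. The sketch below reads the `\data\` header that ARPA files start with and returns the n-gram counts per order. The function name `read_arpa_counts` is hypothetical and this check is not part of KenLM.

```python
import re
from typing import Dict, Iterable

# Header lines look like "ngram 1=12345".
_NGRAM_LINE = re.compile(r"^ngram (\d+)=(\d+)$")


def read_arpa_counts(lines: Iterable[str]) -> Dict[int, int]:
    """Return {order: count} parsed from the \\data\\ header of an ARPA file."""
    counts: Dict[int, int] = {}
    in_data = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if not in_data:
            continue
        match = _NGRAM_LINE.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line.startswith("\\"):
            # Reached the first n-gram section (e.g. \1-grams:), header is done.
            break
    return counts
```

For a model built with `-o 4`, opening model4.arpa with `encoding="utf-8"` and passing the file object to `read_arpa_counts` should give a highest order of 4.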