Open spsither opened 7 months ago
Build KenLM by running the following (from the linked instructions):
mkdir -p build
cd build
cmake ..
make -j 4
Install it using the linked reference.
Use the Google MADLAD tokenizer in preprocess.py.
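A minimal sketch of what preprocess.py could look like, assuming it reads raw text on stdin and writes one space-separated token sequence per line, which is the input format lmplz expects. The function names, the whitespace stand-in tokenizer, and the model id `google/madlad400-3b-mt` are assumptions for illustration; the issue does not say which MADLAD checkpoint or loading code is used.

```python
import sys
from typing import Callable, Iterable, Iterator, List


def whitespace_tokenize(text: str) -> List[str]:
    """Stand-in tokenizer (assumption): split on whitespace."""
    return text.split()


def load_madlad_tokenize(
    model_id: str = "google/madlad400-3b-mt",  # assumed checkpoint name
) -> Callable[[str], List[str]]:
    """Hypothetical loader for the MADLAD tokenizer via Hugging Face transformers."""
    # Imported lazily so the rest of the module works without transformers.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    return tokenizer.tokenize


def preprocess(
    lines: Iterable[str],
    tokenize: Callable[[str], List[str]] = whitespace_tokenize,
) -> Iterator[str]:
    """Yield one space-joined token sequence per non-empty input line."""
    for line in lines:
        tokens = tokenize(line.strip())
        if tokens:  # skip lines that produce no tokens
            yield " ".join(tokens)


def main() -> None:
    """Stream stdin to stdout so the script can sit in a shell pipeline."""
    for out in preprocess(sys.stdin):
        print(out)
```

To use it in the pipeline, call `main()` under the usual `if __name__ == "__main__":` guard, and pass `tokenize=load_madlad_tokenize()` to `preprocess` in place of the whitespace stand-in.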
To train the 4-gram model, run:
bzcat processed_texts.txt.bz2 | python preprocess.py | build/bin/lmplz -o 4 --discount_fallback > model4.arpa
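As an optional sanity check after training (not part of the original steps), the header of the ARPA file can be read to confirm that the model has the expected order. The sketch below assumes the standard ARPA layout, where a `\data\` section lists `ngram N=count` lines before the first `\N-grams:` section; the function name `read_arpa_counts` is made up for this example.

```python
from typing import Dict, Iterable


def read_arpa_counts(lines: Iterable[str]) -> Dict[int, int]:
    """Return {order: n-gram count} parsed from the \\data\\ header of an ARPA file."""
    counts: Dict[int, int] = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue
        if line.startswith("ngram "):
            order, count = line[len("ngram "):].split("=")
            counts[int(order)] = int(count)
        elif line.startswith("\\"):
            # First "\N-grams:" section reached, so the header is over.
            break
    return counts
```

For the command above, opening `model4.arpa` and checking that `max(read_arpa_counts(f)) == 4` would confirm that `-o 4` took effect.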
Description
Use a KenLM n-gram language model with Wav2Vec2 for transcription. See the two linked references.
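To illustrate why an n-gram LM helps, here is a simplified n-best rescoring sketch: each candidate transcription gets its acoustic log-probability plus a weighted LM log-probability and a word-count bonus. This is an illustration under assumptions, not how any particular decoder is implemented; real CTC decoders usually apply the LM during beam search. The names `Hypothesis`, `fused_score`, `pick_best`, and the weights `alpha` and `beta` are hypothetical, and the scores in the usage note are made-up numbers.

```python
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    text: str
    acoustic_logprob: float  # score from the acoustic model (e.g. Wav2Vec2 CTC)
    lm_logprob: float        # score from the n-gram LM (e.g. a KenLM model)


def fused_score(h: Hypothesis, alpha: float = 0.5, beta: float = 1.5) -> float:
    """Combine acoustic score, weighted LM score, and a per-word bonus."""
    return h.acoustic_logprob + alpha * h.lm_logprob + beta * len(h.text.split())


def pick_best(hyps: List[Hypothesis], alpha: float = 0.5, beta: float = 1.5) -> Hypothesis:
    """Return the hypothesis with the highest fused score."""
    return max(hyps, key=lambda h: fused_score(h, alpha, beta))
```

With candidates such as `Hypothesis("the cat sat", -4.0, -6.0)` and `Hypothesis("the cats at", -3.8, -12.0)`, setting `alpha=0.0` selects the second on acoustic score alone, while `alpha=0.5` lets the LM tip the choice to the first.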
Completion Criteria
Push the new Wav2Vec2+LM model to HuggingFace
Implementation Plan
Subtasks