-
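Hi, I trained a KenLM model on my own domain data (a 506 MB txt file), but in my tests its address-correction performance is not as good as your 2.9 GB model. Could you tell me how you set the parameters?
My training command is as follows: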
build/bin/lmplz \
-o 3 \
--verbose_header \
--text /kenlm/train_dataset/sj_jt_506m.txt \
…
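One way to check whether a change (more domain data, a different `-o` order) actually helps is to compare the models on the same held-out address text. Lower perplexity usually means a better fit, though it does not guarantee better correction. The sketch below is an assumption, not the method used for the 2.9 GB model. It computes corpus perplexity from per-sentence log10 scores the same way the `kenlm` Python module's `perplexity` does for a single sentence: every whitespace-separated word counts as one token, plus one for `</s>`. The function name `corpus_perplexity` is hypothetical.

```python
from typing import Callable, Iterable


def corpus_perplexity(score_fn: Callable[[str], float], sentences: Iterable[str]) -> float:
    """Hypothetical helper: perplexity of a held-out corpus.

    score_fn must return the total log10 probability of a sentence, including
    the end-of-sentence token </s> (as kenlm.Model.score(s, bos=True, eos=True) does).
    """
    total_log10 = 0.0
    total_tokens = 0
    for s in sentences:
        total_log10 += score_fn(s)
        # KenLM splits on whitespace; +1 accounts for the predicted </s>.
        total_tokens += len(s.split()) + 1
    if total_tokens == 0:
        raise ValueError("no sentences given")
    return 10.0 ** (-total_log10 / total_tokens)
```

With the `kenlm` module installed, `score_fn` could be `lambda s: model.score(s, bos=True, eos=True)`, where `model = kenlm.Model(path)`. Because KenLM treats whitespace-separated tokens as words, Chinese address text must be segmented (by character or by word) the same way for training and for evaluation. Otherwise the two perplexities are not comparable.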
-
Hi,
I am trying your solution of using the JNI bridge to call KenLM, but it cannot find the KenLM libraries. I followed the example on the webpage on how to build it. The build gi…
-
Hi,
Can you please tell me what needs to be changed or added in your scripts to run inference with the KenLM decoder?
Thanks for the docker!
-
Unable to use KenLM rescoring because logprobs are missing from the transcribe output.
**Steps/Code to reproduce the bug**
1. Cloned the repo [7916269](https://github.com/NVIDIA/NeMo/commit/79162696ea8c48734a260dd2…
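Why missing logprobs blocks rescoring: N-best rescoring combines each hypothesis's acoustic log-probability with a weighted LM score, so without the acoustic scores there is nothing to combine. The sketch below shows generic N-best rescoring under assumptions. It is not NeMo's implementation, and the names `rescore_nbest`, `alpha` (LM weight), and `beta` (word-insertion bonus) are hypothetical. KenLM returns log10 scores while acoustic models usually report natural-log probabilities; here that scale difference is simply absorbed into `alpha`.

```python
from typing import Callable, List, Tuple


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    lm_score: Callable[[str], float],
    alpha: float = 0.5,
    beta: float = 1.0,
) -> Tuple[str, float]:
    """Hypothetical sketch: pick the best (text, acoustic_logprob) hypothesis.

    Combined score = acoustic_logprob + alpha * lm_score(text) + beta * num_words.
    """
    if not nbest:
        raise ValueError("empty N-best list")
    best_text, best_score = nbest[0][0], float("-inf")
    for text, am_logprob in nbest:
        # The acoustic log-probability is required here; this is the value
        # that must come back from transcription for rescoring to work.
        score = am_logprob + alpha * lm_score(text) + beta * len(text.split())
        if score > best_score:
            best_text, best_score = text, score
    return best_text, best_score
```

Here `lm_score` would typically wrap a KenLM model's `score` method. The weights are usually tuned on a development set rather than fixed.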
-
### Description
Use an n-gram KenLM language model with Wav2Vec2 for transcription. Refer to [this](https://huggingface.co/blog/wav2vec2-with-ngram).
Read [this](https://arxiv.org/pdf/2206.12693v1.pdf)
### Completion Crite…
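For context, this is the step an n-gram LM improves on. Without an LM, Wav2Vec2's per-frame CTC outputs are commonly decoded greedily: take the argmax label per frame, merge consecutive repeats, and drop the blank. An LM-aware decoder replaces this with a beam search that adds weighted n-gram scores. Below is a minimal sketch of the greedy baseline only. It assumes the blank is label index 0 and that `|` marks word boundaries. Both are assumptions to check against the actual vocabulary, and `greedy_ctc_decode` is a hypothetical name.

```python
from typing import List, Sequence


def greedy_ctc_decode(
    frame_probs: Sequence[Sequence[float]],
    labels: List[str],
    blank: int = 0,          # assumed blank index
    word_delim: str = "|",   # assumed word-boundary token
) -> str:
    # Best label index for each frame (argmax over the vocabulary).
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_probs]
    out = []
    prev = None
    for idx in best:
        # CTC collapse rule: skip consecutive repeats, then skip blanks.
        if idx != prev and idx != blank:
            out.append(labels[idx])
        prev = idx
    # Turn word-delimiter tokens into single spaces, dropping empty pieces.
    return " ".join(w for w in "".join(out).split(word_delim) if w)
```

A blank between two identical labels keeps both, which is how CTC represents doubled letters.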
-
KenLM can already estimate models in a reasonably practical way. A second advantage is that it is open source, so we can include it in the package or download it with the installation script. It shouldn't be …
-
The [KenLM](https://github.com/kpu/kenlm) toolkit was able to train a unigram model on the Europarl dataset. However, there are currently two limitations with a unigram model (as opposed to n-gram m…
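The specific limitations listed in the issue are cut off above. One general property of unigram models is still easy to show, though it is not necessarily one the issue goes on to describe: each word is scored independently of its context, so a sentence's score does not depend on word order, whereas an n-gram model conditions on preceding words. Below is a minimal sketch using unsmoothed maximum-likelihood estimates on a toy corpus (not Europarl). The names `train_unigram` and `unigram_log10` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def train_unigram(corpus: List[str]) -> Dict[str, float]:
    counts = Counter(w for line in corpus for w in line.split())
    total = sum(counts.values())
    # Maximum-likelihood estimate P(w) = count(w) / total tokens, no smoothing.
    return {w: c / total for w, c in counts.items()}


def unigram_log10(model: Dict[str, float], sentence: str) -> float:
    # Sum of independent per-word log10 probabilities; context is ignored.
    # Unseen words raise KeyError because this toy model has no <unk> handling.
    return sum(math.log10(model[w]) for w in sentence.split())
```

For example, after training on `["the cat sat", "the dog sat"]`, both "the cat sat" and "sat cat the" receive the same score.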
-
On certain samples, intermittently (but more often on longer ones), running inference with KenLM produces gibberish at the end of the transcript, e.g.
"four one seven crivenuehiof…
-
Hello!
For some reason, our 3 GB **Russian** KenLM ARPA model (binarized) uses **~50 GB** of RAM during `CTCBeamDecoder` class initialization and estimation (beam width 100).
When using the KenLM Python m…
-
## 🐛 Bug
### To Reproduce
Steps to reproduce the behavior (**always include the command you ran**):
1. Run `python examples/speech_recognition/infer.py ~/fairseq/benchmark --task audi…`