-
Hello, I'd like to compute perplexity on different text corpora given an n-gram model computed with KenLM. I found in some old issues that the `--vocab_pad` param should be used with a big number in similar sit…
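
For reference, perplexity can be derived from per-token log10 probabilities, which is the base KenLM uses for its scores. A minimal sketch, assuming the per-token scores (including the end-of-sentence token) have already been collected into a list; `perplexity` is a hypothetical helper for illustration, not a KenLM function:

```python
def perplexity(log10_probs):
    """Perplexity from a list of per-token log10 probabilities.

    Hypothetical helper (not part of KenLM): computes
    10 ** (-(sum of log10 probabilities) / number of tokens).
    """
    if not log10_probs:
        raise ValueError("need at least one token score")
    return 10.0 ** (-sum(log10_probs) / len(log10_probs))


# Two tokens, each with probability 0.1 (log10 = -1).
print(perplexity([-1.0, -1.0]))
```

Computing one value per corpus this way makes the corpora comparable only if they are scored against the same model and vocabulary.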
-
Is there a way to do ngram estimation with custom token separation? The idea would be to get the following behavior:
`Hi, this is a sentence.` -> `Hi`, `,`, `this`, `is`, `a`, `sentence`, `.`
`My em…
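
One possible approach, sketched under the assumption that the text can be pre-tokenized before it reaches the estimator: `lmplz` reads whitespace-separated tokens, so punctuation can be split off with a regular expression and the tokens rejoined with spaces. The `tokenize` and `to_lm_line` helpers and the regex are hypothetical illustrations, not part of KenLM, and they only cover the first example above:

```python
import re
from typing import List

# Hypothetical helper, not part of KenLM: runs of word characters stay
# together, and every other non-space character becomes its own token.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    """Split text into word and punctuation tokens."""
    return _TOKEN_RE.findall(text)


def to_lm_line(text: str) -> str:
    """Join tokens with single spaces, one sentence per line."""
    return " ".join(tokenize(text))


print(to_lm_line("Hi, this is a sentence."))
# prints: Hi , this is a sentence .
```

The same tokenization would have to be applied to any text scored against the resulting model, otherwise tokens such as `sentence.` would be treated as unknown words.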
-
I'm writing a Python script that mimics the behavior of lmplz.
When I tested it out on a large corpus, I found the estimated probabilities differed slightly from lmplz's output.
By shrinking the c…
-
I'm interested in using the KenLM LM to decode/score outputs of my speech recognition model.
When I instantiate my CTCBeamDecoder with model_path='./test.arpa', which is a pretty small .arpa file just…
-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
Yes
### Source
source
### TensorFlow version
tensorflow==2.15.0.post1
### Custom code
Yes
### OS platform and dist…
-
I am trying to estimate a language model on Raspbian. I got a segmentation fault when running `kenlm/build/bin/lmplz -o 4 --prune 0 1 2 3 --limit_vocab_file vocab.txt --interpolate_unigrams 0 lm.arpa`.
…
-
## 🐛 Bug
TensorBoard writers are not cleared between Hydra configurations
### To Reproduce
This problem was spotted while running training of Wav2Vec-U with default parameters:
```PREFIX=…
-
I'm adding a note here, although this is not really an 'issue' in the normal sense.
I just checked in code that supports enforcing min-counts. This should make the process of building and pruning LM…
-
### Question
Dear Sirs,
I am currently using the Python binding for Flashlight, working with the LexiconDecoder and KenLM classes to build a decoder for an ASR model I have. I currently call the d…