-
Either something like [thai-segmenter](https://pypi.org/project/thai-segmenter/) or maybe [SentencePiece](https://github.com/google/sentencepiece).
-
I am currently using Ragas to evaluate my RAG application, which is built with LlamaIndex. I've encountered a few issues in the generated results:
1- When generating queries using `TestsetGenera…
-
In the following example
https://arxiv.org/pdf/2103.12028v1.pdf
there are cases of wrong sentence segmentations, with sentence offsets apparently shifted by a few characters, resulting in word c…
-
### Describe the bug
When passing a list of custom split sentences using a custom split function, the TTS model (`tts_models/multilingual/multi-dataset/xtts_v2` to be specific) with `split_sentence…
-
When I run the ebook on @piscosour, I get: http://pastebin.com/atBRUjR5
```
/task/__gems__/gems/punkt-segmenter-0.9.1/lib/punkt-segmenter/punkt/sentence_tokenizer.rb:81:in `split_in_sentences': undef…
-
The Ruby gem [pragmatic_segmenter](https://github.com/diasks2/pragmatic_segmenter) has a list of 50+ sentence split examples that this lib fails to parse. You can use [their list](https://github.com…
-
**Bug description**
I am using `segment_long` to segment a relatively long paragraph, and despite using this specific function I am getting a repeated warning that reads
> WARNING:root:Consider u…
-
**Describe the bug**
Segmenter will raise "exception: bad escape (end of pattern) at position" when it is initialized with clean=True and it encounters a sentence like "etc.Png,Jpg,.\\" (word/token t…
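
For reference, the message matches what Python's `re` module raises when a pattern ends in a lone backslash. Below is a minimal sketch, assuming the token ends in a single backslash and that the segmenter compiles a regular expression from the raw token; both are assumptions, since the library's internals are not shown here. `build_token_pattern` is a hypothetical helper, not part of the library.

```python
import re

# Token from the report, read here as ending in ONE backslash (assumption).
token = "etc.Png,Jpg,.\\"

# Compiling the raw token as a pattern fails, because the trailing
# backslash has nothing left to escape.
try:
    re.compile(token)
    raw_error = None
except re.error as exc:
    raw_error = str(exc)


def build_token_pattern(tok: str) -> "re.Pattern[str]":
    """Hypothetical helper: treat the token as literal text, not regex syntax."""
    return re.compile(re.escape(tok))


pattern = build_token_pattern(token)
print(raw_error)
print(bool(pattern.search("see " + token + " here")))
```

If the cause is the same, escaping tokens with `re.escape` before building the pattern would likely avoid the exception.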
-
This just popped up for me for the first time. Running the `recognize` function (with whisper.cpp, built with OpenBLAS, on CPU) on what is, as far as I know, not a pathological audio sample (it's an a…
-
Are Sudachi's part-of-speech rules compatible with any sentence segmenter's POC rules?
If not, is there a part-of-speech table like https://www.unixuser.org/~euske/doc/postag/
It would be helpfu…