-
This would be useful when transcribing to a text document because having the text divided into paragraphs makes it more readable. This may be outside the scope of this project. Just thought I would as…
-
Is the part of speech rule of Sudachi compatible with any sentence segmenter POC rules?
If not, is there any part-of-speech table like <https://www.unixuser.org/~euske/doc/postag/>?
It would be helpfu…
-
When dealing with a long statement of facts quoted from legal text, the text is not split up within left double quotation marks and right double quotation marks. This is different from the " character. I cannot …
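
To make the quoting behavior concrete, here is a minimal sketch of a splitter that treats everything between a left (“) and right (”) double quotation mark as unsplittable. It is not the code of any library mentioned here; the function name `split_outside_curly_quotes` and the choice of `.`, `!`, `?` as terminators are assumptions made for illustration only.

```python
from typing import List

# Hypothetical helper for illustration; not part of any existing segmenter.
TERMINATORS = ".!?"
LEFT_QUOTE = "\u201c"   # left double quotation mark
RIGHT_QUOTE = "\u201d"  # right double quotation mark


def split_outside_curly_quotes(text: str) -> List[str]:
    """Split on sentence-ending punctuation, but never inside curly quotes."""
    sentences = []
    start = 0
    depth = 0  # how many curly double quotes are currently open
    for i, ch in enumerate(text):
        if ch == LEFT_QUOTE:
            depth += 1
        elif ch == RIGHT_QUOTE:
            depth = max(depth - 1, 0)  # tolerate an unmatched closing quote
        elif ch in TERMINATORS and depth == 0:
            # Only treat it as a boundary at end of text or before whitespace,
            # so that numbers such as 3.14 are left intact.
            if i + 1 == len(text) or text[i + 1].isspace():
                sentences.append(text[start:i + 1].strip())
                start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

A known limitation of this sketch: a sentence that ends inside a quotation (terminator immediately before the right quotation mark) is merged with whatever follows the quote, so a real segmenter would need an extra rule for that case.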
-
**Describe the bug**
**To Reproduce**
input_str = """This is part 3 of MAMI-san's hair timelineThe previous hair timelines can be found hereOka…
-
Probably one for @kermitt2 :)
I found some very long text elements in the sentence level json files (like > 3000 characters).
e.g.,
file:line "quote to search"
PMC4176174.json:1502 "3D re…
-
I read Japanese and looking up words is a bit touch and go. Sometimes it works great, sometimes it comes up with nonsense.
Japanese is tough because words are not separated by spaces. [MeCab](http…
-
I've run `segmenter.py train` successfully with just `conllu` files in the workspace but when I include the raw text from the 2018 shared task as `raw_train.txt` and `raw_dev.txt`, I get
```
Traceba…
-
Was having trouble with over 10% of my Whisper transcriptions getting through successfully. There were problems with Unicode encoding, or periods were included and considered end-of-sentence markers, when they…
-
[ ] I have checked the [documentation](https://docs.ragas.io/) and related resources and couldn't resolve my bug.
**Describe the bug**
【faithfulness.adapt(language="chinese") is not useful】
Ra…
-
Hi @bowbowbow, thanks a lot for putting this together. Was wondering if it will be easy to extend the content in main.py to support Arabic.
In my initial trials, I tried the following:
1) Create…