-
The UDPipe sentence splitter seems to be a bit too split-happy, creating many fragments. Is this dragging down performance of our BERT models? Furthermore, we put a lot of effort into splitting large …
-
![image](https://github.com/user-attachments/assets/ea8abf4e-02e4-4d4b-aba9-56f95895384a)
![image](https://github.com/user-attachments/assets/bcb36c3e-caec-49dc-9016-c0beef4a85ac)
- prevent this f…
-
Currently, when calculating overlap, we first split the previous segment into **sentences**, then try to fit as many sentences as possible into the overlap.
One should be able to configure this: ei…
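
For reference, the current behaviour described above could be sketched roughly as follows. This is only an illustration, not the project's actual code: `split_sentences`, `sentence_overlap` and the `max_chars` budget are hypothetical names, and the regex splitter is a naive stand-in for whatever sentence splitter is really used.

```python
import re


def split_sentences(text):
    # Naive stand-in splitter (assumption): break after '.', '!' or '?'
    # when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_overlap(previous_segment, max_chars):
    """Return the longest run of trailing sentences that fits in max_chars."""
    picked = []
    used = 0
    # Walk backwards from the end of the previous segment.
    for sentence in reversed(split_sentences(previous_segment)):
        # One extra character for the joining space between sentences.
        cost = len(sentence) + (1 if picked else 0)
        if used + cost > max_chars:
            break
        picked.append(sentence)
        used += cost
    # Restore the original sentence order.
    return " ".join(reversed(picked))


previous = "First one. Second one here. Third."
print(sentence_overlap(previous, max_chars=25))
```

Note that if even the last sentence is longer than the budget, this sketch returns an empty overlap, which is one reason a different splitting unit could be worth supporting.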
-
**Is your feature request related to a problem? Please describe.**
When using the built-in tokenizer of INCEpTION, it sometimes makes errors. It would be nice if the tokenization could be edited.
**De…
-
The sentence is incorrectly split after 'vol.':
> Other good editions are in **vol.** 4.
![image](https://user-images.githubusercontent.com/5816160/101890914-46469d00-3b99-11eb-9cfc-10ed918b6a1e…
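
To make the failure mode concrete, here is a minimal sketch of a splitter that refuses to break after known abbreviations. It is not how INCEpTION segments text; the function name and the abbreviation list are assumptions made up for illustration.

```python
# Illustrative abbreviation list (assumption), not taken from any tool.
ABBREVIATIONS = {"vol.", "no.", "pp.", "ed."}


def split_with_abbreviations(text, abbreviations=ABBREVIATIONS):
    """Split on sentence-final punctuation, except after listed abbreviations."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        ends_sentence = token[-1] in ".!?" and token.lower() not in abbreviations
        if ends_sentence:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep trailing text that has no closing punctuation.
        sentences.append(" ".join(current))
    return sentences


print(split_with_abbreviations("Other good editions are in vol. 4. See also the index."))
```

The trade-off is that a sentence that really does end in a listed abbreviation will not be split, so a fixed list like this can only be a partial fix.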
-
Currently it auto-detects whether lines should be the inputs (the text has no punctuation) or sentences should be the inputs (the text has punctuation).
This should be configurable so as to be able to better preserve…
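
One possible shape for such an option, as a rough sketch: a `mode` parameter that keeps the current auto-detection as the default but allows it to be overridden. The function name, the `mode` values and the regex-based sentence splitting are all assumptions for illustration, not the project's existing API.

```python
import re


def split_inputs(text, mode="auto"):
    """Split text into inputs by lines or by sentences.

    mode: "auto" (guess from punctuation), "lines" or "sentences".
    """
    if mode not in {"auto", "lines", "sentences"}:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "auto":
        # Same heuristic as described above: punctuation means sentences.
        mode = "sentences" if re.search(r"[.!?]", text) else "lines"
    if mode == "lines":
        return [line.strip() for line in text.splitlines() if line.strip()]
    # Naive sentence split (assumption): break after '.', '!' or '?'
    # followed by whitespace, which includes newlines.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


verse = "Roses are red.\nViolets are blue. Really."
print(split_inputs(verse))                # auto-detects sentences
print(split_inputs(verse, mode="lines"))  # forces line-based inputs
```

Forcing `mode="lines"` keeps the original line breaks even when the text contains punctuation, which the auto-detection cannot do.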
-
Many thanks for kindly sharing your code!
Could you provide the code used to preprocess the data?
Or could you share the configuration you used for CoreNLP?
Thanks again!
-
# Bug Report
## Installation Method
Installed via pip on a virtual environment.
## Environment
- **Open WebUI Version:** tested with 0.3.30 & 0.3.32
- **TTS backend:** happens with both A…
-
Hi all,
I am trying to use BlingFire for sentence splitting in Greek. As it makes many errors, I want to improve the "rules" it uses. How can I do this?
I want to try to port some rules from the…
-
After aligning, we often split sentences up into `aligned_subsegments`.
In the end we could run the pipeline again on these smaller subsegments now, get a new trellis shape and assign a new ratio base…