-
There are many data files located here:
https://github.com/unicode-org/icu4x/tree/main/provider/datagen/data/segmenter
Is this the best place for the source of truth, or can we source them from …
-
As part of the API review with @markusicu, he pointed out something I had noticed before and we discussed on Slack, which is that LineSegmenter does not return a breakpoint at index 0.
Here is the …
-
## ❓ Questions and Help
#### What is your question?
(This is from #5283. I thought it was better to separate it since it is a new kind of error.)
I tried to reproduce wav2vec-U 2.0 with python 3.…
-
Hi! I'm working on the Unicode-based engine [citeproc-lua](https://github.com/zepinglee/citeproc-lua) which requires conversion between sentence case and title case for titles. At the moment I'm using…
-
The type for grapheme cluster segmentation is called GraphemeClusterBreakSegmenter. I think it should be called GraphemeClusterSegmenter.
In general, the names of types should have the following pa…
-
## Description
In my hospital (CHU de Brest), ADICAP codes are written like this:
```
ADICAP :B.H.HP.A7A0
Cotations :
ZZQX217 R-AHC-100-A001 R-AHC-10-A015
```
In this case dots s…
-
Hi!
I am developing a data streaming solution for MT training data in **Python 3.10**. As I don't want it to create a bottleneck for MT training, it has to be reasonably fast. However, when benchmar…
-
Original [issue 174](https://code.google.com/p/cleartk/issues/detail?id=174) created by ClearTK on 2010-12-28T21:44:42.000Z:
I am building a UIMA wrapper for the java.text.BreakIterator class. This …
-
**Describe the bug**
Pysbd and Spacy are both installed in my env.
`# packages in environment at /home/vibhu/miniconda3/envs/huggfacegpu:
#
# Name Version Buil…
-
I'm running grobid-0.7.2 on Windows 11 using docker. I followed instructions from [your documentation](https://grobid.readthedocs.io/en/latest/Grobid-docker/#configure-using-the-normal-yaml-config-fil…