-
Atilika Inc. (アティリカ株式会社) would like to donate the Kuromoji Japanese morphological analyzer to the Apache Software Foundation in the hope that it will be useful to Lucene and Solr users in Japan and el…
-
Hello,
First, thank you so much for this great work.
I need to use the Farasa segmenter and POS tagger in my process; however, it takes about 4 to 5 minutes to return results for 20 sentences for …
-
I am comparing the performance of the most popular lemmatization tools. I have found benchmark results for [Stanza](https://stanfordnlp.github.io/stanza/v100performance.html), [Trankit](https://tranki…
-
I have the following piece of text which I feed to pysbd.Segmenter:
```
'Trying to get back to Com. & Adm. through the most direct path in the dark.'
```
The correct way of handling this text is t…
-
Standardize `Intl.v8BreakIterator`.
Backpointers:
- https://github.com/nodejs/node/issues/3111
- https://bugs.chromium.org/p/v8/issues/detail?id=3785
Update 1 (Sept 26th, 2016):
- Proposal from @lit…
-
Dear grobid team,
I hope you are good and healthy. I'll jump straight to the problem.
**INFO**
version_used: docker image grobid/grobid:0.7.0
**PROBLEM**
For several pdfs the python g…
-
After discussing with ICU4X teams and experts from ICU, Markus suggested we should investigate a bit more on implementing the rule-based break iterator by using the approach in ICU4C. [Quote from his …
-
Hi,
I want to retrieve text by searching with an audio clip, using the [AudioCLIP](https://github.com/AndreyGuzhov/AudioCLIP) model.
First, I created an index of **text** (car-horn, coughing, alarm-clock, …
-
Language: Python
This script creates an application with a tkinter GUI.
In this application, the user will be able to segment paragraphs into multiple sentences.
This sentence segmen…
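
For illustration only, the segmentation step that such a GUI could call might look like the sketch below. The function name `split_sentences` and the regex rule are assumptions, not the script's actual code: it simply splits on whitespace that follows `.`, `!` or `?`, so abbreviations such as "Dr." are split incorrectly.

```python
import re

# Hypothetical rule: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(paragraph):
    """Split a paragraph into sentences with a naive punctuation rule."""
    # Drop empty pieces so that blank input yields an empty list.
    return [s for s in _SENTENCE_END.split(paragraph.strip()) if s]


if __name__ == "__main__":
    for sentence in split_sentences("Hello world. How are you? Fine!"):
        print(sentence)
```

A tkinter callback could pass the contents of a text widget to this function and display the returned list; a dedicated segmenter would be needed to handle abbreviations properly.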
-
We need the following Unicode properties:
- Grapheme_Cluster_Break
- Sentence_Break
- Word_Break
- Extended_Pictographic for [GB11](https://www.unicode.org/reports/tr29/#GB11)
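
As a rough sketch of how these properties could be loaded, the Python below parses lines in the Unicode Character Database file format (`code_or_range ; Value # comment`) and looks up a code point. The names `parse_ucd_property` and `lookup` are hypothetical, the embedded `SAMPLE` is a small illustrative excerpt in the style of the Grapheme_Cluster_Break data rather than the full file, and a linear scan is used only for brevity.

```python
# Illustrative excerpt in UCD file format (not the complete data file).
SAMPLE = """\
# Grapheme_Cluster_Break sample
000D          ; CR # Cc       <control-000D>
000A          ; LF # Cc       <control-000A>
0600..0605    ; Prepend # Cf   [6] ARABIC NUMBER SIGN..ARABIC NUMBER MARK ABOVE
"""


def parse_ucd_property(lines):
    """Return a sorted list of (start, end, value) tuples from UCD-style lines."""
    ranges = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # strip trailing comment
        if not line:
            continue  # blank or comment-only line
        fields = [f.strip() for f in line.split(";")]
        code_field, value = fields[0], fields[1]
        if ".." in code_field:
            start, end = code_field.split("..")
        else:
            start = end = code_field
        ranges.append((int(start, 16), int(end, 16), value))
    ranges.sort()
    return ranges


def lookup(ranges, code_point, default="Other"):
    """Return the property value for code_point, or the default if unlisted."""
    for start, end, value in ranges:
        if start <= code_point <= end:
            return value
    return default


if __name__ == "__main__":
    table = parse_ucd_property(SAMPLE.splitlines())
    print(lookup(table, 0x0603))
```

The default of `Other` here assumes the Grapheme_Cluster_Break convention for unlisted code points; a binary property such as Extended_Pictographic would want a different default, and a real implementation would likely use binary search or a trie instead of scanning the list.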