-
https://pypi.org/project/thai-segmenter/
I found this project, which segments Thai sentences into words.
I just tried it and it works fine.
Would it be possible for you to add it? I would d…
-
Could the text be split character by character rather than word by word?
For example:
猴子爱吃香蕉。("A monkey loves eating bananas.") should be split as 猴,子,爱,吃,香,蕉, not as 猴子,爱吃,香蕉.
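The per-character splitting the commenter asks for can be sketched in plain Python (a minimal illustration, independent of whatever segmenter the project actually uses):

```python
def split_characters(sentence: str) -> list[str]:
    """Split a CJK sentence into single characters, dropping punctuation.

    In Python 3 a str is a sequence of code points, so iterating over it
    already yields one CJK character at a time; we only filter out
    punctuation such as the full stop '。'.
    """
    return [ch for ch in sentence if ch.isalnum()]

print(split_characters("猴子爱吃香蕉。"))
# ['猴', '子', '爱', '吃', '香', '蕉']
```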
-
Hi, and congrats on your great work with Grobid.
I managed to set up Grobid with the docker image you provide on this [guide](https://grobid.readthedocs.io/en/latest/Grobid-docker/#crf-and-deep-learn…
-
There are different ways byte pair encoding can be applied.
1. **Apply on stream**: Apply BPE directly to the text stream. In this case, space ' ' and newline '\n' are treated as regular characters. And …
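The stream-level variant described in item 1 can be sketched as follows (a minimal toy implementation, not any particular library's BPE). Because ' ' and '\n' are ordinary symbols, learned merges may cross word boundaries:

```python
from collections import Counter

def bpe_train(stream: str, num_merges: int):
    """Learn BPE merges over a raw text stream.

    The stream is treated as a flat sequence of symbols, so ' ' and
    '\n' are ordinary characters; nothing prevents a merge from
    spanning a word boundary.
    """
    symbols = list(stream)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair in the current sequence.
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:
            break  # no pair occurs more than once; stop early
        merges.append(best)
        # Replace every occurrence of the best pair with a merged symbol.
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return merges, symbols

merges, tokens = bpe_train("low low low\nlower lower", 3)
print(merges)  # the last learned merge, (' ', 'low'), crosses a word boundary
```

On this toy stream the first two merges build the symbol `low`, and the third merge fuses it with the preceding space, which is exactly what distinguishes stream-level BPE from word-internal BPE.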
-
Hi,
Congrats on the great work. I've been using your trained model to classify arguments in over 800 news articles in both Spanish and Portuguese, and I further modified the original MUL_main_Infer.py code to m…
-
### Code
```rust
pub fn break_sentence(s: &str) -> Vec {
use icu_segmenter::SentenceBreakSegmenter;
let provider = icu_testdata::get_provider();
let segmenter = SentenceBreakSeg…
-
I'm trying to implement a custom tokenizer that also returns the start position of each token. Let me use an exact copy of your SegtokTokenizer to explain my issue:
```
from flair.data import Sentence…
-
**Problem description:**
1. When searching for a word in a non-English (especially CJK) sentence, only words at the beginning of the sentence can be retrieved.
For example:
```
这里利用的是维基百科的结构化知识,加上后两…
-
Hi, apologies if this is documented. I've looked through current and past issues as well, and the only reference I could find is #90, but there doesn't seem to be an explanation. For reference, this is the or…
-
Checking the Python files in NLTK with `python -m doctest` reveals that many tests fail. In many cases, the failures are just cosmetic discrepancies between the expected and the actual output, …
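One common cosmetic discrepancy is whitespace. A minimal, hypothetical doctest (not taken from NLTK itself) shows the kind of failure involved and how the standard NORMALIZE_WHITESPACE option tolerates it:

```python
import doctest

def columns():
    """Return a two-column line padded with extra spaces.

    The expected output below is written with single spaces, so a bare
    doctest would flag a purely cosmetic whitespace mismatch; with
    NORMALIZE_WHITESPACE, any run of whitespace compares equal and the
    test passes.

    >>> columns()  # doctest: +NORMALIZE_WHITESPACE
    'name value'
    """
    return "name    value"

# Parse and run just this docstring's examples.
parser = doctest.DocTestParser()
test = parser.get_doctest(columns.__doc__, {"columns": columns},
                          "columns", "<example>", 0)
results = doctest.DocTestRunner().run(test)
print(results.failed)  # 0 -- the whitespace difference is tolerated
```

Deleting the `+NORMALIZE_WHITESPACE` directive makes the same test fail, which is the pattern behind many of the purely cosmetic doctest failures described above.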