-
As I currently understand it, the `Truncate` component is not intended for text that has no spaces between words.
This is how the component behaves with text that is missing spac…
-
**Describe the bug**
After we fixed sentence segmentation, there are still bugs in Negex that need to be fixed.
**Additional context**
Examples:
> 1.No evidence of pneumonia.
- Comments - 1.No …
-
- Useful when discussing issues.
-
Hi! I'm using the NLLB models for the first time, and I'm having trouble translating complete documents. I'm following the same structure as the Hugging Face tutorial (https://huggi…
-
```
What steps will reproduce the problem?
1. upload a pre-segmented .tmx in Rainbow
2. start a Rainbow pipeline
3. add a Segmentation step and select the source and target .srx rules
4. choose ´Kee…
```
-
Hi, my colleagues and I have released [UD-Kanbun](https://github.com/KoichiYasuoka/UD-Kanbun), a Python-based tokenizer, POS tagger, and dependency parser for classical Chinese texts. And now we are in…
-
In our current tests, an error is raised when the text length exceeds 510, for example: “床前明月光,疑是地上霜。举头望明月,低头思故乡。床前明月光,疑是地上霜。举头望明月,低头思故乡。床前明月光,疑是地上霜。举头望明月,低头思故乡。床前明月光,疑是地上霜。举头望明月,低头思故乡。床前明月光,疑是地上霜。举头望明月,低头思故乡。床前明月光,疑是地上霜。举头望明月,低头思故乡。床前明月光,疑是地上霜。举头望明月,低头思故乡。床前明月光,疑…
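Until the model accepts longer input, one common workaround is to split the text into pieces no longer than the limit and process each piece separately. The sketch below assumes the 510 limit counts characters (the report does not say whether it means characters or tokens) and that breaking after sentence-final punctuation (。, ！, ？) is acceptable; `split_long_text` and `MAX_LEN` are hypothetical names, not part of any library.

```python
import re
from typing import List

# Assumption: the limit is 510 characters, taken from the threshold reported above.
MAX_LEN = 510


def split_long_text(text: str, max_len: int = MAX_LEN) -> List[str]:
    """Split text into chunks of at most max_len characters,
    breaking after sentence-final punctuation where possible."""
    # Zero-width split after each sentence terminator keeps the terminator
    # attached to its sentence; drop the empty string a trailing terminator leaves.
    sentences = [s for s in re.split(r"(?<=[。！？!?])", text) if s]
    chunks: List[str] = []
    current = ""
    for sent in sentences:
        # A single sentence longer than max_len has to be hard-split.
        while len(sent) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sent[:max_len])
            sent = sent[max_len:]
        # Start a new chunk if adding this sentence would exceed the limit.
        if len(current) + len(sent) > max_len:
            chunks.append(current)
            current = sent
        else:
            current += sent
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent to the model on its own and the per-chunk results concatenated. Packing whole sentences into each chunk avoids cutting a sentence in half, at the cost of chunks that are sometimes a little shorter than the limit.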
-
## Description of the bug
I'm trying to train a model that can build a knowledge base from the OPC UA Companion Specifications as part of my thesis.
I have the dataset as PDFs and used a third-par…
-
### [140\. Word Break II](https://leetcode.com/problems/word-break-ii/)
Difficulty: **Hard**
Given a **non-empty** string _s_ and a dictionary _wordDict_ containing a list of **non-empty** wor…
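A standard way to solve this problem (return every way to insert spaces into _s_ so that each piece is a word in _wordDict_) is depth-first search over suffixes with memoization: the segmentations of `s[start:]` depend only on `start`, so each suffix is solved once and reused. The sketch below is one such solution; the free function `word_break` is my own framing rather than the class-based stub LeetCode provides.

```python
from functools import lru_cache
from typing import List, Tuple


def word_break(s: str, word_dict: List[str]) -> List[str]:
    """Return every way to split s into space-separated words from word_dict."""
    words = set(word_dict)
    longest = max((len(w) for w in words), default=0)

    @lru_cache(maxsize=None)
    def sentences_from(start: int) -> Tuple[str, ...]:
        # All segmentations of the suffix s[start:], memoized per start index.
        if start == len(s):
            return ("",)  # the empty suffix has exactly one (empty) segmentation
        results = []
        # Only try prefixes no longer than the longest dictionary word.
        for end in range(start + 1, min(len(s), start + longest) + 1):
            word = s[start:end]
            if word in words:
                for rest in sentences_from(end):
                    results.append(word if not rest else word + " " + rest)
        return tuple(results)

    return list(sentences_from(0))
```

For example, `word_break("catsanddog", ["cat", "cats", "and", "sand", "dog"])` returns `["cat sand dog", "cats and dog"]`. Memoization removes repeated work on shared suffixes, but the number of valid sentences can itself grow exponentially, so no algorithm avoids exponential worst-case output size.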
-
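> We have found that our earlier work actually used "classical Chinese" that mixed traditional and simplified characters, which was not rigorous, so we are now training a model on purely traditional-character classical Chinese. This work is still in progress; we will let you know when there is news.

How is that progressing? Could you provide the "古文" (classical Chinese) checkpoint for testing? Thank you.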