-
### Self Checks
- [X] I have searched [existing issues](https://github.com/langgenius/dify/issues), including closed ones.
- [X] I confirm that I am using English to…
-
Great job! I have a small question: I want to avoid catastrophic forgetting and preserve the ability to handle bilingual text, for example by training on both Chinese and English simultaneously. Can the language be set to …
-
Execute the following code (`tabooSegmentCustomDicList` contains more than 2,000 words):

```go
for _, tabooSegmentCustomDic := range tabooSegmentCustomDicList {
	lowerCaseWord := strings.ToLower(tabooSeg…
```
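Lowercasing each of 2,000+ dictionary entries on every pass gets expensive; a common alternative is to lowercase the dictionary once into a set and test membership in constant time. A minimal Python sketch of that idea (the entries shown are invented for illustration):

```python
# Build a lowercase lookup set once, instead of lowercasing
# every dictionary entry on each comparison pass.
taboo_dic_list = ["BadWord", "禁用词", "SPAM"]  # hypothetical entries
taboo_set = {w.lower() for w in taboo_dic_list}

def is_taboo(word: str) -> bool:
    # Case-insensitive membership test against the precomputed set
    return word.lower() in taboo_set

print(is_taboo("BADWORD"))
print(is_taboo("hello"))
```

The same precompute-once pattern applies directly in Go with a `map[string]struct{}` keyed by the lowercased words.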
-
The sentence is just split by character.
```python
# Import spaCy and create a blank Chinese nlp object
import spacy
nlp = spacy.blank("zh")
# Process the text ("I like tigers and lions.")
doc = nlp("我喜欢老虎和狮子。")
# Iterate over the doc and print each token
for i, token in enumerate(doc):
    print(i, token.text)
```
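The character-by-character output matches spaCy's default behavior for `spacy.blank("zh")`: unless a word segmenter such as jieba or pkuseg is configured, the Chinese tokenizer falls back to per-character segmentation. The effect can be reproduced in plain Python, no spaCy required:

```python
# spaCy's default "char" segmenter for Chinese effectively
# yields one token per character, equivalent to:
text = "我喜欢老虎和狮子。"
char_tokens = list(text)
print(char_tokens)
# Nine single-character tokens; multi-character words such as
# 老虎 ("tiger") and 狮子 ("lion") are split apart.
```

To get word-level tokens, spaCy's Chinese tokenizer must be configured with a real segmenter (see the `segmenter` option of `spacy.lang.zh.Chinese`).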
-
Unlike English, which relies on spaces to separate words, Chinese and Japanese use distinct punctuation marks such as the full stop (。), exclamation mark (!), and question mark (?) to denote the end of se…
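A sentence splitter built on that observation can be sketched with a regular expression that splits after CJK terminal punctuation (a simplification; it ignores closing quotes, ellipses, and similar edge cases):

```python
import re

def split_sentences(text: str):
    # Split immediately after sentence-ending punctuation,
    # keeping the punctuation attached to its sentence.
    parts = re.split(r"(?<=[。！？!?])", text)
    return [p for p in parts if p]

print(split_sentences("你好。你是谁？我很好!"))
```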
-
/chat: Will the LLM do word segmentation for Chinese, or does it simply read each Chinese character and run the process?
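As a rough answer to what the model actually sees: most LLMs do not run a Chinese word segmenter at all. The text is encoded by a subword tokenizer (typically BPE, often over UTF-8 bytes), so a "token" may be a byte fragment, a single character, or a multi-character chunk, depending on the model's vocabulary. A small illustration of the codepoint-vs-byte distinction (actual token counts vary by model):

```python
text = "我喜欢老虎"
# Five Unicode codepoints...
print(len(text))
# ...but fifteen UTF-8 bytes, since each of these CJK characters
# encodes to three bytes. A byte-level BPE tokenizer operates on
# this byte sequence, not on segmented words.
print(len(text.encode("utf-8")))
```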
-
**About Chinese word segmentation.**
All document splitters extend the HierarchicalDocumentSplitter class. When I set the overlap parameter, overlapFrom() is called, but there will force method invoc…
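For reference, the overlap behavior under discussion usually amounts to a sliding window: each chunk repeats the tail of the previous one. A language-agnostic sketch of that general idea (not the HierarchicalDocumentSplitter implementation):

```python
def chunk_with_overlap(text: str, size: int, overlap: int):
    # Each window advances by (size - overlap) characters,
    # so consecutive chunks share `overlap` characters.
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

print(chunk_with_overlap("abcdefgh", 4, 2))
```

For Chinese this window is typically measured in characters or tokens rather than whitespace-delimited words.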
-
ICU is not a good choice in China. In addition, customizing the dictionary is very important for Chinese word segmentation, because the vocabulary used in different industries is completely d…
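One way to see why the custom dictionary matters: even a simple forward-maximum-matching segmenter produces entirely different tokens depending on the dictionary it is given. A minimal sketch (the dictionary entries are invented for illustration):

```python
def fmm_segment(text: str, dictionary: set, max_len: int = 4):
    # Forward maximum matching: at each position, take the longest
    # dictionary word starting there; fall back to one character.
    tokens, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens

# A hypothetical domain-specific dictionary changes the result.
generic_dic = {"机器", "学习", "语言"}
domain_dic = {"机器学习", "自然语言", "处理"}
text = "机器学习和自然语言处理"
print(fmm_segment(text, generic_dic))
print(fmm_segment(text, domain_dic))
```

Production segmenters such as jieba support exactly this kind of user dictionary (e.g. `jieba.load_userdict`), which is why industry-specific term lists matter so much.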
-
![image](https://github.com/user-attachments/assets/9a919cd5-14e0-410a-aa7e-5916ee40ec27)
-
**Describe the bug**
The analyze module does not segment Chinese texts correctly. Because Chinese does not use whitespace to separate words, CATMA treats only punctuation symbols as w…