-
Nikolay:
Length filtering. Since Chinese sentences normally come as one continuous string of characters, traditional length filtering doesn't work. Furthermore, since one word can be made of 1-4 Chinese ch…
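A minimal sketch of one possible approach (an assumption, not from the original message): segment first with jieba, which appears elsewhere in these threads, then filter on the resulting word count; the thresholds below are placeholders.
```
# Sketch: length-filter Chinese sentences by segmented word count
# rather than raw character count. Thresholds are placeholders.
import jieba

MIN_WORDS, MAX_WORDS = 3, 50

def keep_sentence(sentence: str) -> bool:
    words = jieba.lcut(sentence)  # split the continuous string into words
    return MIN_WORDS <= len(words) <= MAX_WORDS

print(keep_sentence("我爱自然语言处理"))  # True: 4 segmented words
```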
-
Currently charabia has wrong segmentation for Chinese and Japanese (#591); 1.1.1-alpha.1 does not solve the problem.
My native language is Chinese, and I am developing a web application. Therefore, I tried u…
-
Nikolay:
Chinese characters should be added. In general we can use Unicode ranges to do so, but they are somewhat complicated: https://stackoverflow.com/questions/43418812/check-whether-a-string-cont…
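For illustration, a minimal sketch of such a check using only the basic CJK Unified Ideographs block (the full set of ranges is wider, as the linked answer discusses):
```
# Sketch: detect Chinese characters via the main CJK Unified Ideographs
# block (U+4E00..U+9FFF); extension blocks are omitted for brevity.
def contains_chinese(text: str) -> bool:
    return any('\u4e00' <= ch <= '\u9fff' for ch in text)

print(contains_chinese("hello 世界"))  # True
print(contains_chinese("hello"))       # False
```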
-
Name: jieba
Version: 0.42.1
Summary: Chinese Words Segmentation Utilities
Home-page: https://github.com/fxsjy/jieba
Author: Sun, Junyi
Author-email: ccnusjy@gmail.com
License: MIT
```
# enco…
-
**Describe the bug**
When opening a YouTube video that has two subtitle tracks, English and Chinese (Simplified), the app shows the English subtitles.
**To Reproduce**
Steps to reproduce the behavior:
1. Open App
2. Choos…
-
Hi, I would like to use this package to help with Chinese learning. I would be willing to help with development, but might need some pointers. I would probably use `jieba` for tokenization. Please let…
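For reference, a minimal sketch of what jieba tokenization looks like (how it would be wired into this package is left open):
```
import jieba

# Default (accurate) mode: segment a sentence into a list of words.
tokens = jieba.lcut("我来到北京清华大学")
print(tokens)  # e.g. ['我', '来到', '北京', '清华大学']
```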
-
Currently the tokenizer is hard-coded to the default; it would be better to include a configurable tokenizer for Chinese (tantivy-jieba and cang-jie), Japanese (lindera and tantivy-tokenizer-tiny-segm…
-
The pre-training [README](https://github.com/fastnlp/CPT/blob/master/pretrain/README.md) mentions that the `dataset`, `vocab` and `roberta_zh` have to be prepared before training.
Is ther…
-
I'm using jieba to tokenize my Chinese documents, as suggested in the issues here and in the documentation. The documentation also says that if I use a vectorizer, I cannot use a candid…
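For illustration, a minimal sketch of plugging jieba into a vectorizer; using scikit-learn's CountVectorizer here is an assumption about which vectorizer is meant:
```
import jieba
from sklearn.feature_extraction.text import CountVectorizer

# Use jieba as the tokenizer so the vectorizer splits on Chinese words
# instead of whitespace; token_pattern=None silences the unused-pattern warning.
vectorizer = CountVectorizer(tokenizer=jieba.lcut, token_pattern=None)
docs = ["我喜欢机器学习", "机器学习很有趣"]
X = vectorizer.fit_transform(docs)
print(vectorizer.get_feature_names_out())
```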
-
ICU is not a good choice for Chinese. In addition, it is very important for Chinese word segmentation to support a customized dictionary, because the vocabulary used in different industries is completely d…
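As one concrete illustration of dictionary customization (a sketch assuming jieba rather than ICU; the terms added are hypothetical examples):
```
import jieba

# Register domain-specific terms so they are kept as single tokens;
# a larger dictionary can be loaded with jieba.load_userdict("user_dict.txt").
jieba.add_word("云原生")    # "cloud native", a hypothetical industry term
jieba.add_word("量化宽松")  # "quantitative easing"

print(jieba.lcut("云原生架构和量化宽松政策"))
```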