-
Hi Jiaming,
Thanks for your idea and code. When I ran the code on a Chinese corpus, I found some issues:
- First, dependency syntax and part-of-speech tags seem to be unnecessary in corpus pro…
-
![image](https://user-images.githubusercontent.com/50871412/119260850-4f876b80-bc07-11eb-8894-124302600643.png)
![image](https://user-images.githubusercontent.com/50871412/119260875-675eef80-bc07-11e…
-
All CLTK corpus text repos need a converter.py and the resulting cltk_json dir containing the JSON files that converter.py produces.
One example of a converter.py is here: https://github.…
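
As a rough illustration of what such a converter could look like, here is a minimal sketch. It is not the CLTK reference converter: the function name `convert_corpus`, the `*.txt` input layout, and the `title`/`text` JSON fields are all assumptions made for this example, so check the linked converter.py for the actual schema.

```python
import json
from pathlib import Path


def convert_corpus(text_dir, out_dir="cltk_json"):
    """Write one JSON file per plain-text file found in text_dir.

    Hypothetical schema: {"title": <file stem>, "text": <file contents>}.
    """
    text_dir, out_dir = Path(text_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(text_dir.glob("*.txt")):
        record = {"title": src.stem, "text": src.read_text(encoding="utf-8")}
        dest = out_dir / f"{src.stem}.json"
        # ensure_ascii=False keeps Chinese characters readable in the output
        dest.write_text(
            json.dumps(record, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        written.append(dest)
    return written
```

Calling `convert_corpus("texts")` would then fill a `cltk_json` directory next to the source files, with one JSON file per text.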
-
It's an open question whether this is worthwhile or possible on Old Chinese, but there are certainly existing options outside of pure single-character segmentation. spaCy offers both [jieba](https://g…
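
To make the contrast with single-character segmentation concrete, here is a minimal sketch of dictionary-based forward maximum matching. This is a deliberately simple stand-in, not how jieba or spaCy segment text, and the function name `forward_max_match` and the tiny vocabulary are hypothetical.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy segmentation: take the longest dictionary word at each position.

    Falls back to a single character when no dictionary word matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


vocab = {"今天", "天气"}  # hypothetical dictionary for illustration
print(forward_max_match("今天天气很好", vocab))
# ['今天', '天气', '很', '好']
```

With an empty vocabulary the same function degrades to pure single-character segmentation, which shows how much the result depends on having a suitable dictionary for the period of the language.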
-
### Bug Description
BM25Retriever does not work on Chinese text.
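
One possible cause (an assumption; nothing in this report confirms it) is a tokenizer that relies on whitespace. Chinese text has no spaces between words, so a whole sentence becomes a single term and never overlaps with a shorter query. The sketch below uses plain Python, not the llama_index API, and all function names are hypothetical.

```python
def whitespace_tokenize(text):
    # Splits on spaces only; a Chinese sentence stays in one piece.
    return text.split()


def char_tokenize(text):
    # Character-level fallback: one token per non-space character.
    return [ch for ch in text if not ch.isspace()]


def shared_terms(query, document, tokenize):
    # Term-matching scores such as BM25 are zero when this set is empty.
    return set(tokenize(query)) & set(tokenize(document))


document = "今天天气很好"
query = "天气"
print(shared_terms(query, document, whitespace_tokenize))  # set()
print(shared_terms(query, document, char_tokenize))
```

If this is what is happening, supplying a tokenizer suited to Chinese would be the direction to investigate.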
### Version
main
### Steps to Reproduce
```python
from llama_index.retrievers.bm25 import BM25Retriever
from llama_index.core import Do…
-
Hi, I am using MFA 1.0.1 on THCHS-30 (a Chinese dataset).
My lexicon looks like this:
![image](https://user-images.githubusercontent.com/24568452/124123962-7492b800-daaa-11eb-9188-2acdf0e880dc.png)
my *…
-
```
Hi, first I'd like to thank Taku Kudo for his awesome MeCab.
I'm training MeCab from scratch to make it analyse Chinese sentences, thanks to
this website http://www.onaneet.org/blog/archives/4020, but…
-
It seems that building a pre-training corpus with whole word masking is not supported for Chinese yet.
Even when passing --do_whole_word_mask=True to create_pretraining_data.py, nothing happens.
Does anyone know ho…
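
A likely explanation, sketched here under assumptions rather than taken from this thread, is that whole word masking groups WordPiece tokens by their `##` continuation prefix. Chinese text is usually split into single characters that carry no `##` prefix, so every character ends up as its own "word" and the flag changes nothing. The function `whole_word_groups` below is a simplified, hypothetical re-implementation of that grouping step, not the script's actual code.

```python
def whole_word_groups(tokens):
    """Group token indexes into whole words using the '##' prefix convention."""
    groups = []
    for i, tok in enumerate(tokens):
        if tok in ("[CLS]", "[SEP]"):
            continue  # special tokens are never masking candidates
        if groups and tok.startswith("##"):
            groups[-1].append(i)  # continuation piece joins the previous word
        else:
            groups.append([i])  # start of a new word
    return groups


english = ["[CLS]", "un", "##aff", "##able", "[SEP]"]
chinese = ["[CLS]", "中", "文", "分", "词", "[SEP]"]
print(whole_word_groups(english))  # [[1, 2, 3]]
print(whole_word_groups(chinese))  # [[1], [2], [3], [4]]
```

Under this assumption, making whole word masking effective for Chinese would require running a word segmenter first and marking the non-initial characters of each word so that they are grouped together.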
-
Hi, I searched for `corpus manager` and found this project. It looks very promising.
I mainly work with Chinese text; `coquery` doesn't work well with Chinese at the moment, and I'm afraid it would never pla…