chinese-corpus Search Results

1000+ results for chinese-corpus

mickeysjm/HiExpan #7

Chinese entity HiExpan issues

Hi jiaming, Thanks for your idea and codes. When I run those codes in Chinese corpus, I found some issues: - First, Dependent syntax and part of speech seem to be unnecessary in corpus pro…

weather319 updated 4 years ago
1 comment

huggingface/datasets #2396

strange datasets from OSCAR corpus

![image](https://user-images.githubusercontent.com/50871412/119260850-4f876b80-bc07-11eb-8894-124302600643.png) ![image](https://user-images.githubusercontent.com/50871412/119260875-675eef80-bc07-11e…

cosmeowpawlitan updated 3 years ago
2 comments

cltk/cltk_api #43

Add converter.py and converted cltk_json to all CTLK corpora…

All CLTK corpora text repos need a converter.py and the subsequent converted cltk_json dir with json files that are produced by converter.py. One example of a converter.py is here: https://github.…

lukehollis updated 7 years ago
1 comment

direct-phonology/dphon #128

explore different segmentation methods

it's an open question whether this is worthwhile or possible on old chinese, but there are certainly existing options outside of pure single-character segmentation. spaCy offers both [jieba](https://g…

thatbudakguy updated 3 years ago
1 comment

run-llama/llama_index #13866

[Bug]: BM25Retriever cannot work on chinese

### Bug Description BM25Retriever cannot work on chinese. ### Version main ### Steps to Reproduce ```python from llama_index.retrievers.bm25 import BM25Retriever from llama_index.core import Do…

lifu963 updated 1 month ago
13 comments

MontrealCorpusTools/Montreal-Forced-Aligner #305

ValueError: math domain error when use Dictionaries with pro…

Hi I use MFA 1.0.1 in THCHS-30 (a Chinese datasets) my lexicon are like follows: ![image](https://user-images.githubusercontent.com/24568452/124123962-7492b800-daaa-11eb-9188-2acdf0e880dc.png) my *…

yt605155624 updated 1 year ago
4 comments

taku910/mecab #6

Problems when training.

Hi, first I like to thank taku ku for his awesome mecab. I'm training MeCab from scratch to make it analyse chinese sentences thanks to this website http://www.onaneet.org/blog/archives/4020, but…

GoogleCodeExporter updated 9 years ago
1 comment

google-research/bert #780

whole word mask is not support in chinese

it seems pre-train corpus using whole word mask is not support in chinese yet. even passing --do_whole_word_mask=True using create_pretraining_data.py, nothing happens. is there someone know ho…

brightmart updated 5 years ago
3 comments

leoncamel/mecab #6

Problems when training.

Hi, first I like to thank taku ku for his awesome mecab. I'm training MeCab from scratch to make it analyse chinese sentences thanks to this website http://www.onaneet.org/blog/archives/4020, but…

GoogleCodeExporter updated 9 years ago
1 comment

gkunter/coquery #287

Reserve an entry for custom tokenizer( include POS tagging)

Hi, I search `corpus manager` and find this project. It looks very promising . I am mainly working with Chinese text, `coquery` doesn't work well with Chinese now, and I afraid it would never pla…

eromoe updated 7 years ago
1 comment
