-
When parsing a bit of text with accents, I get the wrong tokens. For instance, with the code:
```
tokens %
plot_tree(token, lemma, POS)
```
![image](https://user-images.githubuserco…
-
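I have a question. From your documentation, I know that new words can be added through a custom dictionary (via a TXT file or directly in code), and in my tests the custom words were all recognized well. But I would like to ask a more fundamental application question: if documents to be processed keep streaming in, how can new words be discovered?
For example, your documentation mentions that the CRF is relatively strong at discovering new words. I ran an experiment with input containing internet slang such as 「理都懂,然并卵,城会玩,日了狗」, and I found that none of these words, which are unlikely to appear in the People's Daily corpus, are…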
-
I would like to recommend the pre-trained Chinese word embeddings recently released by Tencent AI Lab. I don't quite know the corpus they trained the embeddings on,…
-
As we all know, Chinese NLP research has been slowed down by the unavailability of large open-source corpora, and this issue has become more and more severe due to the recent advances of large pre-trained …
-
The MSRA data and the People's Daily data seem to be the same?
-
**You must follow the issue template and provide as much information as possible. Otherwise, this issue will be ignored and closed.**
## Check List
Thanks…
-
I found a useful tool for extracting the relations underlying a text on the [KBPAnnotator](https://stanfordnlp.github.io/CoreNLP/kbp.html#example-usage) page.
But there is no example of the Java API, so I cannot dir…
-
**Hi everyone! I want to use Snorkel to process some Chinese text, so I have to use stanfordcorenlp, but this error occurred. What should I do?**
NewConnectionError Traceback (m…
-
I implemented a linear-chain CRF layer for sequence tagging tasks, inspired by the paper:
Lample et al. Neural Architectures for Named Entity Recognition (Neural Architectures for Named Entity Recogni…
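
For readers unfamiliar with how a linear-chain CRF picks its output, decoding is commonly done with the Viterbi algorithm. The following is only a minimal sketch of that general technique, not the implementation described in this issue: the function name `viterbi_decode` and the plain-list `emissions` and `transitions` inputs are assumptions made for illustration.

```python
from typing import List, Sequence


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]   : score of tag j at position t (hypothetical input)
    transitions[i][j] : score of moving from tag i to tag j (hypothetical input)
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])

    # best[j] is the best score of any path ending in tag j at the current step.
    best = list(emissions[0])
    backpointers: List[List[int]] = []

    for t in range(1, len(emissions)):
        new_best = []
        pointers = []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the path score into tag j.
            prev = max(range(num_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)

    # Trace back from the best final tag to recover the full path.
    last = max(range(num_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With two tags and zero transition scores, the decoder simply follows the best emission at each step; adding a penalty for switching tags can make it prefer a constant sequence instead.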
-
After starting the server, I added the custom dictionary to the properties, such as
` "segment.serDictionary": "edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz,path/custom_segment_dict.txt",
…