dkpro / dkpro-cassis

UIMA CAS processing library written in Python
https://pypi.org/project/dkpro-cassis/
Apache License 2.0

Cannot add annotations to characters that are not right next to punctuation marks in Chinese text #287

Closed (fishfree closed this issue 1 year ago)

fishfree commented 1 year ago

For Chinese / Japanese / Korean, there is no space between words. I selected the 4 characters that form a whole word, as shown in the attached screenshot, but the whole sentence was automatically selected instead.

I'm totally confused. Where can I configure or modify this to correct it? The text is: 从现在起到本世纪中叶,全面建成社会主义现代化强国、全面推进中华民族伟大复兴,是全党全国人民的中心任务,强国建设、民族复兴的接力棒,历史地落在我们这一代人身上,3月13日,北京人民大会堂,十四届全国人大一次会议上,习近平总书记的重要讲话引发雷鸣般的掌声。

reckart commented 1 year ago

This is actually an INCEpTION question, not a Cassis question.

In INCEpTION, you go to the Layer panel in the Project settings. Then you select the layer that you want to use and change its Granularity from Token-level to Character-level.
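To illustrate why the granularity setting matters for text without spaces, here is a minimal Python sketch. It is not INCEpTION's or Cassis's actual code: the tokenizer (`TOKEN_RE`) and the helpers `token_spans`, `snap_to_tokens`, and `character_level` are hypothetical and simply assume that tokens are delimited by whitespace and punctuation. Under that assumption, a token-level selection widens to the whole run of characters between two punctuation marks, while a character-level selection keeps the offsets as chosen.

```python
import re

# Assumption: a "token" is any run of characters that contains no whitespace
# and none of the listed punctuation marks. This is a stand-in tokenizer,
# not the one INCEpTION uses.
TOKEN_RE = re.compile(r"[^\s,。、,.]+")


def token_spans(text):
    """Return (begin, end) character offsets of every token in the text."""
    return [(m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def snap_to_tokens(text, begin, end):
    """Token-level behaviour: widen the selection to all tokens it overlaps."""
    touched = [(b, e) for b, e in token_spans(text) if b < end and e > begin]
    if not touched:
        return begin, end
    return touched[0][0], touched[-1][1]


def character_level(text, begin, end):
    """Character-level behaviour: keep the selected offsets unchanged."""
    return begin, end


# An excerpt of the text from the question.
text = "强国建设、民族复兴的接力棒,历史地落在我们这一代人身上"

# Select the 4-character word "民族复兴".
begin = text.index("民族复兴")
end = begin + len("民族复兴")

token_begin, token_end = snap_to_tokens(text, begin, end)
char_begin, char_end = character_level(text, begin, end)

print(text[token_begin:token_end])  # whole run between punctuation marks
print(text[char_begin:char_end])    # only the selected characters
```

With a tokenizer like this, the only way to annotate a word in the middle of such a run is to work at character level, which is what the Granularity setting enables.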

fishfree commented 1 year ago

Yes. After posting this issue, it occurred to me that this is an INCEpTION question. I also found the answer you posted in another issue. Thank you very much! Now it's working as expected.