hiroshi-matsuda-rit closed this issue 4 years ago.
This problem was reproduced on macOS 10.14.6, Windows 10 (version 1909), and WSL with Python 3.8.
Thanks for the report. I found the cause: I screwed up in the Cythonization, and the connect costs were wrong. I just opened a PR with a fix.
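Some background on why wrong connect costs change the tokens themselves: Sudachi, like MeCab, chooses the segmentation with the lowest total cost over a word lattice, where the total is the sum of each word's cost plus a connect cost for every adjacent pair of words. The thread does not say exactly what went wrong in the Cythonized code, so the following is only a minimal sketch with a made-up three-entry lexicon and cost table. `LEXICON`, `CONNECT`, `best_path` and the two lookup functions are all hypothetical and are not SudachiPy's API. It assumes the MeCab-style convention of indexing the connect-cost table by the previous word's right ID and the next word's left ID, and shows that reading the table with swapped indices is enough to flip the chosen segmentation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical lexicon: surface -> (left_id, right_id, word_cost).
LEXICON: Dict[str, Tuple[int, int, int]] = {
    "東京": (1, 1, 100),
    "都": (2, 2, 100),
    "東京都": (3, 3, 250),
}

# Hypothetical connect-cost table CONNECT[prev_right_id][next_left_id].
# ID 0 stands for both BOS and EOS; only one pair carries a penalty.
CONNECT = [[0] * 4 for _ in range(4)]
CONNECT[1][2] = 200  # penalize 東京 followed by 都


def correct_lookup(prev_right: int, next_left: int) -> int:
    return CONNECT[prev_right][next_left]


def swapped_lookup(prev_right: int, next_left: int) -> int:
    # Illustrative bug only: row and column indices exchanged.
    return CONNECT[next_left][prev_right]


def best_path(text: str, lexicon: Dict[str, Tuple[int, int, int]],
              connect_cost: Callable[[int, int], int]) -> List[str]:
    """Viterbi over a word lattice: minimize word costs plus connect costs."""
    n = len(text)
    # best[i][right_id] = (cheapest cost, tokens) for paths covering text[:i]
    best: List[Dict[int, Tuple[int, List[str]]]] = [{} for _ in range(n + 1)]
    best[0][0] = (0, [])  # BOS
    for i in range(n):
        for prev_right, (cost, tokens) in best[i].items():
            for j in range(i + 1, n + 1):
                entry = lexicon.get(text[i:j])
                if entry is None:
                    continue
                left, right, word_cost = entry
                total = cost + connect_cost(prev_right, left) + word_cost
                if right not in best[j] or total < best[j][right][0]:
                    best[j][right] = (total, tokens + [text[i:j]])
    # Close every complete path with the connect cost into EOS (ID 0).
    # min() raises ValueError if the lexicon cannot cover the whole text.
    finals = [(cost + connect_cost(r, 0), tokens)
              for r, (cost, tokens) in best[n].items()]
    return min(finals)[1]


print(best_path("東京都", LEXICON, correct_lookup))  # ['東京都']
print(best_path("東京都", LEXICON, swapped_lookup))  # ['東京', '都']
```

With the correct lookup, the 200-point penalty between 東京 and 都 makes the two-word path cost 400 against 250 for 東京都; with the swapped lookup that penalty is never read, so the two-word path wins at 200.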
I strongly recommend adding the spaCy evaluation step to the CI tests. With the spaCy CLI and UD_Japanese-GSD v2.6-NE, you can run an evaluation like this:
```
# prepare the sudachipy module before running the steps below
$ pip install -U spacy sudachidict-core
$ python -m spacy download ja_core_news_md
$ python -m spacy evaluate ja_core_news_md ja_gsd-ud-test.ne.json
================================== Results ==================================
Time 1.29 s
Words 13053
Words/s 10131
TOK 98.11
POS 97.94
UAS 88.16
LAS 86.18
NER P 72.79
NER R 72.91
NER F 72.85
Textcat 0.00
```
The TOK score should not decline by more than 0.1%.
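A minimal sketch of such a CI gate, assuming the results are taken from the console output in the "LABEL value" format shown above. `parse_results` and `within_tolerance` are hypothetical helper names, and the sketch treats the 0.1% threshold as 0.1 points on the 0 to 100 TOK scale, which is an assumption about what was meant.

```python
import re
from typing import Dict

# Matches "LABEL value" lines such as "TOK 98.11", "NER F 72.85",
# "Words/s 10131" or "Time 1.29 s" (with an optional trailing unit "s").
_LINE = re.compile(r"^\s*([A-Za-z/ ]+?)\s+(\d+(?:\.\d+)?)(?:\s+s)?\s*$")


def parse_results(text: str) -> Dict[str, float]:
    """Collect metric -> score from the console output of `spacy evaluate`."""
    scores: Dict[str, float] = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            scores[m.group(1)] = float(m.group(2))
    return scores


def within_tolerance(baseline: Dict[str, float], current: Dict[str, float],
                     metric: str = "TOK", max_drop: float = 0.1) -> bool:
    """True if `metric` fell by at most `max_drop` points versus the baseline.

    Assumption: `max_drop` is an absolute difference on the 0-100 scale.
    """
    return baseline[metric] - current[metric] <= max_drop
```

In CI one could save the output of a known-good run (for example `python -m spacy evaluate ja_core_news_md ja_gsd-ud-test.ne.json > baseline.txt`, assuming the results go to stdout), parse both files, and fail the job when `within_tolerance` returns False. Parsing console text is fragile if the output format changes between spaCy versions, so the regex may need adjusting.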
@hiroshi-matsuda-rit
I've merged @polm's fix and released v0.4.8.
Sorry for the degradation. Yes, we should include the spaCy evaluation step in CI (#132), or at least test with some paragraphs.
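For the lighter option of testing with some paragraphs, one approach is a snapshot ("golden") test: tokenize a fixed set of texts with a version you have verified, store the result, and have CI compare the current output against it. This is a minimal sketch under that assumption; the helper names and the JSON file are hypothetical, and `make_sudachi_tokenize` follows the usage shown in the SudachiPy README for the 0.4 series (`dictionary.Dictionary().create()`, `Tokenizer.SplitMode`), which should be checked against the installed version.

```python
import json
from typing import Callable, Dict, Iterable, List

Tokenize = Callable[[str], List[str]]


def make_sudachi_tokenize(split_mode: str = "A") -> Tokenize:
    """Wrap SudachiPy (0.4.x-style API) as a text -> list of surfaces function."""
    # Imported lazily so the helpers below can be used without SudachiPy.
    from sudachipy import dictionary, tokenizer
    tok = dictionary.Dictionary().create()
    mode = getattr(tokenizer.Tokenizer.SplitMode, split_mode)
    return lambda text: [m.surface() for m in tok.tokenize(text, mode)]


def record_golden(tokenize: Tokenize, texts: Iterable[str], path: str) -> None:
    """Run once with a trusted version and save its tokenization as reference."""
    golden = {text: tokenize(text) for text in texts}
    with open(path, "w", encoding="utf-8") as f:
        json.dump(golden, f, ensure_ascii=False, indent=2)


def load_golden(path: str) -> Dict[str, List[str]]:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def find_mismatches(tokenize: Tokenize, golden: Dict[str, List[str]]) -> List[str]:
    """Return the texts whose current tokenization differs from the reference."""
    return [text for text, expected in golden.items() if tokenize(text) != expected]
```

A CI test could then assert `find_mismatches(make_sudachi_tokenize(), load_golden("golden.json")) == []`, where `golden.json` is a hypothetical file recorded earlier. Passing the tokenizer in as a function keeps the comparison logic independent of SudachiPy, so it can also be checked with a stub.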
```
$ pip install -U sudachipy==0.4.7
$ python -m spacy evaluate ja_core_news_md ja_gsd-ud-test.ne.json
================================== Results ==================================
Time 1.10 s
Words 12817
Words/s 11630
TOK 91.93
POS 82.06
UAS 75.81
LAS 73.98
NER P 69.52
NER R 70.77
NER F 70.14
Textcat 0.00
```
```
$ pip install -U sudachipy==0.4.8
$ python -m spacy evaluate ja_core_news_md ja_gsd-ud-test.ne.json
================================== Results ==================================
Time 1.20 s
Words 13053
Words/s 10871
TOK 98.11
POS 97.94
UAS 88.16
LAS 86.18
NER P 72.79
NER R 72.91
NER F 72.85
Textcat 0.00
```
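To put the two runs side by side, this small sketch computes the per-metric change from v0.4.7 to v0.4.8 using only the scores reported above; `V047`, `V048` and `deltas` are illustrative names.

```python
# Scores copied from the two `spacy evaluate` runs above.
V047 = {"Words": 12817, "TOK": 91.93, "POS": 82.06, "UAS": 75.81,
        "LAS": 73.98, "NER F": 70.14}
V048 = {"Words": 13053, "TOK": 98.11, "POS": 97.94, "UAS": 88.16,
        "LAS": 86.18, "NER F": 72.85}


def deltas(before, after):
    """Per-metric change from `before` to `after`, rounded to 2 decimals."""
    return {k: round(after[k] - before[k], 2) for k in before}


for metric, d in deltas(V047, V048).items():
    print(f"{metric:<6} {d:+.2f}")
```

The Words count differs by 236, so v0.4.7 appears to have split the test set into fewer tokens. TOK rises by 6.18 points, while POS (+15.88), UAS (+12.35) and LAS (+12.20) move even more, which is consistent with tokenization errors propagating into the tagger and parser.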
I just tested v0.4.8 and got the same result. Thank you for the quick response!
@sorami @polm Could you look into the reason for this difference between v0.4.5 and v0.4.6?