WorksApplications / SudachiPy

Python version of Sudachi, a Japanese tokenizer.
Apache License 2.0

Cython based optimization #123

Closed polm closed 4 years ago

polm commented 4 years ago

This PR cythonizes some core modules and makes a few other changes, for a significant speedup. In my tests, a benchmark went from 35s to 10s (about 3.5x faster).

There's still significant room for improvement, but I wanted to go ahead and get this committed.
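For anyone who wants to reproduce this kind of before/after comparison, here is a minimal timing sketch. It is not the benchmark used for this PR: `time_call` and `speedup` are hypothetical helper names, and you would pass in whatever tokenizing function and input you want to measure.

```python
import time


def time_call(func, *args, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` runs of func(*args)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(before_s, after_s):
    """Ratio of the old runtime to the new runtime."""
    return before_s / after_s


# The figures quoted in this PR: 35s before, 10s after.
print(speedup(35.0, 10.0))  # 3.5
```

Taking the best of several runs reduces noise from other processes; for a long benchmark a single run is probably enough.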

polm commented 4 years ago

Not sure why Travis still shows this as queued, but if you click through, the tests are passing.

This is related to #74.

polm commented 4 years ago

Just to clarify, I would like to go ahead and get this merged if it looks OK. I can add future improvements in a separate PR.

Let me know if there are any issues with this.

sorami commented 4 years ago

Thank you very much for the PR! Let me check, then merge and release a new version.

sorami commented 4 years ago

Released as v0.4.6: https://github.com/WorksApplications/SudachiPy/releases/tag/v0.4.6

(The build failed, so it is not on PyPI yet: https://pypi.org/project/SudachiPy/)

sorami commented 4 years ago

Not sure why we still get errors with Travis CI.

sorami commented 4 years ago

So Travis CI failed to upload to PyPI: now that we have a Cython build, bdist_wheel adds the platform tag linux_x86_64 to the wheel, which PyPI does not accept.

Platform compatibility tags — Python Packaging User Guide

For now, I've manually released the source distribution and the macOS wheels: https://pypi.org/project/SudachiPy/#files
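As a rough illustration of the problem, the sketch below pulls the platform tag out of a wheel filename and flags a bare linux_* tag before upload. The filenames are hypothetical examples, not the actual build artifacts, and `platform_tag` and `has_bare_linux_tag` are helper names made up for this sketch. It only checks the filename; it does not verify that the wheel is actually portable.

```python
def platform_tag(wheel_filename):
    """Return the platform tag, the last dash-separated field of a wheel filename.

    Wheel filenames look like:
    {distribution}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % wheel_filename)
    parts = wheel_filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel filename: %r" % wheel_filename)
    return parts[-1]


def has_bare_linux_tag(tag):
    """True if any tag in a (possibly dot-compressed) tag set is a plain linux_* tag."""
    return any(t.startswith("linux_") for t in tag.split("."))


# Hypothetical filenames for illustration only.
for name in (
    "SudachiPy-0.4.6-cp37-cp37m-linux_x86_64.whl",
    "SudachiPy-0.4.6-cp37-cp37m-macosx_10_9_x86_64.whl",
):
    tag = platform_tag(name)
    status = "bare linux tag, PyPI would reject" if has_bare_linux_tag(tag) else "ok"
    print(name, "->", tag, "->", status)
```

The usual way around this for Linux is to build wheels that carry a manylinux tag (for example with auditwheel or cibuildwheel) rather than uploading the plain bdist_wheel output.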

sorami commented 4 years ago

#125