clips / pattern

Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization.
https://github.com/clips/pattern/wiki
BSD 3-Clause "New" or "Revised" License

Cython version #230

Closed by ghost 6 years ago

ghost commented 6 years ago

Although performance is not the main focus of this package, Pattern tends to run extremely slowly, even on a single sentence (parse takes around 10 seconds). If Pattern were rewritten in Cython, it could gain performance and features while preserving readability.

zjjott commented 6 years ago

+1. I found where it is slow:

    ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
       178    0.005    0.000  108.189    0.608  .env/lib/python3.6/site-packages/pattern/text/__init__.py:2661(_edit2)
      8344   19.272    0.002  108.163    0.013  .env/lib/python3.6/site-packages/pattern/text/__init__.py:2666()

We can see that text/__init__.py:2661 (_edit2) takes about 600 ms of cumulative time per call (108.189 s over 178 calls).
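For anyone who wants to reproduce this kind of table, here is a minimal sketch of collecting a profile with the standard-library cProfile and pstats modules. The function `slow_candidates` is a hypothetical stand-in for an expensive text routine, not Pattern's actual code, and `profile_call` is a helper name made up for this example.

```python
import cProfile
import io
import pstats


def slow_candidates(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Hypothetical stand-in: every string one edit away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + replaces + inserts)


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()

    # Sort by cumulative time, the column that exposed _edit2 above.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(slow_candidates, "pattern")
print(report)
```

To profile the real workload, replace `slow_candidates` with the call that is slow for you. Sorting by cumulative time puts functions whose callees dominate the runtime at the top, even when their own tottime is small.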

karelin commented 6 years ago

Cython may be a poor solution here: it is most powerful for calculations on strongly typed numerical arrays (NumPy). For text data, it would be necessary to implement special data structures (cf. the spaCy package).