+1. I found where it is slow:
ncalls tottime percall cumtime percall filename:lineno(function)
178 0.005 0.000 108.189 0.608 .env/lib/python3.6/site-packages/pattern/text/__init__.py:2661(_edit2)
8344 19.272 0.002 108.163 0.013 .env/lib/python3.6/site-packages/pattern/text/__init__.py:2666(
We can see that text/__init__.py:2661 (_edit2) has a cumulative time of about 600 ms per call (108.189 s over 178 calls).
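For anyone who wants to reproduce this kind of table, here is a minimal sketch using the standard library's cProfile and pstats. The function slow_candidates is a hypothetical stand-in for an expensive text routine, not Pattern's actual code, and profile_call is a helper name made up for this example.

```python
import cProfile
import io
import pstats


def slow_candidates(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Hypothetical stand-in for an expensive text routine (not Pattern's code)."""
    # All strings reachable by substituting one character of the word.
    first = {word[:i] + c + word[i + 1:] for i in range(len(word)) for c in alphabet}
    # Apply one more substitution to each of those, which multiplies the work.
    return {w[:i] + c + w[i + 1:] for w in first for i in range(len(w)) for c in alphabet}


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


result, report = profile_call(slow_candidates, "word")
print(report)
```

The report has the same columns as above (ncalls, tottime, percall, cumtime, percall). To profile the real package, you would pass the Pattern function you are calling in place of slow_candidates.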
Cython may be a poor fit here: it is most effective for computations on strongly typed numerical arrays (NumPy). For text data, it would be necessary to implement specialized data structures (cf. the spaCy package).
Although performance is not the main focus of this package, Pattern tends to run extremely slowly, even on a single sentence (parse takes around 10 seconds). However, if Pattern is rewritten in Cython, it could have greater performance and features, all while preserving readability.