erikrose / parsimonious

The fastest pure-Python PEG parser I can muster
MIT License

PyPy perf #75

Open · JoonasLiik opened this issue 9 years ago

JoonasLiik commented 9 years ago

When run under PyPy, the execution speed drops significantly.

So it feels like there are some pathological cases that really dominate.

erikrose commented 9 years ago

I suspect that any recursive-descent parser will, at the least, not be helped by PyPy: its JIT compilation kicks in only after a loop has spun 100 times, IIRC, and these parsers are dominated by recursion, not looping. Still, we should benchmark and see what can be done.
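
For illustration, here is a minimal recursive-descent matcher (a toy, not parsimonious's actual code): each nesting level consumes a fresh Python call frame, so the hot path is recursion rather than a loop a tracing JIT could latch onto.

```python
# Toy recursive-descent matcher, not parsimonious's code: matches 'x'
# wrapped in any number of parens, e.g. "((x))". The work is spread
# across call frames, one per nesting level, with no hot loop.

def match_parens(text, pos=0):
    """Return the end position of a match starting at pos, or None."""
    if pos < len(text) and text[pos] == "x":
        return pos + 1
    if pos < len(text) and text[pos] == "(":
        end = match_parens(text, pos + 1)  # recurse: a new frame per level
        if end is not None and end < len(text) and text[end] == ")":
            return end + 1
    return None

assert match_parens("((x))") == 5
```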

erikrose commented 9 years ago

Also, I'm curious whether it's the actual parsing that's slow or the NodeVisitor.visit()-ing. The visitation is full of dynamic method lookup (by name) that I could see being slow.
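
For context, the dispatch is roughly the pattern in this simplified sketch (parsimonious's real visit() also wraps exceptions, which comes up again below): every node visited triggers a string-keyed getattr.

```python
# Simplified sketch of NodeVisitor's by-name dispatch pattern; the Node
# shape here is a stand-in, not parsimonious's actual Node class.
from collections import namedtuple

Node = namedtuple("Node", "expr_name children text")

class SketchVisitor:
    def visit(self, node):
        # A getattr by string on every single node: cheap on CPython,
        # but plausibly a cost a JIT has trouble optimizing away.
        method = getattr(self, "visit_" + node.expr_name, self.generic_visit)
        return method(node, [self.visit(child) for child in node.children])

    def generic_visit(self, node, visited_children):
        return visited_children or node.text

    def visit_number(self, node, visited_children):
        return int(node.text)

tree = Node("pair", [Node("number", [], "1"), Node("number", [], "2")], "")
assert SketchVisitor().visit(tree) == [1, 2]
```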

JoonasLiik commented 9 years ago

Good question, i'll take a closer look in a few minutes..

JoonasLiik commented 9 years ago

my most-naive-possible benchmark yields:

PyPy:
    time for parsing:  8.576436959120286
    time for visiting: 9.22401708694466

CPython:
    time for parsing:  6.319880450375682
    time for visiting: 1.5904925605101177
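
For reference, such a naive benchmark might be shaped like the sketch below; the grammar, input, and visitor are placeholders I made up, not the ones that produced the numbers above.

```python
# Hypothetical shape of a naive parse/visit benchmark; the grammar,
# input, and visitor are placeholders, not the originals.
import time
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

grammar = Grammar(r'''
    list = "[" item ("," item)* "]"
    item = ~"[0-9]+" / list
''')

text = "0"
for _ in range(50):           # nest the input to exercise recursion
    text = "[1," + text + "]"

class Flatten(NodeVisitor):
    def generic_visit(self, node, visited_children):
        return visited_children or node.text

start = time.perf_counter()
tree = grammar.parse(text)
print("time for parsing:", time.perf_counter() - start)

start = time.perf_counter()
Flatten().visit(tree)
print("time for visiting:", time.perf_counter() - start)
```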

erikrose commented 7 years ago

You know, we could probably solve the PyPy perf problem by switching to maintaining our own stack rather than doing actual recursion. This would be a major project but an interesting one. It would also dodge the interpreter's recursion limits.
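
The core technique, sketched here on a plain tree walk rather than parsimonious's internals, is to keep our own list of pending work, so depth is bounded by memory instead of sys.getrecursionlimit().

```python
# Sketch of the technique: an explicit stack instead of recursion.
# This walks an arbitrarily deep tree of nested lists without ever
# growing the Python call stack, so no RecursionError at any depth.

def iter_leaves(tree):
    stack = [tree]                        # our own stack, bounded by memory
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(reversed(node))  # push children, leftmost on top
        else:
            yield node

deep = leaf = []
for _ in range(100_000):                  # far beyond the interpreter's limit
    leaf.append([])
    leaf = leaf[0]
leaf.append("bottom")

assert list(iter_leaves(deep)) == ["bottom"]
```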

goodmami commented 4 years ago

I'm benchmarking several parsing libraries, at least 3 of which use recursion, and parsimonious has strangely poor performance in my recursion-limit test under PyPy. I'm not sure what interaction between Parsimonious and PyPy would cause this, but in the end I get a VisitationError instead of a RecursionError. It could be due to the blanket except block here.

I've got a partial implementation of a PEG-based state machine (for another library) instead of recursion, and it does seem like a promising way forward, but it's quite a bit more complicated. The other libraries I test (pyparsing and pe) use recursion and do not suffer the same problem.

Update: I changed my recursion-limit-search code to catch any exception instead of just RecursionError, and now the performance is similar to the other libraries'. I think in my case the performance issue was the repeated formatting of deeply nested errors (the prettily() function). When I catch the error and don't print it, the formatting code never runs. I wasn't seeing this before because I was killing the process before anything was printed.
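
A recursion-limit search along those lines might look like the sketch below; the grammar, helper name, and doubling strategy are illustrative guesses, not the actual benchmark code.

```python
# Hypothetical sketch of a recursion-limit search; the grammar, helper
# name, and doubling strategy are illustrative, not the real benchmark.
from parsimonious.grammar import Grammar
from parsimonious.nodes import NodeVisitor

grammar = Grammar(r'nested = "(" nested ")" / "x"')

class Collapse(NodeVisitor):
    def generic_visit(self, node, visited_children):
        return visited_children or node.text

def parses_at(depth):
    try:
        Collapse().visit(grammar.parse("(" * depth + "x" + ")" * depth))
        return True
    except Exception:
        # Catch broadly, not just RecursionError: the recursion blowup can
        # surface wrapped as a VisitationError, and pretty-printing that
        # deeply nested error is itself the expensive part.
        return False

depth = 1
while parses_at(depth):
    depth *= 2
print("recursion limit is between", depth // 2, "and", depth)
```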