Open JoonasLiik opened 9 years ago
I suspect that any recursive descent parser will at least not be helped by pypy: its JIT compilation kicks in only after a loop has spun 100 times, IIRC, and these parsers are dominated by recursion, not looping. Still, we should bench and see what can be done.
Also, I'm curious whether it's the actual parsing that's slow or the NodeVisitor.visit()-ing. The visitation is full of dynamic method lookup (by name) that I could see being slow.
Good question, i'll take a closer look in a few minutes..
my most-naive-possible-benchmark yields: pypy time for parsing: 8.576436959120286 time for visiting: 9.22401708694466
python time for parsing: 6.319880450375682 time for visiting: 1.5904925605101177
You know, we could probably solve the PyPy perf problem by switching to maintaining our own stack rather than doing actual recursion. This would be a major project but an interesting one. It would also dodge the interpreter's recursion limits.
I'm benchmarking several parsing libraries, at least 3 of which use recursion, and parsimonious has strangely very poor performance for my recursion-limit test with PyPy. I'm not sure what interaction there is between Parsimonious and PyPy that would cause this, but in the end I get a VisitationError instead of a RecursionError. It could be due to the blanket except block here.
I've got a partial implementation of a PEG-based state machine (for another library) instead of recursion, and it does seem like a promising way forward, but it's quite a bit more complicated. The other libraries I test (pyparsing and pe) use recursion and do not suffer the same problem.
Update: I changed my recursion-limit-search code to catch any exception instead of just RecursionError and now the performance is similar to others. I think in my case the performance issue was the repeated formatting of very nested errors (the prettily()
function). When I catch the error and don't print it, the formatting code is not run. I wasn't seeing this before because I was killing the process before anything was printed.
when run using pypy the execution speed drops significantly.
so ehm, feels like there are some pathological cases that really dominate.