goodmami / python-parsing-benchmarks

Compares Python's text parsing libraries
MIT License

Add textX test (for JSON right now) #5

ThatXliner closed this 3 years ago

ThatXliner commented 3 years ago

Yeah sure! This is just the initial implementation I hacked up (the documentation really sucks).

goodmami commented 3 years ago

(the documentation really sucks)

Agreed. After 20 minutes of trying to fix the issues myself, I gave up. I just wanted to know how to apply semantic actions to parsing events. I think it has to do with registering interpreter functions, but would it be model interpreters or object interpreters? Anyway, good luck!

ThatXliner commented 3 years ago

That was probably the 5th hackiest piece of code I've ever written. But I'm pretty sure there is no other way than to resort to collections.UserDict and collections.UserList.
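For context on the `UserDict`/`UserList` remark: one visible difference from the built-ins is that instances of `UserDict`/`UserList` subclasses accept arbitrary attributes, while plain `dict` and `list` objects do not. The thread doesn't say why textX needed this, so treat it as a guess at the motivation; `JSONObject`, `JSONArray`, and `can_set_attribute` are hypothetical names.

```python
from collections import UserDict, UserList


# Hypothetical wrapper types: UserDict/UserList subclasses are ordinary
# Python classes with an instance __dict__, so they accept new attributes.
class JSONObject(UserDict):
    pass


class JSONArray(UserList):
    pass


def can_set_attribute(obj) -> bool:
    """Return True if an arbitrary attribute can be attached to obj."""
    try:
        obj.parent = None  # e.g. a back-reference some framework might set
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    print(can_set_attribute({}))                 # built-in dict
    print(can_set_attribute([]))                 # built-in list
    print(can_set_attribute(JSONObject(a=1)))    # UserDict subclass
    print(can_set_attribute(JSONArray([1, 2])))  # UserList subclass
```

In plain Python a direct `dict` or `list` subclass would also accept attributes; whether that alternative would have worked with textX isn't covered in the thread.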

ThatXliner commented 3 years ago

What's the difference between * and *-compile (e.g. JSON and JSON-compile)? Lark is the slowest for JSON-compile but the 3rd fastest for JSON.

Also, what is the data you use to shove into term-graph? The median? The mean? It depends (on if it's skewed)?

goodmami commented 3 years ago

Some answers to your questions before I review the PR:

What's the difference between * and *-compile (e.g. JSON and JSON-compile)? Lark is the slowest for JSON-compile but the 3rd fastest for JSON.

*-compile is the time it takes to initialize the parser, which is why all the setup code goes in a compile() function. Tools that initialize the parser by parsing a grammar description (pe, Lark, Parsimonious) tend to be slower and those that are created with Python code (Sly, PyParsing) tend to be faster. It has no bearing on the speed of parsing, but a holistic view of a program that parses things might consider the initialization time.
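A minimal sketch of timing the two phases separately, assuming a benchmark module that exposes a `compile()` setup function (as described above) and a `parse()` function that uses its result. The regex "tokenizer" is a toy stand-in, not a real JSON parser; `NUMBER_PATTERN`, `parse`, and `benchmark` are hypothetical names.

```python
import re
import timeit

# Hypothetical toy "grammar": just a number pattern. A real library would
# build a grammar or parser object here, which is usually the costly part.
NUMBER_PATTERN = r"-?\d+(?:\.\d+)?"


def compile():  # deliberately shadows the builtin to match the naming above
    """One-time setup; this is what a *-compile benchmark measures."""
    re.purge()  # clear re's pattern cache so every call really recompiles
    return re.compile(NUMBER_PATTERN)


def parse(tokenizer, text):
    """Extract the numbers from text like '[1, 2.5, -3]' (toy, not real JSON)."""
    return [float(token) for token in tokenizer.findall(text)]


def benchmark(number=1000):
    """Return mean seconds per call for setup and for parsing, measured separately."""
    text = "[" + ", ".join(str(i) for i in range(100)) + "]"
    tokenizer = compile()  # built once, outside the parse timing
    compile_time = timeit.timeit(compile, number=number) / number
    parse_time = timeit.timeit(lambda: parse(tokenizer, text), number=number) / number
    return compile_time, parse_time


if __name__ == "__main__":
    setup, parsing = benchmark()
    print(f"compile: {setup:.2e} s/call, parse: {parsing:.2e} s/call")
```

Building the tokenizer once, outside the parse timing, is what keeps the two numbers independent, mirroring the split between the *-compile and plain benchmarks.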

Also, what is the data you use to shove into term-graph? The median? The mean? It depends (on if it's skewed)?

I believe I used the mean as I was trying to avoid outliers. If the deviation was too high, I'd re-run the benchmarks with fewer things running in the background. I also generally close the browser and anything that is streaming, like music, etc., before running the benchmarks. But don't worry about adding the time to the graph. I'll do that after rerunning everything. They need to be run on the same machine.
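A minimal sketch of that summary step, assuming the raw per-run timings are available as a list of seconds. `summarize` and its `max_rel_stdev` threshold (10% here) are hypothetical; the thread doesn't say what cutoff counted as "too high". It reports both the mean and the median, since a single slow run shifts the mean but usually not the median.

```python
import statistics


def summarize(timings, max_rel_stdev=0.10):
    """Summarize repeated benchmark timings (in seconds).

    max_rel_stdev is a hypothetical threshold: if the sample standard
    deviation exceeds this fraction of the mean, the run is flagged
    for re-running (e.g. with fewer background programs).
    """
    mean = statistics.mean(timings)
    stdev = statistics.stdev(timings)  # sample stdev; needs at least 2 values
    return {
        "mean": mean,
        "median": statistics.median(timings),
        "stdev": stdev,
        "rerun": stdev / mean > max_rel_stdev,
    }


if __name__ == "__main__":
    print(summarize([1.00, 1.01, 0.99, 1.00]))  # tight spread
    print(summarize([1.0, 1.0, 1.0, 2.0]))      # one outlier
```

For example, `summarize([1.0, 1.0, 1.0, 2.0])` gives a mean of 1.25 s but a median of 1.0 s, and its standard deviation (0.5 s) is 40% of the mean, so the run is flagged.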

ThatXliner commented 3 years ago

Alrighty, that answers my questions. But just one more thing.

As you can see from my last commit:

[Screenshot from 2021-04-19, 6:21:34 PM]

I decided to use packrat parsing, which uses more memory (packrat parsing is just, AFAIK, a fancy name for PEG parsing with a memoization cache).
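To make the memoization point concrete, here is a minimal packrat recognizer for a toy PEG (not textX's grammar or API). Each rule result is cached under `(rule name, position)`, so when an ordered choice backtracks and retries a rule at the same position, the answer comes from the cache instead of being recomputed. The `memo` dictionary is the extra memory traded for speed. `PackratParser` and `recognize` are hypothetical names.

```python
class PackratParser:
    """Packrat recognizer for a toy PEG ('/' is ordered choice):

        Sum     <- Product '+' Sum / Product
        Product <- Atom '*' Product / Atom
        Atom    <- '(' Sum ')' / [0-9]

    Each rule returns the end position of its match, or None on failure.
    """

    def __init__(self, text, memoize=True):
        self.text = text
        self.memoize = memoize
        self.memo = {}       # (rule name, start pos) -> end pos or None
        self.rule_calls = 0  # number of times a rule body actually ran

    def apply(self, rule, pos):
        key = (rule.__name__, pos)
        if self.memoize and key in self.memo:
            return self.memo[key]  # cache hit: no re-parsing
        self.rule_calls += 1
        result = rule(pos)
        if self.memoize:
            self.memo[key] = result
        return result

    def literal(self, char, pos):
        if pos < len(self.text) and self.text[pos] == char:
            return pos + 1
        return None

    def Sum(self, pos):
        end = self.apply(self.Product, pos)
        if end is not None:
            plus = self.literal("+", end)
            if plus is not None:
                rest = self.apply(self.Sum, plus)
                if rest is not None:
                    return rest
        # Second alternative: Product again at the same position.
        return self.apply(self.Product, pos)

    def Product(self, pos):
        end = self.apply(self.Atom, pos)
        if end is not None:
            star = self.literal("*", end)
            if star is not None:
                rest = self.apply(self.Product, star)
                if rest is not None:
                    return rest
        # Second alternative: Atom again at the same position.
        return self.apply(self.Atom, pos)

    def Atom(self, pos):
        after_open = self.literal("(", pos)
        if after_open is not None:
            inner = self.apply(self.Sum, after_open)
            if inner is not None:
                close = self.literal(")", inner)
                if close is not None:
                    return close
        if pos < len(self.text) and self.text[pos] in "0123456789":
            return pos + 1
        return None


def recognize(text, memoize=True):
    """Return (matched whole text?, rule bodies run, cache entries kept)."""
    parser = PackratParser(text, memoize)
    matched = parser.apply(parser.Sum, 0) == len(text)
    return matched, parser.rule_calls, len(parser.memo)
```

On nested input such as `((1+2)*3)+4`, `recognize(..., memoize=False)` runs more rule bodies than the memoized version, which in turn reports how many cache entries it had to keep.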

Is memory usage a thing we're also testing for?
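The thread leaves the memory question open. If memory were added to the benchmarks, one stdlib option is `tracemalloc`, sketched below under the assumption that peak Python-level allocation during a single parse is the figure of interest. `peak_memory` is a hypothetical helper, and `json.loads` only stands in for a real parser.

```python
import json
import tracemalloc


def peak_memory(parse, text):
    """Run parse(text) and return (result, peak bytes traced during the call).

    tracemalloc sees allocations made through Python's memory allocators;
    memory a C extension obtains directly from malloc() is not counted.
    """
    tracemalloc.start()
    try:
        result = parse(text)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, peak


if __name__ == "__main__":
    data, peak = peak_memory(json.loads, '{"numbers": [1, 2, 3]}')
    print(data, f"peak: {peak} bytes")
```

Memory would likely need its own runs, since tracing slows Python down and would distort the timing numbers.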

ThatXliner commented 3 years ago

I think that's all we need. Ready for merge

goodmami commented 3 years ago

Yep, looks good. I've squashed all the commits into one and merged. There are a few small things to fix, but I'll take care of them. Thanks!

ThatXliner commented 3 years ago

You're welcome! Happy to help 😄