profiling/benchmarking work

marzer commented 4 years ago

This PR is the result of my efforts on closing #32. It probably looks like a lot, but it's not, really. Here's a summary:

🔹 Shuffled some code around to strengthen the boundary between definitions and implementations

Previously there was a mix of definition and implementation between the hpp and cpp files, which is likely to become a problem if/when the project grows in size, so I moved some of the heavier implementation code from pytomlpp.hpp to more specific cpp files.

🔹 Added a benchmark script with some toml test data

It's basically a slightly more complex version of the benchmarking example you have in the README. I also added tomlkit to it for the fun of it. Output looks like this:

  pytomlpp.loads:   0.13249060000000057 s 
      toml.loads:   1.1993466000000002 s
   tomlkit.parse:   7.730476 s

The test data was generated by one of the toml++ demo projects, toml_generator, and can be used to generate much, much larger toml files if you really want to stress-test :P

🔹 Hid all the profiling stuff behind a compile-time switch

As well as making it auto-report on script exit. This will mean you don't need to explicitly access anything to do with the profiling machinery from inside of python; all you need to do is set PYTOMLPP_PROFILING to 1 in pytomlpp.hpp and rebuild, then run any script using pytomlpp, and see a report on shutdown:

pytomlpp profiling summary:
-----------------------------
    loads.total:  counter =    1000, total_time_in_ns =    129455600, average_time_in_ns =  129455
  loads.convert:  counter =    1000, total_time_in_ns =     32181100, average_time_in_ns =   32181
    loads.parse:  counter =    1000, total_time_in_ns =     90427300, average_time_in_ns =   90427

You'll note I also added a scope timer for the toml::parse() section, and see that it accounts for most of the time spent in loads.

🔹 Simplified `toml_array_to_py_list` and `toml_table_to_py_dict`

They're now much less complex, using visitation instead of the previous if/else chain. They also previously had cases that were impossible to reach; a toml value can never have type node_type::none or anything not already handled by one of the other conversion paths, so those branches were dead code.

🔹 Updated toml++ to 2.1.0

I actually encountered an upstream bug while working on this that required an update, heh.

Overall the performance outcomes were negligible since it appears parsing accounts for most of the time spent, which was a bit disappointing, though considering the speed compared to tomlkit and toml I think you're OK in that department :P

bobfang1992 commented 4 years ago

This is GREAT! I will merge this now. Thanks!

marzer commented 4 years ago

No worries. I had fun working on this. Good to get out of C++-land every now and again.

You can probably close #32, methinks; Given that most time is spent parsing, the only way I see pytomlpp getting any faster is if I do upstream work to make toml::parse faster, but I already do perf passes on it quite regularly and am pretty much out of ideas (without moving to something nuts like a SIMD-based unicode decoder or something...)

Besides, you're 9x faster than toml and 59x faster than tomlkit, heh.

marzer commented 4 years ago

Oh, something I didn't add, though you may wish to consider: add CI check to ensure nobody can accidentally commit #define PYTOMLPP_PROFILING 1

bobfang1992 / pytomlpp