bobfang1992 / pytomlpp

A python wrapper for tomlplusplus
https://bobfang1992.github.io/pytomlpp/pytomlpp.html
MIT License
86 stars 11 forks source link

feature: add support for io operations from filenames #52

Closed neutrinoceros closed 3 years ago

neutrinoceros commented 3 years ago

fix #51 opening as a draft for now, I'll need to add tests

There's room for discussion over the exact implementation for this since no matter how it is handled, it will always add a slight overhead to the existing behaviour, either as an additional conditional check or a try/except. The latter solution is particularly expensive (performance wise) if an exception is typically raised more than 1% of the time, so I went with the former, which should make both load and dump perform decently in any case, to no significant cost.

A side effect is that in order to achieve this properly, I had to add support to load/dump to and from binary files.

marzer commented 3 years ago

RE performance concerns: given my findings in #42 I'd be surprised it came up on a profiler regardless of which route you go, since the vast majority of CPU time is spent in C++ (toml::parse)

neutrinoceros commented 3 years ago

Nice, this is a relief ! However the patch ended up increasing the complexity of your io functions much more than I anticipated, no hard feelings if you don't want it for this reason :)

marzer commented 3 years ago

Had a quick look at your changes and they look fine to me, but ultimately it's up to @bobfang1992 since the python lib is his project, I'm just the shadowy C++ spectre looming in the shadows :)

One concern I have is the addition of the encoding parameter; note that the TOML spec explicitly states that all valid TOML is UTF-8, so if any non-UTF-8 string is passed through to toml++ it will likely fail with a 'malformed UTF-8' error. Shouldn't be a problem if you can ensure that python handles all the encoding changes internally and the C++ lib never sees anything other than UTF-8.

(worth noting that I don't know how pybind11 handles strings; it's possible that it already ensures they're UTF-8?)

neutrinoceros commented 3 years ago

One concern I have is the addition of the encoding parameter; note that the TOML spec explicitly states that all valid TOML is UTF-8,

Thank you, this is an excellent point that I failed to consider.

Furthermore, quoting the spec (re: binary)

for binary data, it is recommended that you use Base64 or another suitable ASCII or UTF-8 encoding

so It seems pretty clear to me that the encoding param should not be exposed in pytomlpp's api. I'll simplify my patch accordingly.

neutrinoceros commented 3 years ago

I see that the wheels are failing to build because some of the tests that I added use pathlib.Path objects as inputs for load and dump, but the builtin open function only gained support for such object in Python 3.6 It should suffice to convert those paths to strings in the tests, let's see.

bobfang1992 commented 3 years ago

Hi I fixed this in https://github.com/bobfang1992/pytomlpp/pull/53 can you rebase?

neutrinoceros commented 3 years ago

Done ! To be clear, should I remove the third commit, or do you need to keep tests compatible with Python 3.5 ?

bobfang1992 commented 3 years ago

I see no harm keeping them the way they are. I think we can merge this if the CI passes.

neutrinoceros commented 3 years ago

Thank you !