datreant / datreant.data

convenient data storage and retrieval in HDF5 for Treants
http://datreant.org/
BSD 3-Clause "New" or "Revised" License

Pickle dumps not readable between python 2/3 #22

Open kain88-de opened 6 years ago

kain88-de commented 6 years ago

The problem is that we always write using the highest available pickle protocol. I was unaware that Python 3 introduced a new protocol version (protocol 3), which Python 2 cannot load:

https://stackoverflow.com/questions/25843698/valueerror-unsupported-pickle-protocol-3-python2-pickle-can-not-load-the-file#25843743

For pickles to load across Python versions while staying reasonably fast and small, we have to explicitly choose version 2 of the protocol.
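A minimal sketch of pinning the protocol, assuming the write path boils down to a plain `pickle.dump` call. The helper name `dump_portable` and the constant `CROSS_VERSION_PROTOCOL` are hypothetical and not part of datreant.data:

```python
import io
import pickle

# Protocol 2 is the newest protocol that Python 2 can still read.
# pickle.HIGHEST_PROTOCOL on Python 3 is 3 or higher, which Python 2 rejects.
CROSS_VERSION_PROTOCOL = 2


def dump_portable(obj, fileobj):
    """Pickle obj to a binary file object using a fixed protocol (hypothetical helper)."""
    pickle.dump(obj, fileobj, protocol=CROSS_VERSION_PROTOCOL)


# Round trip through an in-memory binary buffer.
buf = io.BytesIO()
dump_portable({"a": [1, 2, 3]}, buf)
buf.seek(0)
restored = pickle.load(buf)
```

Pinning the protocol only fixes the "unsupported pickle protocol" error. It does not by itself resolve string handling differences between the two Python versions.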

richardjgowers commented 6 years ago

What are we using pickles for?

kain88-de commented 6 years ago

For pickling Python data, for example when we store arbitrary Python objects.

richardjgowers commented 6 years ago

Right, yeah, but what is datreant using pickle for? git grep pickle isn't showing me anything.

kain88-de commented 6 years ago

https://github.com/datreant/datreant.data/blob/89f8937dfc35cc966738bf14d89dacb3bf906a5a/src/datreant/data/pydata.py#L41

Are you sure you are in the right repository?

richardjgowers commented 6 years ago

Ah right yeah, I was looking at core

kain88-de commented 6 years ago

Oh well, with string handling this is even more broken -.-. We have to check that reading and writing use the binary mode flags rb/wb, so that data can be read and written independently of the Python version.
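As a rough illustration of the binary mode point, here is a sketch that opens the file with `wb` and `rb`. The function names `write_pickle` and `read_pickle` and the file name are made up for this example and are not the functions used in pydata.py:

```python
import os
import pickle
import tempfile


def write_pickle(path, obj):
    # "wb": pickle output is bytes, so the file must be opened in binary mode.
    with open(path, "wb") as f:
        pickle.dump(obj, f, protocol=2)


def read_pickle(path):
    # "rb": text mode would try to decode the byte stream on Python 3.
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical file location, used only for this example.
path = os.path.join(tempfile.mkdtemp(), "example.pkl")
write_pickle(path, [1, 2.5, "text"])
roundtrip = read_pickle(path)
```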

kain88-de commented 6 years ago

That is to say, I have now also found that a Python 2 dump cannot be read by Python 3.

kain88-de commented 6 years ago

https://bugs.python.org/issue6784

So they claim this is resolved. But I honestly haven't found a good solution today that enables reading and writing pickles between Python versions without problems. Some simple stuff and builtin types might work, but not complex objects.
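For the simple case, Python 3's `pickle.load` and `pickle.loads` accept an `encoding` argument that controls how Python 2 `str` payloads are decoded. The sketch below uses a hand-written byte string that is assumed to mimic a protocol 0 dump of the Python 2 string `'\xe9'`; it is not output captured from datreant.data, and it says nothing about whether complex objects survive the trip:

```python
import pickle

# Assumed stand-in for what Python 2 writes for pickle.dumps('\xe9', 0):
# a STRING opcode followed by the repr of the 8-bit string.
py2_style_pickle = b"S'\\xe9'\np0\n."

# The default encoding on Python 3 is ASCII, which fails on the byte 0xe9.
# Decoding as latin1 maps every byte to a code point, so it cannot fail.
text = pickle.loads(py2_style_pickle, encoding="latin1")

# encoding="bytes" leaves Python 2 str payloads as bytes objects instead.
raw = pickle.loads(py2_style_pickle, encoding="bytes")
```

Which of the two is right depends on whether the Python 2 string held text or binary data, which the pickle itself does not record.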