Open gabrielelanaro opened 7 years ago
@gabrielelanaro this kind of DataFrame is not a good candidate for storage in HDF5 (as you found), but you could store it using datreant.data as a Python object by wrapping it in something that will trigger storage as a pickle. For example, you could do:
t = dtr.Treant('/tmp/hello')
t.data['hello'] = (pd.DataFrame({'lists': [[0, 1, 2], [0, 1], [10, 22]]}),)
This makes the stored object a tuple, so it gets pickled instead of being crammed into an HDF5 file.
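To illustrate why the tuple wrapper is enough, here is a minimal sketch of the pickle round trip using only the standard library. It assumes a plain dict as a stand-in for the DataFrame and calls pickle directly; it is not datreant.data's actual code path, and the variable names are made up for the example.

```python
import pickle

# Stand-in for the DataFrame column of ragged lists (hypothetical data
# mirroring the example above; a plain dict keeps this dependency-free).
clusters = {'lists': [[0, 1, 2], [0, 1], [10, 22]]}

# Wrap the object in a one-element tuple, as in the workaround above.
wrapped = (clusters,)

# Round-trip through pickle in memory.
payload = pickle.dumps(wrapped, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(payload)

# Unwrap the tuple to get the original object back.
(restored_clusters,) = restored
```

Whatever comes back is still a one-element tuple, so it has to be unwrapped before use.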
I realize pickle is a poor format for data curation (not entirely safe, since deserialized objects could do nefarious things; not robust across Python versions; etc.), but it is the lowest common denominator. We could consider using msgpack instead, since it's often used as a substitute for pickle, but I'm not familiar with it or the arguments for it.
Happy to shift how datreant.data works so long as we can maintain backwards compatibility for existing stores.
I'm trying to save a dataframe that contains a "series of lists" (they correspond to ionic clusters); however, there is a problem with the serialization:
I found that for dataframes the msgpack format is pretty robust and efficient; maybe we could serialize dataframes using that?
It would, however, hurt backwards compatibility.