ContextLab / hypertools

A Python toolbox for gaining geometric insights into high-dimensional data
http://hypertools.readthedocs.io/en/latest/
MIT License

saving a DataGeometry object #155

Closed andrewheusser closed 7 years ago

andrewheusser commented 7 years ago

After performing an analysis and visualizing the result, we want to save out the geo so that it can be shared or loaded in at a later time. After a little research, here are a few options:

To summarize, I don't see an elegant way to solve the cross-version (Python 2/3) saving issue. So, unless we convert all the models to JSON and then rebuild them, we are stuck with pickle. My choice would be to go with joblib, which works like pickle but handles large array data more efficiently, and just note that a file created with one version of Python can't be loaded in the other.
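A minimal sketch of the joblib route, assuming joblib is installed (it falls back to the standard-library pickle module otherwise). `joblib.dump` and `joblib.load` are the real joblib calls; `ToyGeo`, `save_geo` and `load_geo` are hypothetical names for illustration, not hypertools APIs.

```python
import os
import pickle
import tempfile

try:
    import joblib  # efficient with large numpy arrays
except ImportError:
    joblib = None  # fall back to the stdlib pickle module


class ToyGeo(object):
    """Hypothetical stand-in for a DataGeometry object (not the real class)."""

    def __init__(self, data, reduce):
        self.data = data
        self.reduce = reduce


def save_geo(geo, path):
    """Serialize the whole object, including any nested model instances."""
    if joblib is not None:
        joblib.dump(geo, path)
    else:
        with open(path, 'wb') as f:
            pickle.dump(geo, f)


def load_geo(path):
    """Load an object written by save_geo."""
    if joblib is not None:
        return joblib.load(path)
    with open(path, 'rb') as f:
        return pickle.load(f)


path = os.path.join(tempfile.mkdtemp(), 'geo.pkl')
geo = ToyGeo(data=[[1.0, 2.0], [3.0, 4.0]],
             reduce={'model': 'IncrementalPCA', 'ndims': 3})
save_geo(geo, path)
restored = load_geo(path)
```

Nothing in this sketch addresses the Python 2/3 issue: a file written with Python 3's default pickle protocol can't be read by Python 2.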

jeremymanning commented 7 years ago

I think objects can be saved to HDF5 files with h5py...
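A minimal h5py sketch, assuming h5py and numpy are installed. HDF5 holds arrays and simple attributes rather than arbitrary Python objects, so this example writes the data arrays as datasets and the model settings as file attributes. `save_arrays_h5` and `load_arrays_h5` are hypothetical helpers, and the dataset and attribute names are illustrative.

```python
import os
import tempfile

import h5py
import numpy as np


def save_arrays_h5(path, arrays, params):
    """Write named arrays as datasets and scalar settings as file attributes."""
    with h5py.File(path, 'w') as f:
        for name, arr in arrays.items():
            f.create_dataset(name, data=np.asarray(arr), compression='gzip')
        for key, value in params.items():
            f.attrs[key] = value


def load_arrays_h5(path):
    """Read every dataset back into memory, plus the file attributes."""
    with h5py.File(path, 'r') as f:
        arrays = {name: f[name][()] for name in f.keys()}
        params = dict(f.attrs.items())
    return arrays, params


path = os.path.join(tempfile.mkdtemp(), 'geo.h5')
data = [np.random.rand(10, 5), np.random.rand(8, 5)]
save_arrays_h5(path,
               {'data_0': data[0], 'data_1': data[1]},
               {'reduce': 'IncrementalPCA', 'ndims': 3})
arrays, params = load_arrays_h5(path)
```

Anything that isn't already an array, number or string (a fitted model, a custom function) would have to be broken down into such pieces before it could be stored this way.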

Another option would be .mat files, which can be saved and loaded using scipy.
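For the .mat route, a minimal sketch using `scipy.io.savemat` and `scipy.io.loadmat`, assuming numpy and scipy are installed. Like HDF5, a .mat file holds arrays, numbers and strings, not Python class instances; the variable names are illustrative.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

path = os.path.join(tempfile.mkdtemp(), 'geo.mat')
xform = np.random.rand(20, 3)  # e.g. reduced data

# Each dict key becomes a MATLAB variable in the file
savemat(path, {'xform_data': xform, 'ndims': 3, 'reduce': 'IncrementalPCA'})
contents = loadmat(path)
```

By default `loadmat` returns every variable as an at-least-2-D array (the scalar `ndims` comes back with shape `(1, 1)`) and adds `__header__`, `__version__` and `__globals__` keys, so loaded values need some unpacking.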

jeremymanning commented 7 years ago

(I don't think breaking compatibility is a good idea if we can avoid it)

andrewheusser commented 7 years ago

I don't think Python class instances can be saved with HDF5; I believe they first have to be converted to a dictionary, but I could be wrong. That would be OK if we had a simple class instance to save, but the DataGeometry.reduce/align/normalize/cluster fields can be scikit-learn class instances or custom-written functions (where we have no idea what kinds of data structures are being used), which makes it tricky.

There is a library called deepdish that can help convert class instances into dictionaries, which can then be saved in the HDF5 format. However, it looks like you have to write the class instance -> dictionary functions yourself, and we are supporting a lot of different classes (e.g. all the reduce models, cluster models, and custom transforms).

http://deepdish.readthedocs.io/en/latest/io.html#class-instances
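To make the per-class work concrete, here is a minimal sketch of hand-written instance-to-dictionary converters. `ToyReducer`, `to_dict`, `from_dict` and the `CONVERTERS` registry are hypothetical names, not deepdish's API. The point is that every supported class needs its own pair of functions, and anything unregistered, such as a custom transform function, can't be saved.

```python
class ToyReducer(object):
    """Hypothetical stand-in for a fitted reduce model."""

    def __init__(self, n_components):
        self.n_components = n_components
        self.components_ = None  # set after fitting


def reducer_to_dict(model):
    # Keep only plain numbers and lists, which array-based formats can hold
    return {'class': 'ToyReducer',
            'n_components': model.n_components,
            'components_': [list(row) for row in model.components_]}


def reducer_from_dict(d):
    model = ToyReducer(d['n_components'])
    model.components_ = [list(row) for row in d['components_']]
    return model


# One (to_dict, from_dict) pair per supported class
CONVERTERS = {'ToyReducer': (reducer_to_dict, reducer_from_dict)}


def to_dict(obj):
    name = type(obj).__name__
    if name not in CONVERTERS:
        raise TypeError('no converter registered for %s' % name)
    return CONVERTERS[name][0](obj)


def from_dict(d):
    return CONVERTERS[d['class']][1](d)


model = ToyReducer(n_components=2)
model.components_ = [[0.6, 0.8], [-0.8, 0.6]]
saved = to_dict(model)
rebuilt = from_dict(saved)
```

Calling `to_dict(lambda x: x)` raises `TypeError`, which is exactly the custom-transform case described above.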

andrewheusser commented 7 years ago

here's another possibly useful solution: https://github.com/jsonpickle/jsonpickle

converts python objects to json

andrewheusser commented 7 years ago

^ the advantage to converting to json is that it is not only (python) cross-version compatible, but many other languages can handle json.
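jsonpickle does this automatically. As a rough illustration of the idea using only the standard library, the sketch below tags each instance with its class name when encoding and rebuilds it when decoding. `ToyGeo`, `REGISTRY`, `encode` and `decode` are hypothetical, and this is not how jsonpickle is implemented internally.

```python
import json


class ToyGeo(object):
    """Hypothetical stand-in for a DataGeometry object (not the real class)."""

    def __init__(self, data, reduce):
        self.data = data
        self.reduce = reduce


REGISTRY = {'ToyGeo': ToyGeo}  # classes we know how to rebuild


def encode(obj):
    def default(o):
        # Called for objects json can't handle: tag with class name + attributes
        if type(o).__name__ in REGISTRY:
            return {'__class__': type(o).__name__, 'state': vars(o)}
        raise TypeError('cannot serialize %r' % (o,))
    return json.dumps(obj, default=default)


def decode(text):
    def hook(d):
        # Rebuild tagged dicts as instances; leave ordinary dicts alone
        if '__class__' in d:
            cls = REGISTRY[d['__class__']]
            obj = cls.__new__(cls)
            obj.__dict__.update(d['state'])
            return obj
        return d
    return json.loads(text, object_hook=hook)


geo = ToyGeo(data=[[1.0, 2.0], [3.0, 4.0]],
             reduce={'model': 'IncrementalPCA', 'ndims': 3})
text = encode(geo)
restored = decode(text)
```

Because `text` is ordinary JSON, any language with a JSON parser can read the arrays and settings, even though only Python can rebuild the object.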

andrewheusser commented 7 years ago

Went with HDF5, closing this issue!