Closed: fabiankrieg closed this issue 6 years ago.
Insight of the day:
Seems like the only member Cython auto-pickling keeps failing on is Matrix._info, which is a packed INFO_ARR_s. Its purpose is to provide space for general information (shape, type, stuff like that) in a way that prevents subclasses from duplicating these entries. Currently, I am thinking of lifting that struct into a class of its own and then implementing pickling for this subtype. However, this might lose the benefit of next-to-nothing access time for exactly that general information. Also, one has to proceed with caution, as _info also holds data type pointers that are only valid for one session. But that would be a matter for that particular pickling routine, wouldn't it?
tl;dr: looks like the rabbit hole is not as deep as it seemed.
Maybe this enhancement would even come with the benefit of not having to deal with subclasses, as long as they only store basic (or picklable) data types themselves. Hooray! A toast to the Cython guys, you rock!
Newsflash! Pickle support for fastmat classes was just introduced in 0.1.2. Could you please check whether everything works out fine for the use cases you described and let me know if there are any issues left with the current implementation? Please note that you need cython>=0.26 installed for pickling to work.
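One way to check this for one's own use cases is to compare results before and after a pickle round trip instead of relying on object identity. The helper below is a hypothetical sketch; the name check_pickle_roundtrip and its parameters are made up here:

```python
import operator
import pickle


def check_pickle_roundtrip(obj, probe, same=operator.eq):
    """Return True if obj survives a pickle round trip unchanged.

    probe extracts comparable results from an object (for a matrix, e.g.
    its product with a fixed test vector); same compares two such results.
    """
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    restored = pickle.loads(data)
    if type(restored) is not type(obj):
        return False
    return bool(same(probe(obj), probe(restored)))


print(check_pickle_roundtrip([1, 2, 3], sum))  # True
```

For results that are NumPy arrays, passing same=numpy.allclose avoids exact floating-point comparisons.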
Feature request
Problem
Fastmat offers a nice set of features for efficiently dealing with structured, sparse and other kinds of matrices. Now, some users might create pretty advanced matrices that take time to compute, using several fastmat classes as containers to allow fast products. Storing these to disk for later use is not straightforward.
Solution
I did some research on the topic but haven't found a thorough solution yet.
First idea: Make fastmat picklable
As a Python newbie, I was also new to pickle. I learned that pickle allows pretty convenient serialization of Python objects, e.g. for file I/O. I also made up a small example, which was pretty easy to implement. Consider a class like this:
Now we use it like this:
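A minimal sketch of such a class and its use, assuming a hypothetical SomeBlockMatrix that simply keeps a list of square blocks (plain nested lists standing in for fastmat matrices) and offers a product with a vector via a made-up multiply method:

```python
class SomeBlockMatrix:
    """Hypothetical block-diagonal container of dense square blocks."""

    def __init__(self, *items):
        # Each item is a square block given as a list of rows.
        self.items = list(items)

    def multiply(self, x):
        """Multiply the block-diagonal matrix with the vector x."""
        result = []
        offset = 0
        for block in self.items:
            size = len(block)
            segment = x[offset:offset + size]
            for row in block:
                result.append(sum(a * b for a, b in zip(row, segment)))
            offset += size
        return result


M = SomeBlockMatrix([[1, 2], [3, 4]], [[5]])
print(M.multiply([1, 1, 2]))  # [3, 7, 10]
```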
But how do we store it to disk? To do so, we have to tell pickle how to pickle, which means that we have to provide a __reduce__() function for SomeBlockMatrix. This function returns the class itself (pickle records it by module and name), so that pickle can instantiate a new object of that class upon loading. Furthermore, it returns a tuple of arguments that are passed to the constructor of the class, so that pickle initializes an object with the same content. This pretty much did it: we can now write it to disk and load it back, since every item is picklable itself.
Note: When I tried to pickle Cython stuff like fastmat matrices, which have no pickle interface yet, I always ran into segfaults. There was no warning message like the one you get for pure Python objects.
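The __reduce__() approach described above can be sketched as follows, again with the hypothetical SomeBlockMatrix (blocks as plain nested lists) and a temporary file standing in for real storage:

```python
import os
import pickle
import tempfile


class SomeBlockMatrix:
    """Hypothetical block container with an explicit pickle interface."""

    def __init__(self, *items):
        self.items = list(items)

    def __reduce__(self):
        # (callable, args): on loading, pickle calls SomeBlockMatrix(*items).
        # The class is stored by reference (module + name), not by value.
        return (self.__class__, tuple(self.items))


M = SomeBlockMatrix([[1, 2], [3, 4]], [[5]])

# Write to disk and read back; every item must be picklable itself.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "matrix.pkl")
    with open(path, "wb") as f:
        pickle.dump(M, f)
    with open(path, "rb") as f:
        loaded = pickle.load(f)

print(type(loaded).__name__, loaded.items)  # SomeBlockMatrix [[[1, 2], [3, 4]], [[5]]]
```

For a plain Python class like this one, pickle's default mechanism would usually cope without __reduce__(); spelling it out shows the mechanism explicitly and is one way to give extension types, which typically lack an instance __dict__, a pickle interface.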
Dill instead of Pickle
https://pypi.python.org/pypi/dill
I got some IOErrors when I called my pickling function to save a file from a different module than the one containing the load function: the corresponding module was not found. There are some hints, e.g. in the discussion at https://stackoverflow.com/questions/2121874/python-pickling-after-changing-a-modules-directory, that this might not be the case with dill, as it serializes the objects directly. Not tested by me, though.
Further reads
NumPy is much faster at storing/loading matrices than pickle: https://github.com/mverleg/array_storage_benchmark
Security issues of pickle: https://www.synopsys.com/blogs/software-security/python-pickling/
More on Dill vs. pickle: https://stackoverflow.com/questions/33968685/pickle-yet-another-importerror-no-module-named-my-module
Harsh corner cases when pickling on Linux and unpickling on Windows: https://github.com/uqfoundation/dill/issues/218
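Regarding the module-not-found errors from the dill section: pickle stores a class by reference (its module and name), not by value, so loading only works where that module can be imported. The sketch below reproduces this with a throwaway module under the made-up name hypothetical_mod; in current Python 3 the failure in this sketch shows up as ModuleNotFoundError, a subclass of ImportError:

```python
import pickle
import sys
import types

# Create a throwaway module at runtime; "hypothetical_mod" is a made-up name.
mod = types.ModuleType("hypothetical_mod")
exec(
    "class Payload:\n"
    "    def __init__(self, value):\n"
    "        self.value = value\n",
    mod.__dict__,
)
sys.modules["hypothetical_mod"] = mod

# pickle records only the reference "hypothetical_mod.Payload" plus the state.
data = pickle.dumps(mod.Payload(42))

# Simulate loading in a place where that module cannot be imported.
del sys.modules["hypothetical_mod"]
try:
    pickle.loads(data)
    error = None
except ImportError as exc:  # ModuleNotFoundError is a subclass
    error = exc

print(type(error).__name__)  # ModuleNotFoundError
```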