Open GoogleCodeExporter opened 9 years ago
I've certainly brought this kind of thing up in the past, but I think the fact
that universe objects are closely tied to open files makes it difficult to
allow proper serialization via pickle. This also makes it challenging to pass
universe objects between cores in a parallel workflow. I generally end up
adjusting my workflow to produce something more tractable like a numpy array of
coordinates and pickle that for storage and / or interprocess communication.
Original comment by tyler.je.reddy@gmail.com
on 27 Mar 2014 at 10:14
The way Universes are built at the moment makes it impossible to pickle them as
they contain trajectory reader objects which in turn contain open file
descriptors.
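
A minimal sketch of the underlying problem, using only the standard library. HoldsFile is a hypothetical stand-in for a reader that keeps its file open; it is not an MDAnalysis class.

```python
import pickle
import tempfile


class HoldsFile:
    """Hypothetical stand-in for a reader that keeps its file open."""

    def __init__(self, handle):
        self.handle = handle


with tempfile.TemporaryFile("w+") as handle:
    try:
        # Default pickling walks the instance attributes and hits the handle.
        pickle.dumps(HoldsFile(handle))
        pickled = True
    except TypeError as exc:
        pickled = False
        reason = str(exc)  # the message names the file object type
```
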
This won't change any time soon unless someone comes up with a smart way to do
this. Therefore I am closing this with 'WontFix' – but feel free to start a
discussion on the developer mailing list or in the comments to this issue. If a
sensible approach and consensus emerges we will reopen.
Oliver
Original comment by orbeckst
on 27 Mar 2014 at 10:16
Just to clarify the cryptic error message:
pickle checks whether a class has a __getstate__ function, and then executes it
if it does. The way this is done is a sort of duck-typing, where
object.__getstate__ is assigned to a variable, which is subsequently called.
If there is no __getstate__ an AttributeError is raised, in which case pickle's
default behavior ensues.
The thing with some MDAnalysis objects is that they manage their attributes and
never raise an AttributeError. In particular if you try AtomGroup.__getstate__
you get a SelectionError which pickle does not handle; if you do the same with
Segment.__getstate__ you get no error (!!) and an empty AtomGroup is returned.
It is this last case that causes the 'TypeError: 'AtomGroup' object is not
callable' when pickle tries to execute what it got for Segment.__getstate__.
This is a dangerous side-effect of the syntactic sugar for selection shortcuts.
A lot of things can go silently ignored.
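
The mechanism can be reproduced with a small sketch. Everything here is hypothetical: Group and Container stand in for AtomGroup and Segment, and call_hook only imitates the duck-typed lookup described above. It uses a made-up hook name instead of __getstate__, because recent Python versions define a default __getstate__ on object and so never reach __getattr__ for that name.

```python
class Group:
    """Hypothetical stand-in for an AtomGroup (a selection result)."""

    def __init__(self, names):
        self.names = list(names)


class Container:
    """Hypothetical stand-in for a Segment with selection shortcuts."""

    def __init__(self, names):
        self.names = list(names)

    def __getattr__(self, name):
        # Only called when normal lookup fails. It never raises
        # AttributeError; unknown names give an empty selection instead.
        own = self.__dict__.get("names", ())
        return Group(n for n in own if n == name)


def call_hook(obj, hook_name):
    """Imitate the duck-typed lookup: fetch the hook, then call it."""
    try:
        hook = getattr(obj, hook_name)
    except AttributeError:
        return "default behaviour"
    return hook()


seg = Container(["CA", "CB"])
try:
    call_hook(seg, "serialization_hook")  # made-up hook name
    outcome = "no error"
except TypeError as exc:
    outcome = str(exc)

plain = call_hook(object(), "serialization_hook")
```

With a well-behaved object the AttributeError triggers the fallback; with the shortcut-style __getattr__ the lookup silently succeeds and the call fails on the returned selection.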
On the topic of parallelization, I'll soon post some code I have geared
specifically for parallelizing trajectory reads. The serialization of
MDAnalysis objects only becomes a problem if they are to be passed back and
forth between workers. If the code avoids that (and takes care of renewing file
descriptors) multiprocessing works fine.
Original comment by manuel.n...@gmail.com
on 27 Mar 2014 at 10:29
Is it possible to extract all the fields of the universe object and
serialize them to a byte stream? When we want to regenerate the object, we
deserialize all the fields and construct the object.
Original comment by charlesz...@gmail.com
on 27 Mar 2014 at 11:18
That is pickle's default approach. MDAnalysis universes are unpicklable in this
way due to, among other things, open file descriptors and unpicklable function
objects (SWIG stuff).
This can all be managed instead of leaving pickle to do its default process.
Look into the __getstate__ and __setstate__ functions. There probably is a way
of coercing the serialization of enough information for __setstate__ to
recreate the universe. As far as I went it looked messy and, in my case, not
worth the trouble.
Original comment by manuel.n...@gmail.com
on 27 Mar 2014 at 11:34
For __getstate__():
* all the arguments of Universe()
* current frame in the trajectory (often that will just be frame 1)
* optional: index of the XTC/TRR reader
For __setstate__():
1) build the Universe again
2) optional: recreate the XTC/TRR reader index
3) go to the saved frame
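
The two lists above can be sketched as follows. This is only an illustration under assumptions: MiniUniverse and FrameReader are hypothetical stand-ins rather than MDAnalysis classes, a text file with one line per frame plays the role of the trajectory, and the optional reader index is left out.

```python
import os
import pickle
import tempfile


class FrameReader:
    """Hypothetical reader: one frame per line, keeps an open file handle."""

    def __init__(self, filename):
        self.filename = filename
        self._fh = open(filename)  # open handles cannot be pickled
        self.frame = 0
        self.current = None
        self.goto(1)

    def goto(self, frame):
        # Rewind and read forward to the requested 1-based frame.
        self._fh.seek(0)
        for _ in range(frame):
            self.current = self._fh.readline().strip()
        self.frame = frame

    def close(self):
        self._fh.close()


class MiniUniverse:
    """Hypothetical stand-in for a Universe built from file names."""

    def __init__(self, *filenames, **kwargs):
        self._args, self._kwargs = filenames, kwargs
        self.trajectory = FrameReader(filenames[-1])

    def __getstate__(self):
        # Save only the constructor arguments and the current frame.
        return {"args": self._args, "kwargs": self._kwargs,
                "frame": self.trajectory.frame}

    def __setstate__(self, state):
        # 1) build the object again, 3) go to the saved frame.
        self.__init__(*state["args"], **state["kwargs"])
        self.trajectory.goto(state["frame"])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "traj.txt")
    with open(path, "w") as fh:
        fh.write("frame-1\nframe-2\nframe-3\n")
    u = MiniUniverse(path)
    u.trajectory.goto(2)
    clone = pickle.loads(pickle.dumps(u))
    restored = (clone.trajectory.frame, clone.trajectory.current)
    u.trajectory.close()
    clone.trajectory.close()
```

Note that the copy is rebuilt from the files, so it only works where the same paths are readable, and any in-memory changes made after construction are not carried over.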
Original comment by orbeckst
on 27 Mar 2014 at 11:42
I added explicit __setstate__() and __getstate__() methods to Universe and
AtomGroup which raise a NotImplementedError. This should at least make this
particular error message clearer (in 0.8.2-dev, commit
4bb36e64e8d182226c9303777a166bdb56d08b34)
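
A sketch of the idea, not the code from that commit: Guarded is a hypothetical class, and the message text is made up for illustration.

```python
import pickle


class Guarded:
    """Hypothetical class that refuses pickling with a clear message."""

    def __getstate__(self):
        raise NotImplementedError("Guarded objects cannot be pickled")

    def __setstate__(self, state):
        raise NotImplementedError("Guarded objects cannot be unpickled")


try:
    pickle.dumps(Guarded())
    message = None
except NotImplementedError as exc:
    # The explicit error replaces the confusing TypeError.
    message = str(exc)
```
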
Original comment by orbeckst
on 15 Apr 2014 at 8:25
Maybe it would be worthwhile making pickling work along the lines of recreating
a copy of the universe using the constructor information and information from
the TrajectoryReader (which would need its own __getstate__/__setstate__) —
see comments in this issue for more details.
I reopen as an enhancement and anyone interested can grab the ticket.
Original comment by orbeckst
on 15 Apr 2014 at 8:29
There is a _dcd_c_ptr attribute inside the DCDReader object. It is a C object
defined in dcd.c; if we want to serialize the DCDReader object, we have to
serialize this C pointer as well. But it seems that pickle cannot dump a
PyObject.
Original comment by charlesz...@gmail.com
on 15 Apr 2014 at 6:54
I don't think that we should serialize a Reader wholesale but instead provide a
way to re-instantiate. In this way you only serialize state information such as
- filename
- number of atoms
- current frame
- ... all other parameters set in __init__()
and then essentially create a brand new Reader with this information.
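
One way to express "re-instantiate instead of serialize" is pickle's __reduce__ hook, which lets an object name its constructor arguments. The sketch below assumes a hypothetical SketchReader whose frames are lines of a text file; it is not the DCDReader API.

```python
import os
import pickle
import tempfile


class SketchReader:
    """Hypothetical reader; frames are lines of a text file."""

    def __init__(self, filename, n_atoms):
        self.filename = filename
        self.n_atoms = n_atoms
        self._fh = open(filename)  # recreated on unpickling, never serialized
        self.frame = 0

    def seek_frame(self, frame):
        # Rewind, then skip forward so the next read returns `frame` (0-based).
        self._fh.seek(0)
        for _ in range(frame):
            self._fh.readline()
        self.frame = frame

    def read(self):
        return self._fh.readline().strip()

    def close(self):
        self._fh.close()

    def __reduce__(self):
        # (callable, constructor args, state): on loading, pickle calls
        # SketchReader(filename, n_atoms) and then __setstate__(state).
        return (self.__class__, (self.filename, self.n_atoms),
                {"frame": self.frame})

    def __setstate__(self, state):
        self.seek_frame(state["frame"])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frames.txt")
    with open(path, "w") as fh:
        fh.write("a\nb\nc\n")
    reader = SketchReader(path, n_atoms=1)
    reader.seek_frame(2)
    copy = pickle.loads(pickle.dumps(reader))
    result = (copy.frame, copy.read())
    reader.close()
    copy.close()
```

Because the constructor runs again on loading, anything it sets up (such as a C-level handle) is created fresh rather than copied.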
Original comment by orbeckst
on 15 Apr 2014 at 7:08
To initialize a reader object, we must call _read_dcd_header() inside the
initialization function; it is implemented in C in dcd.c and sets a new
attribute named __dcd_c_ptr, which is a PyObject, on the reader object.
This attribute is used by functions like MDAnalysis.analysis.align.rmsd(A, B)
and most other functions. So I think that to serialize DCDReader we will
have to serialize this attribute, or we can just serialize the coordinate and
topology files for regenerating the universe object later.
Original comment by charlesz...@gmail.com
on 15 Apr 2014 at 7:54
Come to think of it, it will be difficult to pickle the exact state of a Universe,
e.g. if I renamed atoms or changed coordinates.
What is quite do-able (I think) is to re-create the Universe in the same way as
it was created in the first place as you just need to know the constructor
arguments (namely, topology files and coordinate files and any keyword args).
This might or might not be sufficient for many applications but it certainly is
not true serialization of the current state of an object.
Original comment by orbeckst
on 16 Apr 2014 at 8:53
Original issue reported on code.google.com by
charlesz...@gmail.com
on 27 Mar 2014 at 9:18