birkenfeld / serde-pickle

Rust (de)serialization for the Python pickle format.
Apache License 2.0

Add support for NamedTuples? #29

Closed gurglemurgle5 closed 3 weeks ago

gurglemurgle5 commented 3 weeks ago

I am trying to use this library to decode a pickle file containing class instances. It seems that all class instances are converted to empty dicts when loaded, which is fair, since the pickle carries only their values and not their property names. I still need to get the values out of them, though. Would it be possible to dump object properties as tuples? (This would also work really well for NamedTuples.)

Example: This Python object should decode to a (str, int) tuple, i.e. ("the normal kind", 5000)

import pickle
import typing

class banana(typing.NamedTuple):
    """this is bananas"""
    type: str
    quantity: int

if __name__ == "__main__":
    ban = banana("the normal kind", 5000)
    # Use a context manager so the file is flushed and closed after dumping.
    with open("./output.pickle", "wb") as f:
        pickle.dump(ban, f)
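
To illustrate why only the values survive, here is a minimal Python sketch of what a NamedTuple instance hands to the pickler. The class name `Banana` and the field name `kind` are illustrative stand-ins for the example above, and the sketch inspects Python's own `__reduce_ex__` output rather than anything serde-pickle does internally.

```python
import typing


class Banana(typing.NamedTuple):
    """Illustrative stand-in for the class in the example above."""
    kind: str
    quantity: int


ban = Banana("the normal kind", 5000)

# Ask the instance how it wants to be pickled under protocol 2.
# The second element holds the constructor arguments: the class itself
# followed by the field values, in declaration order.
reduced = ban.__reduce_ex__(2)
constructor_args = reduced[1]

# Drop the leading class reference to get the bare values.
values = constructor_args[1:]
print(values)
```

The field names `kind` and `quantity` live on the class, not in these arguments, so a reader that cannot import the class sees only the positional values.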

MahouShoujoMivutilde commented 3 weeks ago

Do NOT run it.

It's malware that will steal your account if executed, then use it to spread further by spamming the same message elsewhere, as happened to this person.

There are a lot of comments like that:

https://github.com/search?q=is%3Aissue+%22In+the+installer+menu%2C+select+%5C%22gcc.%5C%22%22+AND+%22password%3A+changeme%22&type=issues&s=created&o=desc

(and this is how I found this issue)

birkenfeld commented 3 weeks ago

Malware link hidden and reported.

birkenfeld commented 3 weeks ago

I'll see what I can do about namedtuples, that seems like a good thing to support out of the box. (Same for dataclasses, I'll check how they are currently handled.)

Can you post an example for the other classes you're talking about, so I can verify something useful is happening when unpickling them?

birkenfeld commented 3 weeks ago

OK, namedtuples as well as other instances should now keep their state if you enable the keep_restore_state option in DeOptions. If you're still missing something, please reopen.

gurglemurgle5 commented 3 weeks ago

> Can you post an example for the other classes you're talking about, so I can verify something useful is happening when unpickling them?

I just checked which classes I need to handle for the project I'm working on, and it looks like it's mainly NamedTuples. The only other classes I need to handle are ones that inherit from (ByValue, enum.IntEnum) and (ByValue, enum.IntFlag), where ByValue is implemented as follows:

class ByValue:
    """
    Mixin for enums to pickle value instead of name (restores pre-3.11 behavior). Use as left-most parent.
    See https://github.com/python/cpython/pull/26658 for why this exists.
    """
    def __reduce_ex__(self, prot):
        return self.__class__, (self._value_, )

So it seems this feature got added before I was even able to finish this reply. Nice! I've only done a quick test with it, and the only issue I've found so far is that there are still some debugging println! calls left in. The ByValue mixin seems to cause the enums to be parsed as an unresolved global, which makes sense if it's intentionally messing with the pickling behaviour. Thankfully, those enums don't seem to be important to what I'm trying to do, so that's nice!

birkenfeld commented 3 weeks ago

Ok, yeah, sorry about the debug prints, they are gone now.

And I implemented another "try to save data" branch that now keeps the enum values. They are wrapped in a tuple, but that should be ok.
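
As a rough illustration of where such a tuple can come from on the Python side, the sketch below applies the ByValue mixin from this thread to a hypothetical enum `Color` (not from the original project) and inspects what the member asks the pickler to store. It only shows Python's behaviour; how serde-pickle maps the result onto its own values is not shown here.

```python
import enum


class ByValue:
    """Mixin from the thread: pickle enum members by value, not by name."""
    def __reduce_ex__(self, prot):
        return self.__class__, (self._value_, )


class Color(ByValue, enum.IntEnum):
    # Hypothetical enum, used only for this illustration.
    RED = 1
    GREEN = 2


# The member reduces to "call this class with these arguments".
cls, args = Color.GREEN.__reduce_ex__(2)
print(args)

# Python's unpickler would rebuild the member by calling the class.
restored = cls(*args)
print(restored is Color.GREEN)
```

The arguments are a one-element tuple holding the raw value, so a reader that cannot resolve the class can still recover the value, just wrapped in that tuple.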