birkenfeld / serde-pickle

Rust (de)serialization for the Python pickle format.
Apache License 2.0
188 stars 28 forks source link

Fix (some?) recursive objects by disabling memo ref counting #24

Open landaire opened 1 year ago

landaire commented 1 year ago

While comparing this implementation to the CPython impl I noticed that CPython does not perform ref counting of objects in its internal memo list, but this crate does. I believe that somewhere there's a bug in the ref counting logic (or maybe it's not even valid to ref count these?) that causes an object to be dropped to early from the memo list and cause errors upon subsequent lookups.

This PR adds an internal flag to just disable ref counting until this can be investigated further. I've tested this PR on data which previously returned a recursion error and it now works with correct results.

Related: #22.

birkenfeld commented 2 months ago

Sorry for coming back to this only now...

I'm not surprised that the ad-hoc refcounting I added is imperfect; it is added here purely for performance, since without it we would have to clone basically any object being unpickled, which ends up having a prohibitive cost. CPython doesn't have to do this, since its objects are all intrinsically refcounted.

The alternative is to do it like CPython - wrap all unpickled objects in an Rc like type. It could be restricted to composite types, but even large strings might benefit. But this is a huge API change, and makes handling the returned data much more fiddly.