alexdlaird / amazon-orders

A CLI and library for interacting with Amazon order history
https://amazon-orders.readthedocs.io
MIT License
23 stars 9 forks source link

I love this project! Unable to pickle orders #13

Closed varenc closed 9 months ago

varenc commented 9 months ago

Acknowledgements

Describe the Feature

I'd like to be able to easily cache/memoize my Amazon orders. This makes sense to me since every order from any year besides the current one will essentially never change. At the moment I'm working on a personal script and am constantly re-fetching them from Amazon. I worry that doing this too much might trigger some hacking/abuse detection stuff from Amazon and they'll lock my account. Also it's just a bit slow.

I'd love it if there was a built in cache. Of course, full caching support could be a big feature, so a simpler request would just to make it so an Order object is serializable and I can cache them myself. If I try to do pickle.dump(some_order, file_handler) I get a RecursionError: maximum recursion depth exceeded error. I assume this because Orders contain some sort of circular reference though I haven't investigated why.

Describe Alternative Solutions/Workarounds

At the moment I don't have a clear workaround. If I were to approach this myself I'd probably just dig into the code and figure out how to use some sort of cache on all HTTP requests. There's caching wrappers for requests that you can configure so they cache everything (ignore normal HTTP caching rules) so that'd be one alternative approach. Though making this a first class feature is more work.

alexdlaird commented 9 months ago

That's interesting. I removed circular references quite a few versions ago—there is an adjacent reference to Items both in the Order and also in Order.shipments, though that'd be a subset (assuming multiple shipments) and that should still be picklable. However, I also just tried a json.dumps(orders), and that fails too saying Order is not serializable, so probably I need to look in to that a bit more. It used to be earlier on in development, so I must've added something that broke serialization ... at present I can't think what wouldn't be serializable on these objects, but they're fairly well nested, so it might require some sleuthing.

I don't know when I'll get a chance to look at this, but here are the objects and their associated types—this and anything in the encompassing entity package. Let me know if anything jumps out as you as an obvious issue for serializing these objects.

alexdlaird commented 9 months ago

Quick update. From my research, the issue appears to be with BeautifulSoup, apparently its Tag objects are not pickable. Every Parsable object contains a repr of its unparsed self in parsed, which is a Tag object.

A quick and dirty test adding this to Parsable made pickling succeed for me:

    def __getstate__(self):
        delattr(self, "parsed")

        return self.__dict__

Care to try this on your fork? I don't know that this is the direction we actually want to go in, but it will confirm that the parsed field on the base entity is the issue.

alexdlaird commented 9 months ago

Alright, the above modifies the actual object when you pickle, so here's what I went with:

def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("parsed")
        return state

This is now deployed in v1.0.13, as I'd consider this a bug and wanted to get it patched.

I don't know that caching support is something we'll add natively, so I'll close this ticket. If the issue persists with the latest version, feel free to comment with more details and we can reopen if needed. Thanks!