LineaLabs / lineapy

Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.
https://lineapy.org
Apache License 2.0

LIN-712 use cloudpickle #857

Closed: lionsardesai closed this 1 year ago

lionsardesai commented 1 year ago

Description

Start using cloudpickle. In general, cloudpickle supports more use cases; specifically, it works around the constraint of the pickle module that requires class objects to be registered in globals(), since pickle serializes classes by reference to their module-level name. This will let linea keep hijacking the default globals and work with a separate copy in the executor.
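To illustrate the constraint, here is a minimal sketch. It is not lineapy's executor code: `USER_SOURCE`, `executor_globals`, and `pickle_fails` are hypothetical names, and the separate dict only stands in for "a namespace that is not the real module globals". cloudpickle is assumed to be installed; the sketch skips that part if it is not.

```python
import pickle

try:
    import cloudpickle  # third-party package, assumed to be installed
except ImportError:
    cloudpickle = None

USER_SOURCE = """
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
"""

# Hypothetical executor namespace: a separate dict, not the real module globals.
# The class reports its module as "__main__", but the real __main__ has no Point.
executor_globals = {"__name__": "__main__"}
exec(USER_SOURCE, executor_globals)
point = executor_globals["Point"](1, 2)


def pickle_fails(obj):
    """Return True if the standard pickle module cannot serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return True
    return False


# pickle stores classes by reference, so the failed name lookup makes it give up.
print("pickle fails:", pickle_fails(point))

if cloudpickle is not None:
    # cloudpickle serializes the class definition by value instead.
    restored = pickle.loads(cloudpickle.dumps(point))
    print("cloudpickle round trip:", restored.x, restored.y)
```

The cloudpickle payload is loaded here with plain `pickle.loads`, which works as long as cloudpickle is importable in the loading process.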

Fixes LIN-712


How Has This Been Tested?

All existing tests.

lionsardesai commented 1 year ago

No blocker but two comments.

I think your PR gets rid of the pandas dependency, so we probably want to remove it from requirements.txt and setup.py (and maybe from other places I'm not aware of).

I am also curious about scenarios where cloudpickle doesn't work and we need to fall back to vanilla pickle. I thought cloudpickle was still using pickle under the hood.

1) Addressed the pandas point in the sub-comment.

2) I'm not sure about the implementation of cloudpickle, but I added vanilla pickle in case someone has already pickled objects using an older version of lineapy that did not use cloudpickle. If it fails with cloudpickle, it will at least try the pickle library before finally failing. A secondary goal: in case there is a scenario where cloudpickle fails and pickle does not, there's no downside to adding this, because pickle ships with Python.
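The fallback described above might look roughly like the following. This is a sketch under assumptions, not the code in this PR: `loads_with_fallback` is a hypothetical helper name, and cloudpickle is assumed to be installed (the sketch skips it when it is missing).

```python
import pickle

try:
    import cloudpickle  # third-party package, assumed to be installed
except ImportError:
    cloudpickle = None


def loads_with_fallback(data: bytes):
    """Try cloudpickle first, then the standard pickle module (hypothetical helper)."""
    errors = []
    for name, module in (("cloudpickle", cloudpickle), ("pickle", pickle)):
        if module is None:
            continue  # cloudpickle is not available in this environment
        try:
            return module.loads(data)
        except Exception as exc:
            # Remember why this loader failed, then try the next one.
            errors.append(f"{name}: {exc!r}")
    raise ValueError("could not unpickle value; " + "; ".join(errors))


# A payload written by plain pickle, standing in for an artifact saved
# before the switch to cloudpickle.
legacy_payload = pickle.dumps({"rows": [1, 2, 3]})
print(loads_with_fallback(legacy_payload))
```

On the earlier question: cloudpickle builds on the standard pickle machinery and emits pickle-format data, so a payload written by plain pickle can generally be read either way. The fallback matters mostly as a safety net for cases where one library raises and the other does not.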