SAME-Project / same-project

https://sameproject.ml/
Apache License 2.0

Fix context serialisation with dump_session and test multistep notebooks on kubeflow #95

Closed: Bubblyworld closed this 2 years ago

Bubblyworld commented 2 years ago

See #81.

Fixes a really thorny issue with context serialisation in the Kubeflow backend that was breaking multistep pipelines. There was a long discussion about the problem on Slack: https://sameproject.slack.com/archives/C01LNQJ3XHU/p1648654502210689.

Basically, dill doesn't serialise namespaces by reference, so our approach of running exec(user_code, namespace) breaks whenever functions defined in the user code reference the global namespace: after a round trip, those functions are no longer bound to the restored namespace. Serialising modules does work, though, so this fix serialises the __main__ module with dill.dump_session(), restores it with dill.load_session(), and runs the user code directly in __main__ with exec(user_code).
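To make the failure mode concrete, here is a minimal stdlib-only sketch. It does not use dill at all: the "deserialised" namespace is simulated with a plain dict copy (an assumption standing in for a by-value round trip), and the module-based half uses a hypothetical module named step_main as a stand-in for __main__. The point it illustrates is that a function's __globals__ is bound to a specific dict object, so a restore only helps if it writes into that same dict.

```python
import types

# Hypothetical user code for one notebook step: a global plus a function
# that reads that global at call time.
USER_CODE = """
x = 1

def get_x():
    return x
"""

# Dict-namespace approach: run the step in a plain dict.
namespace = {}
exec(USER_CODE, namespace)
get_x = namespace["get_x"]
# The function's globals ARE this dict object, not a snapshot of it.
globals_bound_to_namespace = get_x.__globals__ is namespace

# Simulated deserialisation (assumption): the namespace comes back as a
# NEW dict, and a later step updates x in it.
restored = dict(namespace)
restored["x"] = 2
stale_result = restored["get_x"]()  # still reads the original dict, so 1

# Module approach: run the step inside a module, then restore state by
# updating that module's existing __dict__ in place.
module = types.ModuleType("step_main")  # hypothetical stand-in for __main__
exec(USER_CODE, module.__dict__)
saved_state = dict(module.__dict__)     # stand-in for the serialised session
saved_state["x"] = 2
module.__dict__.update(saved_state)     # in-place update keeps dict identity
fresh_result = module.get_x()           # sees the updated x, so 2

print(stale_result, fresh_result)  # 1 2
```

The in-place update in the second half is only an analogy for why restoring __main__ behaves better than restoring a free-standing dict; it is not a description of dill's internals.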

Other possibilities for a fix: