dill doesn't serialise namespaces by reference, so our approach of running
exec(user_code, namespace) breaks when the user code defines functions that
reference the global namespace. Serialising modules does work, however, so
this fix serialises the __main__ module with dill.dump_session() and
dill.load_session(), and runs the user code directly in the __main__ module
with exec(user_code).
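
To illustrate why by-value serialisation of a namespace is a problem, here is a minimal sketch using only the standard library. It does not call dill; the round trip is simulated by rebuilding the function against a copy of its globals with types.FunctionType, which is an assumption about the general failure mode rather than a reproduction of dill's internals. The names user_code, counter and bump are made up for the example.

```python
import types

# Hypothetical user code: a function that reads and writes a global.
user_code = """
counter = 0

def bump():
    global counter
    counter += 1
    return counter
"""

# Current approach: run the user code in an explicit namespace dict.
namespace = {}
exec(user_code, namespace)
bump = namespace["bump"]

# The function's globals ARE the namespace dict (same object).
assert bump.__globals__ is namespace
bump()
assert namespace["counter"] == 1

# Simulated by-value round trip (assumption, not dill itself): the namespace
# and the function's globals come back as two separate copies.
restored_namespace = dict(namespace)
restored_bump = types.FunctionType(bump.__code__, dict(namespace), "bump")
restored_namespace["bump"] = restored_bump

# The restored function now updates its private copy of the globals,
# so the restored namespace never sees the change.
result = restored_bump()
print(result, restored_namespace["counter"])
```

After the simulated round trip the function and the namespace no longer share state, which is the kind of breakage that functions referencing the global namespace run into.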

Other possibilities for a fix:
Dump each Kubeflow component's code into its own module, and serialise that
module to avoid polluting __main__. Then we need to somehow inject the
previous step's context into the module before running the code. I gave this
a try but couldn't get it to work.
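
For reference, the injection half of that alternative could look roughly like the sketch below. It covers only creating a per-step module, seeding it with the previous step's context and running the code in it; serialising the module with dill is not shown. The helper run_step_in_module and the step names are hypothetical and do not come from the codebase.

```python
import sys
import types


def run_step_in_module(module_name, user_code, previous_context):
    """Run user_code in a fresh module seeded with previous_context."""
    module = types.ModuleType(module_name)
    # Inject the previous step's context before running the code.
    module.__dict__.update(previous_context)
    # Register the module so it can be looked up by name later.
    sys.modules[module_name] = module
    exec(user_code, module.__dict__)
    return module


step_one = run_step_in_module(
    "step_one",
    "x = 20\ndef double():\n    return x * 2\n",
    {},
)

# Carry over everything except the module's dunder attributes.
context = {k: v for k, v in vars(step_one).items() if not k.startswith("__")}

step_two = run_step_in_module("step_two", "y = double() + 2", context)
print(step_two.y)
```

Note that double still resolves its globals in step_one's namespace after being copied into step_two, so any serialisation of step_two would also have to deal with step_one.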
See #81.
Fixes a really thorny issue with context serialisation in the Kubeflow backend that was affecting multi-step pipelines. There is a long discussion of the problem on Slack: https://sameproject.slack.com/archives/C01LNQJ3XHU/p1648654502210689.