TheButlah / makrl

makrl - modular algorithm kit for reinforcement learning
4 stars 1 forks source link

Cloudpickle broken on Windows #28

Closed TheButlah closed 5 years ago

TheButlah commented 5 years ago

Cloudpickle is currently broken on Windows when trying to pickle global variables, so we must either find an old version that isn't or switch to dill as our pickling method of choice. See cloudpipe/cloudpickle#230

TheButlah commented 5 years ago

The cause of the issue is due to how cloudpickle prioritizes global variables in multithreaded applications, and the differences between fork() and windows threading. It is explained in https://github.com/cloudpipe/cloudpickle/issues/230#issuecomment-456890911:

Here is what is happening: We changed a couple month ago the handling of global variables in child processes.

In short, if a global variable exists in your child process, and a new value for this variable is shipped within the pickle string of a function (which cloudpickle will do for fns and args for example), the already existing variable will have the priority, and override the one shipped by the function.

In your case, the global variable args already exists in the child process, because on windows, multiprocessing.Process executes your file until if __name__ === "__main__" and therefores sets args to None. When your function is depickled, cloudpickle will choose to stick to the current value of args instead of changing it with the value the pickle string of f contains.

This choice makes sense from a multiprocessing-library-developer point of view, but it seems to understandably surprise users. We are currently implementing a switch so that user can controls this behavior. See #216. Meanwhile, the solution you mention sounds like a good fix.

PS: the discussion does not hold for linux, because the default start method for new processes is fork, which does not require python code execution, but rather cloning the current state of the parent process.

cloudpipe/cloudpickle#230 indicates that this is best fixed by a future upstream pr: https://github.com/cloudpipe/cloudpickle/pull/216. This will remain a bug until that is added, until then I will adjust the experiment code to avoid the use of any globals.

TheButlah commented 5 years ago

This seems to be fixed as of cloudpipe/cloudpickle#240