fugue-project / fugue

A unified interface for distributed computing. Fugue executes SQL, Python, Pandas, and Polars code on Spark, Dask and Ray without any rewrites.
https://fugue-tutorials.readthedocs.io/
Apache License 2.0

[QUESTION] FugueWorkflowResult & pickle deserialization #470

Open bitsofinfo opened 1 year ago

bitsofinfo commented 1 year ago

Is anyone else doing anything that pickles FugueWorkflowResult? I'm executing some SQL flows in secondary Python processes and running into an error like:

return _ForkingPickler.loads(buf.getbuffer())
TypeError: FugueWorkflowResult.__init__() missing 1 required positional argument: 'yields'

I assume this is because FugueWorkflowResult doesn't provide a no-arg constructor? (Not sure, actually. I'm not familiar with pickle, but this smells like a deserialization issue where the class can't be rematerialized.)
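A minimal sketch that reproduces this kind of error without Fugue, under an assumption this thread does not confirm: that the result object behaves like a dict subclass whose `__init__` requires an argument. `FakeWorkflowResult` is a hypothetical stand-in, not Fugue's class. `OrderedDict.__reduce__` tells pickle to rebuild the object by calling the class with no arguments and then re-inserting the items, so a required `__init__` parameter makes unpickling fail even though pickling succeeds.

```python
import pickle
from collections import OrderedDict


class FakeWorkflowResult(OrderedDict):
    """Hypothetical stand-in: a dict subclass whose __init__ needs `yields`."""

    def __init__(self, yields):
        super().__init__(yields)
        self._yields = yields


def try_roundtrip(obj):
    """Pickle then unpickle obj; return (succeeded, error message)."""
    data = pickle.dumps(obj)  # pickling itself works fine
    try:
        # Unpickling calls FakeWorkflowResult() with no arguments
        pickle.loads(data)
        return True, ""
    except TypeError as exc:
        return False, str(exc)


ok, message = try_roundtrip(FakeWorkflowResult({"a": 1}))
print(ok, message)
```

The exact wording of the message varies slightly across Python versions, but it names the missing `yields` argument, matching the traceback above.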

I'm getting around this by capturing the result, extracting the yields, and returning those instead, before anything is pickled and sent back to the parent process.
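The workaround above can be sketched roughly like this, under stated assumptions: `run_flow` is a hypothetical stand-in for executing a Fugue SQL flow in the child process, `FakeWorkflowResult` is a hypothetical result object that cannot be unpickled, and the yields are modeled as a plain dict of lists. The point is that only plain, picklable data crosses the process boundary; the result object itself is never sent back.

```python
from collections import OrderedDict
from multiprocessing import Pool


class FakeWorkflowResult(OrderedDict):
    """Hypothetical stand-in for a result object that cannot be unpickled."""

    def __init__(self, yields):
        super().__init__(yields)
        self.yields = yields


def run_flow(query):
    """Hypothetical stand-in for executing a Fugue SQL flow."""
    rows = [[query, i] for i in range(3)]
    return FakeWorkflowResult({"result": rows})


def worker(query):
    """Runs in the child process: extract plain data before returning."""
    result = run_flow(query)
    # Copy the yields into a plain dict so only picklable data is sent back
    return dict(result.yields)


def run_all(queries, processes=2):
    """Fan queries out to child processes and collect plain-dict results."""
    with Pool(processes=processes) as pool:
        return pool.map(worker, queries)


if __name__ == "__main__":
    print(run_all(["SELECT 1", "SELECT 2"]))
```

In a real flow the yielded values would be dataframes rather than lists, so they would also need converting to something picklable (for example, a local pandas DataFrame) before being returned.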

goodwanghan commented 1 year ago

Fugue doesn't support running the main (driver) logic in different processes, so the behavior is undefined.

bitsofinfo commented 1 year ago

Interestingly enough, this error went away when I switched to pebble, a multiprocessing library that sits on top of Python's. In any case, this is definitely a use case I'm pursuing, and it currently works for me: a primary process forks off child processes in which the actual Fugue SQL flows get executed.