EcoExtreML / Emulator

Apache License 2.0

Problem with my own version of Dask Implementation #24

Closed: prajzwal08 closed this issue 9 months ago

prajzwal08 commented 9 months ago

Hi @fnattino,

Starting from your sample code for parallelising with Dask, 2daskParallel/1102_1year_125degrees.ipynb (thanks for that!), I wrote my own code to parallelise the computation of the partial correlation between two variables while controlling for three other variables. My notebook ( https://github.com/prajzwal08/Master/blob/main/DaskTry.ipynb ) raises the following error: CancelledError: ('getitem-dacbc6e4f4f3d84c14fec8062327ca20', 2, 0), along with this log message: 2024-01-17 20:10:03,065 - distributed.protocol.core - CRITICAL - Failed to deserialize.

I am wondering whether this error comes from (a) my Dask implementation or (b) my partial_correlation function, which uses pg.partial_corr from the pingouin package. I ask because client.get_versions(check=True) does not list the pingouin package on the client, even though I import it at the beginning.

Looking forward to your suggestions! Thanks a lot! Cheers, Prajwal

fnattino commented 9 months ago

Hi @prajzwal08! I had a brief look at the notebook, but it's a bit hard to say where the error is coming from.

I think client.get_versions only checks Dask-related packages by default (to check other packages, one needs to pass the 'packages' argument), so pingouin not showing up there doesn't mean it is missing; I don't see why the package shouldn't be available. Also, the setup of the Dask cluster looks fine.
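A minimal sketch of such a version check, in case it is useful. The helper names `installed_versions` and `find_mismatches` are made up for illustration, and the Dask calls are left as comments because they assume a running `Client` object named `client`, as in the notebook:

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version or None} for the current Python environment."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def find_mismatches(client_versions, worker_versions):
    """List packages whose version differs between the client and one worker."""
    return sorted(
        name
        for name in client_versions
        if client_versions[name] != worker_versions.get(name)
    )


# Local check (works without a cluster).
local = installed_versions(["pingouin", "msgpack"])
print(local)

# With a running dask.distributed Client named `client` (assumed to exist),
# the same helper can be executed on every worker:
#     per_worker = client.run(installed_versions, ["pingouin", "msgpack"])
#     for address, versions in per_worker.items():
#         print(address, find_mismatches(local, versions))
# Alternatively, pass the extra packages to the built-in check:
#     client.get_versions(check=True, packages=["pingouin", "msgpack"])
```

A `None` entry, or a version that differs between the client and a worker, would be the first thing to look at.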

I would try to run "partial_correlation" on a 'realized' input dataset without using map_blocks (maybe you can use a smaller size if it's currently too large). This would hopefully help you better identify where the problem lies!
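As a rough illustration of that kind of check, here is a sketch that runs a partial-correlation function on small in-memory NumPy arrays, with no Dask involved. Note that this `partial_correlation` is only a stand-in that I wrote with NumPy (correlating least-squares residuals); it is not the function from the notebook and does not call pg.partial_corr. The synthetic data are made up as well:

```python
import numpy as np


def partial_correlation(x, y, controls):
    """Partial correlation of x and y given controls of shape (n_samples, n_controls).

    Illustrative stand-in: correlates the least-squares residuals of x and y
    after regressing each on the controls plus an intercept.
    """
    design = np.column_stack([np.ones(len(x)), controls])
    res_x = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(np.corrcoef(res_x, res_y)[0, 1])


# Small synthetic sample, fully in memory.
rng = np.random.default_rng(0)
n = 500
controls = rng.normal(size=(n, 3))
# x and y are linked only through the three control variables,
# each with its own independent noise.
x = controls @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)
y = controls @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

raw = float(np.corrcoef(x, y)[0, 1])
partial = partial_correlation(x, y, controls)
print(f"raw correlation: {raw:.3f}, partial correlation: {partial:.3f}")
```

If the function behaves on a small in-memory sample like this (for an xarray/Dask dataset, that would mean a small subset that has been loaded into memory first), the problem is more likely in the Dask layer or the environment than in the function itself.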

Hope this helps!

prajzwal08 commented 9 months ago

Hi @fnattino, the issue was that an old version of msgpack-python was installed by default. Once I installed a newer version manually with

conda install -c conda-forge msgpack-python==1.0.5

it worked. More details on this issue are here: https://github.com/dask/distributed/issues/8038

Thanks, Prajwal