Closed prajzwal08 closed 9 months ago
Hi @prajzwal08! I had a brief look at the notebook, but it's a bit hard to say where the error is coming from.
I think client.get_versions only checks Dask-related packages (to check other packages, one needs to pass them via the 'packages' argument), but I don't see why the pingouin package shouldn't be available. Also, the setup of the Dask cluster looks fine.
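As a rough sketch of what such a check could look like: the helper below (a hypothetical name, not part of Dask) asks the client for version information including extra packages and reports any package whose version differs between client, scheduler and workers. It assumes the dictionary returned by client.get_versions has "client", "scheduler" and "workers" entries, each carrying a "packages" mapping, so please treat the layout as an assumption to verify against your installed distributed version.

```python
def find_version_mismatches(client, packages):
    """Return {package: {location: version}} for packages whose versions differ.

    `client` is expected to be a dask.distributed.Client (or anything with a
    compatible get_versions method). The layout of the returned dictionary
    is an assumption of this sketch.
    """
    info = client.get_versions(packages=packages)

    # Collect one entry per location: client, scheduler, and each worker.
    locations = {"client": info["client"], "scheduler": info["scheduler"]}
    locations.update(info["workers"])

    seen = {}
    for where, entry in locations.items():
        for name in packages:
            # A missing package is recorded as None.
            seen.setdefault(name, {})[where] = entry["packages"].get(name)

    # Keep only packages that are not identical everywhere.
    return {
        name: versions
        for name, versions in seen.items()
        if len(set(versions.values())) > 1
    }

# Hypothetical usage with a running cluster:
# mismatches = find_version_mismatches(client, ["pingouin", "msgpack"])
```

An empty result would suggest that the listed packages agree everywhere, which points away from an environment mismatch for those packages.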
I would try to run "partial_correlation" on a 'realized' input dataset without using map_blocks (maybe you can use a smaller size if it's currently too large). This would hopefully help you better identify where the problem lies!
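To illustrate the idea, here is a minimal sketch that calls a function directly on a small in-memory sample, with no Dask involved. The partial_correlation below is only a pure-Python stand-in (not the notebook's function and not pg.partial_corr), and the synthetic data is made up for the example; in practice you would call your own function on a small slice of the dataset after loading it into memory.

```python
import math
import random

def pearson(x, y):
    """Plain Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, controls):
    """Stand-in: partial correlation of x and y given a list of controls,
    computed with the recursive formula (one control removed per step)."""
    if not controls:
        return pearson(x, y)
    *rest, z = controls
    r_xy = partial_correlation(x, y, rest)
    r_xz = partial_correlation(x, z, rest)
    r_yz = partial_correlation(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Small synthetic sample: x and y both depend on z1 only.
rng = random.Random(0)
n = 500
z1 = [rng.gauss(0, 1) for _ in range(n)]
z2 = [rng.gauss(0, 1) for _ in range(n)]
z3 = [rng.gauss(0, 1) for _ in range(n)]
x = [3 * a + rng.gauss(0, 1) for a in z1]
y = [3 * a + rng.gauss(0, 1) for a in z1]

# Direct call on realized data: any exception here comes from the function itself.
plain = pearson(x, y)
partial = partial_correlation(x, y, [z1, z2, z3])
print("plain correlation:", plain)
print("partial correlation:", partial)
```

If the direct call works on the small sample, the function is probably fine and the problem is more likely on the Dask side (serialization or the worker environment).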
Hope this helps!
Hi @fnattino, the issue was that an old version of msgpack-python was installed by default. Once I manually installed version 1.0.5 using
conda install -c conda-forge msgpack-python==1.0.5
it worked. More on this issue is written here: https://github.com/dask/distributed/issues/8038
Thanks, Prajwal
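For anyone hitting the same problem, a quick way to see which version is actually installed in a given environment is the standard-library importlib.metadata module. This is only a sketch: installed_version is a hypothetical helper, and "msgpack" is assumed to be the distribution name that the conda package msgpack-python installs.

```python
from importlib import metadata

def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None

# Assumption: the conda package msgpack-python registers itself as "msgpack".
print("msgpack:", installed_version("msgpack"))
```

Running the same helper on the workers, for example with client.run(installed_version, "msgpack"), should show whether the workers see the same version as the client.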
Hi @fnattino ,
Taking your sample code for parallelising with Dask, 2daskParallel/1102_1year_125degrees.ipynb (thanks to you!), I created my own code to parallelise the computation of the partial correlation between two variables when there are three other control variables. I have the code in https://github.com/prajzwal08/Master/blob/main/DaskTry.ipynb. However, I get the following errors: CancelledError: ('getitem-dacbc6e4f4f3d84c14fec8062327ca20', 2, 0) and 2024-01-17 20:10:03,065 - distributed.protocol.core - CRITICAL - Failed to deserialize.
I am wondering whether this error comes from (a) my Dask implementation or (b) my partial_correlation function, which uses pg.partial_corr from the pingouin package. I ask because when I check with client.get_versions(check=True), the pingouin package is not available in the client, even though I import it at the beginning.
Looking forward to your suggestions! Thanks a lot! Cheers, Prajwal