uellue closed this issue 4 years ago
As a comment, I found it hard to follow what exactly happens where when datasets are autodetected, with frequent jumps between code that is run on the executor and code that runs on the control node. Refs #518
The issue appears in the GUI as well. The detection jumps to "RAW" for all file types. Specifying the correct parameters (type etc) works for EMPAD and BLO with the fixes of #728 applied.
Can you have a look at `tblib.pickling_support` and see if either a) you have a different version installed on moellenstedt than on your local PC, or b) the module is different for Windows/Linux platforms?
I have the same version, 1.6.0, of `tblib` on both systems. They both have the `tblib.pickling_support.unpickle_exception` attribute. Is there a way to get more diagnostics? I couldn't find out how.
The core of the issue seems to be that somewhere, somehow, a function that is run with `run_function()` on the executor triggers an uncaught `OSError` exception, which is then tripping up `tblib`, right? Perhaps the issue is that the `OSError` is somehow platform-dependent and can't be reconstructed properly?
> Perhaps the issue is that the OSError is somehow platform-dependent and can't be reconstructed properly?
Hmm, possibly. I guess this could be fixed by using more explicit messaging, instead of relying on exception serialization.
> As a comment, I found it hard to follow what exactly happens where when datasets are autodetected, with frequent jumps between code that is run on the executor and code that runs on the control node. Refs #518
Totally agree. I think this will become much cleaner when/if we decide to implement a different RPC mechanism - most likely that will force us to implement a much cleaner RPC layer anyways (related to #199)
`OSError` is indeed platform-dependent: https://docs.python.org/3/library/exceptions.html#OSError
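To make the platform dependence concrete, here is a small sketch that lists which of the documented `OSError` attributes exist on the current interpreter. Per the linked documentation, `winerror` is a Windows-only attribute, so the same exception carries different state on a Windows client and a Linux worker. Whether this difference is what actually trips up `tblib` is not established here; the file name is just a placeholder.

```python
import sys

# Build an OSError the way the OS layer would: (errno, strerror, filename).
err = OSError(2, "No such file or directory", "missing.blo")  # placeholder file name

# Collect only the attributes that exist on this platform.
candidate_names = ("errno", "strerror", "filename", "winerror")
fields = {name: getattr(err, name) for name in candidate_names if hasattr(err, name)}

print(sys.platform, sorted(fields))
```

Running this on both machines and comparing the printed attribute lists is a cheap way to see the difference first-hand.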
What about including a platform-independent wrapper exception for `OSError` and making sure we catch, re-package and re-raise any `OSError` in functions that run on the executor? This can probably be done with a decorator.
Maybe a `DataSetDetectFail` exception subclassing `DataSetException`?
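A minimal sketch of the decorator idea, under stated assumptions: `DataSetException` is re-declared here as a stand-in for the project's existing class, `DataSetDetectFail` is the proposed (not yet existing) subclass, and `repackage_oserror` and `detect_params` are hypothetical names used only for illustration.

```python
import functools


class DataSetException(Exception):
    """Stand-in for the existing base exception (assumed, re-declared for the sketch)."""


class DataSetDetectFail(DataSetException):
    """Proposed platform-independent wrapper for failures during detection."""


def repackage_oserror(func):
    """Hypothetical decorator: convert any OSError into DataSetDetectFail."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except OSError as e:
            # Carry only a plain string, so the exception state looks the
            # same on every platform; `from None` drops the original cause.
            raise DataSetDetectFail(f"{type(e).__name__}: {e}") from None
    return wrapper


@repackage_oserror
def detect_params(path):
    """Hypothetical detection function that would run on the executor."""
    with open(path, "rb") as f:
        return {"header": f.read(16)}
```

The trade-off is that the original exception type is reduced to text in the message, which loses structure but avoids having to reconstruct an `OSError` on the other side.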
Instead of using a decorator, we could also have a wrapper method in the `DataSet` base class, which calls the underlying implementation and converts any exception into a `DataSetDetectFail`. That way we don't have to sprinkle decorators all over the place :grinning:
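The wrapper-method variant could look roughly like this. Everything here is a sketch: the `DataSet` class below is a toy stand-in rather than the real base class, and the method names `detect` and `_detect_impl` as well as the `BrokenDataSet` subclass are hypothetical.

```python
class DataSetException(Exception):
    """Stand-in for the existing base exception (assumed, re-declared for the sketch)."""


class DataSetDetectFail(DataSetException):
    """Proposed platform-independent wrapper for failures during detection."""


class DataSet:
    """Toy stand-in for the base class; only the wrapping logic is shown."""

    @classmethod
    def detect(cls, path):
        # Single choke point: subclasses implement _detect_impl and never
        # need to think about how exceptions travel back to the client.
        try:
            return cls._detect_impl(path)
        except DataSetDetectFail:
            raise
        except Exception as e:
            raise DataSetDetectFail(f"{type(e).__name__}: {e}") from None

    @classmethod
    def _detect_impl(cls, path):
        raise NotImplementedError


class BrokenDataSet(DataSet):
    """Hypothetical subclass whose detection hits a file system error."""

    @classmethod
    def _detect_impl(cls, path):
        raise OSError(2, "No such file or directory", path)
```

With this layout, calling `BrokenDataSet.detect("missing.blo")` raises `DataSetDetectFail` instead of a raw `OSError`.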
Actually, it is sufficient to define functions that are part of a module instead of lambda or nested functions for all platform-dependent code that should run on a remote executor. I've documented that as a tip in #734.
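The distinction can be illustrated with the standard `pickle` module, which serializes a module-level function as a reference (module name plus function name) and refuses lambdas and nested functions outright. This is only an analogy for the remote case: as far as I understand, Dask falls back to `cloudpickle`, which serializes lambdas and nested functions by value instead of rejecting them, and that by-value copy is where platform-specific details can leak across. The helper `can_pickle` is a hypothetical name for this sketch.

```python
import os.path
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True


# A function that lives in an importable module: pickled by reference,
# so the receiving side resolves its own platform's implementation.
module_level = os.path.basename

# A lambda has no importable name to refer to.
anonymous = lambda p: os.path.basename(p)


def make_nested():
    # A nested function is local to this call and equally unreachable by name.
    def nested(p):
        return os.path.basename(p)
    return nested


results = {
    "module_level": can_pickle(module_level),
    "lambda": can_pickle(anonymous),
    "nested": can_pickle(make_nested()),
}
print(results)
```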
Setup: Jupyter notebook on Windows, connecting to remote dask cluster on Linux.
After fixing remote opening of BLO files (PR pending), the following error remains when type "auto" is specified:
The message in the dask worker shell: