crusaderky opened 4 years ago
Thanks for tracking that down. This seems a bit tricky...
> Ideally, there should be no limit to the length of dask keys.
For that to be an option, zict would need to be able to deterministically hash keys into something that works for the OS, right? That seems doable.
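A minimal sketch of what such a deterministic mapping could look like, assuming a hard-coded length cap (the constant and the function name below are hypothetical, not zict's actual API):

```python
import hashlib
from urllib.parse import quote

MAX_FILENAME_BYTES = 143  # hypothetical conservative cap, not a real zict constant

def key_to_filename(key: str) -> str:
    """Map an arbitrary dask key to a filesystem-safe file name.

    Short keys stay human-readable (percent-encoded); longer keys are
    replaced by a deterministic hash, so the same key always maps to
    the same file regardless of its length.
    """
    safe = quote(str(key), safe="")  # escape '/', spaces, non-ASCII, etc.
    if len(safe.encode("utf-8")) <= MAX_FILENAME_BYTES:
        return safe
    return "spill-" + hashlib.sha256(str(key).encode("utf-8")).hexdigest()
```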
> As a second best option, the client should deterministically raise an Exception as soon as the user tries uploading a very long key to the scheduler.
How would we know what "too long" is? Given a static set of workers, each worker could query the OS to find the maximum file name length... But what happens when a new worker comes along?
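On POSIX systems a worker could ask the filesystem directly; a rough sketch of that query (a best-effort helper, not existing distributed code):

```python
import os

def max_filename_length(path: str = ".") -> int:
    """Return the longest file name the filesystem at `path` accepts,
    falling back to a conservative default when the query fails."""
    try:
        return os.pathconf(path, "PC_NAME_MAX")  # POSIX only
    except (AttributeError, ValueError, OSError):
        return 255  # typical limit on common Linux filesystems
```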
> How would we know what "too long" is? Given a static set of workers, each worker could query the OS to find the maximum file name length
I was just thinking about hardcoding the lowest common denominator among modern OSs/filesystems.
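In that spirit, the client-side guard could be as simple as a hard-coded check; a hypothetical sketch (the 143-byte figure is an assumption based on the most restrictive common filesystem, not an agreed value):

```python
MAX_KEY_BYTES = 143  # assumed lowest common denominator (e.g. eCryptfs); not a real constant

def validate_key(key) -> None:
    """Raise early, on the client, instead of letting a worker fail at evict()."""
    encoded = str(key).encode("utf-8")
    if len(encoded) > MAX_KEY_BYTES:
        raise ValueError(
            f"dask key is {len(encoded)} bytes, above the {MAX_KEY_BYTES}-byte "
            f"limit for spilling to disk: {str(key)[:60]}..."
        )
```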
dask client: Linux x64, NFSv4
NFS server: Linux x64, btrfs
My dask cluster got completely stuck. Looking at the GUI, I can see there are 5 processing tasks, but everything is frozen at 0% CPU. In the worker logs, I can read:
mtm.compute_trade_item-mtm.compute_portfolio-PMI%2FPMI2013%2FPMI%20TRD%2FPMI%20TRD%20MEX%2FREFINADOS%2FGASOLINAS%20Y%20COMPONENTES%2FAlmacenes%20y%20ductos%202013%2FAlm%202013-10%2FCiudad%20Ju%C3%A1rez%202013-10%2FCJ%20Ducto%20Plains%202013-10-5ea19e760000000000000003
is a key of the dask graph that I hand-crafted. Because of the very nature of evict(), this issue is particularly insidious: it won't appear until I reach production-level data volumes.
Workaround
Change my application to introduce scrambling if the keys it generates exceed a maximum length.
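For example, the application-level scrambling could look roughly like this (the helper and the 128-character cap are made up for illustration):

```python
import hashlib

MAX_KEY_CHARS = 128  # arbitrary application-level cap

def make_key(prefix: str, *parts: str) -> str:
    """Build a human-readable dask key, falling back to a short digest
    when the concatenated parts would exceed the cap."""
    key = "-".join((prefix, *parts))
    if len(key) <= MAX_KEY_CHARS:
        return key
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=16).hexdigest()
    return f"{prefix}-{digest}"
```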
Expected behaviour
Ideally, there should be no limit to the length of dask keys. As a second best option, the client should deterministically raise an Exception as soon as the user tries uploading a very long key to the scheduler.