What happened:
I've been following the documentation here to submit my application with dask-yarn. Unfortunately, the job keeps failing when I run with deploy-mode set to remote, although it does work when deploy-mode is local. The worker-count and worker-vcores that were actually allocated also don't match what I specified in my dask-yarn submit parameters. I looked through the YARN application logs, but they weren't particularly helpful. The logs just say:
21/11/28 10:47:18 INFO skein.ApplicationMaster: Shutting down: Exception in submitted dask application, see logs for more details
...but don't point me to where to look for this exception.
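For anyone trying to reproduce this, here is a minimal sketch of how the aggregated logs could be searched for the underlying Python exception. It assumes the `yarn` CLI is on the PATH and that log aggregation is enabled on the cluster. `fetch_yarn_logs` and `find_tracebacks` are hypothetical helper names, not part of dask-yarn or skein, and the application ID has to be supplied by the caller.

```python
import subprocess


def fetch_yarn_logs(application_id):
    """Return the aggregated logs for one YARN application as text.

    Assumes the `yarn` CLI is installed and log aggregation is enabled.
    """
    result = subprocess.run(
        ["yarn", "logs", "-applicationId", application_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def find_tracebacks(log_text):
    """Return each Python traceback found in log_text as one string."""
    blocks, current = [], None
    for line in log_text.splitlines():
        if line.startswith("Traceback (most recent call last):"):
            current = [line]
        elif current is not None:
            current.append(line)
            # The first non-indented line after the header is the
            # exception message, which closes the traceback.
            if line and not line[0].isspace():
                blocks.append("\n".join(current))
                current = None
    if current:
        # Keep a traceback that was cut off at the end of the log.
        blocks.append("\n".join(current))
    return blocks
```

Calling `find_tracebacks(fetch_yarn_logs(app_id))` with the ID of the failed application would list any tracebacks that made it into the container logs. If nothing is returned, the exception may have been written somewhere other than the aggregated logs.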
What you expected to happen:
I expected the application to run to completion, but instead the returned status was FAILED.
Minimal Complete Verifiable Example:
Anything else we need to know?: Relevant files are attached here: Archive.zip
Environment:
Only 26 containers and 26 vcores were allocated, despite my specifying 30 workers with 2 cores each:
Application failed
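
To make the mismatch explicit, here is a small sketch of the arithmetic behind the figures above. It counts only what the worker settings alone would request and assumes one container per worker. The observed totals may also include non-worker containers, so the real worker shortfall could be larger. `requested_worker_resources` is a hypothetical helper used for illustration.

```python
def requested_worker_resources(worker_count, worker_vcores):
    """Containers and vcores implied by the worker settings alone."""
    return {
        "containers": worker_count,
        "vcores": worker_count * worker_vcores,
    }


# Values taken from the report: 30 workers with 2 vcores each were
# requested, while 26 containers and 26 vcores were observed.
requested = requested_worker_resources(worker_count=30, worker_vcores=2)
observed = {"containers": 26, "vcores": 26}
shortfall = {key: requested[key] - observed[key] for key in requested}

print(requested)  # {'containers': 30, 'vcores': 60}
print(shortfall)  # {'containers': 4, 'vcores': 34}
```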