What happened: Connecting to a remote EMR cluster from a Jupyter notebook (using YarnCluster to create the Dask cluster) causes the notebook cell to hang. The YarnCluster client successfully submits the job to YARN on EMR, and the application is listed under the running applications tab, but on the notebook client side the cell just hangs. The application on YARN also seemingly continues to run and has to be killed manually (nothing in the YARN application logs seems to indicate an error).
What you expected to happen: After the job is submitted, the notebook cell should return and allow the user to submit further Dask code to the Dask cluster created on EMR (the YARN application).
Minimal Complete Verifiable Example:
The notebook cell hangs after submitting the following code; no errors are reported (and a little asterisk stays beside the cell).
Please note that dask-yarn is already installed on all EMR nodes.
spec.yaml

name: test-dask
queue: default

services:
  dask.scheduler:
    # Restrict scheduler to 2 GiB and 1 core
    resources:
      memory: 2 GiB
      vcores: 1
    script: |
      dask-yarn services scheduler

  dask.worker:
    # Don't start any workers initially
    instances: 0
    # Workers can restart an infinite number of times
    max_restarts: -1
    depends:
      - dask.scheduler
    # Restrict workers to 4 GiB and 2 cores each
    resources:
      memory: 4 GiB
      vcores: 2
    # Distribute this python environment to every worker node
    files:
      environment: /notebooks_deps_pkg.tar.gz
    # The bash script to start the worker
    # Here we activate the environment, then start the worker
    script: |
      virtualenv env
      source env/bin/activate
      dask-yarn services worker
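While debugging, it can help to keep the notebook responsive instead of waiting on a cell that never returns. The following is only a sketch of one way to do that: call_with_timeout is a hypothetical helper (not part of dask-yarn or skein) that runs a blocking call in a daemon thread and gives up after a timeout. The commented usage assumes the cluster is created with YarnCluster.from_specification("spec.yaml"), and the 120-second timeout is an arbitrary choice.

```python
import threading


def call_with_timeout(fn, timeout_s, *args, **kwargs):
    """Run fn(*args, **kwargs) in a daemon thread.

    Returns the result, re-raises any exception from fn, or raises
    TimeoutError if fn has not finished after timeout_s seconds.
    Hypothetical helper for debugging; not part of dask-yarn or skein.
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = fn(*args, **kwargs)
        except BaseException as exc:  # hand any failure back to the caller
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)

    if worker.is_alive():
        # The call is still blocked; it keeps running in the background.
        raise TimeoutError(f"call did not return within {timeout_s} s")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


# Assumed usage in the notebook (requires dask-yarn on the client):
# from dask_yarn import YarnCluster
# cluster = call_with_timeout(YarnCluster.from_specification, 120, "spec.yaml")
```

A timeout here only frees the notebook cell. The blocked call keeps running in its thread, and the YARN application still has to be killed separately.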
Anything else we need to know?: After adding a print statement to skein's core.py (a print(req) before the return), I see the following in the logs:
22/03/04 21:08:19 INFO conf.Configuration: resource-types.xml not found
22/03/04 21:08:19 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
22/03/04 21:08:19 INFO skein.Driver: Uploading application resources to hdfs://cluster.ip:8020/user/hadoop/.skein/application_1646182918041_0074
22/03/04 21:08:43 INFO skein.Driver: Submitting application...
22/03/04 21:08:43 INFO impl.YarnClientImpl: Submitted application application_1646182918041_0074
id: "application_1646182918041_0074"
<generator object KeyValueStore._input_iter at 0x7f20908370a0>
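A note on reading the last line of that output: print(req) on a generator shows only the object's repr, not the requests it would yield. The line therefore shows that the argument was a request iterator built by KeyValueStore._input_iter, but not what was sent. The sketch below illustrates this with a hypothetical stand-in generator; input_iter and its values are placeholders, not skein code.

```python
def input_iter():
    # Hypothetical stand-in for a request iterator; values are placeholders.
    yield "request-1"
    yield "request-2"


req = input_iter()

# What print(req) displays: the generator's repr, not its contents.
shown = repr(req)
print(shown)

# Consuming the generator is what actually produces the items.
items = list(req)
print(items)
```

Consuming a real request iterator in a debug print would likely interfere with the call that needs it, so logging inside the generator itself is probably the safer way to see the requests.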
After the log output above, the notebook cell just hangs.
Environment: