dask / dask-yarn

Deploy dask on YARN clusters
http://yarn.dask.org
BSD 3-Clause "New" or "Revised" License

Yarn Cluster #83

Closed hamroune closed 5 years ago

hamroune commented 5 years ago

Hi everyone, I'm trying to run a small piece of code on YARN. Here is my whole code:

```
from dask.distributed import Client
from dask_yarn import YarnCluster
import dask.dataframe as dd

with YarnCluster(environment='environment.tar.gz',
                 worker_vcores=2,
                 worker_memory="8GiB") as cluster:
    cluster.scale(2)
    # Connect to the cluster
    client = Client(cluster)

    CSV_INPUT = 'hdfs:///Data/input/input.csv'
    CSV_OUTPUT = 'hdfs://Data/input/output.parquet'

    df = dd.read_csv(CSV_INPUT)
    dd.to_parquet(df, CSV_OUTPUT)
```

Unfortunately, I get this error:

```
distributed.scheduler.KilledWorker: ("('pandas_read_text-read-block-from-delayed-3f488119df76d5b2ba0e2e75ec2bc55b', 0)", <Worker 'tcp://myip:40417', memory: 0, processing: 1>)
```

I don't understand this error. Has anyone experienced this issue? Thanks.

jcrist commented 5 years ago

That error indicates you have a worker that keeps dying. This could be due to a few things:

Usually, when confronted with an error in a dask-yarn application, the first thing to do is look at the application logs. You can get these by running:

$ yarn logs -applicationId <application-id>
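
As a rough sketch of the step above, the same command can be assembled from Python once you know the application ID. The helper name `yarn_logs_command` and the example ID are made up for illustration. With dask-yarn, the real ID should be available as `cluster.app_id` while the cluster is running, or from the YARN ResourceManager UI.

```python
import shlex


def yarn_logs_command(app_id):
    """Build the argument list for `yarn logs -applicationId <app_id>`."""
    # YARN application IDs start with "application_"; reject anything else early.
    if not app_id.startswith("application_"):
        raise ValueError(f"not a YARN application ID: {app_id!r}")
    return ["yarn", "logs", "-applicationId", app_id]


# Hypothetical ID, for illustration only.
cmd = yarn_logs_command("application_1526134340424_0012")
print(shlex.join(cmd))
```

On a machine with the `yarn` CLI installed, the list can be passed to `subprocess.run(cmd, capture_output=True, text=True)` and the captured output searched for the failing worker's traceback.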

jcrist commented 5 years ago

Closing due to inactivity. Feel free to reopen if you can provide more information.