dask / dask-yarn

Deploy dask on YARN clusters
http://yarn.dask.org
BSD 3-Clause "New" or "Revised" License

Issue while running on Azure HDInsight cluster #131

Open brijesh-6899 opened 3 years ago

brijesh-6899 commented 3 years ago

from dask_yarn import YarnCluster
from dask.distributed import Client

cluster = YarnCluster(
    environment='wasb:///user/sshuser/dask_dedup.tar.gz',
    worker_vcores=2,
    worker_memory="8GiB",
    n_workers=2,
)

20/11/13 07:36:42 INFO skein.ApplicationMaster: RESTARTING: adding new container to replace dask.worker_1.
20/11/13 07:36:42 INFO skein.ApplicationMaster: REQUESTED: dask.worker_2
20/11/13 07:36:42 WARN skein.ApplicationMaster: FAILED: dask.worker_0 - [2020-11-13 07:36:37.992]wasb://dslhdisparkdehdistorage1.blob.core.windows.net/user/sshuser/.skein/application_1599254613788_0080/dask.worker.sh: No such file or directory.
java.io.FileNotFoundException: wasb://dslhdisparkdehdistorage1.blob.core.windows.net/user/sshuser/.skein/application_1599254613788_0080/dask.worker.sh: No such file or directory.
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatusInternal(NativeAzureFileSystem.java:2715)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2619)
    at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:242)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:223)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

20/11/13 07:36:42 INFO skein.ApplicationMaster: RESTARTING: adding new container to replace dask.worker_0.
20/11/13 07:36:42 INFO skein.ApplicationMaster: REQUESTED: dask.worker_3
20/11/13 07:36:42 WARN skein.ApplicationMaster: FAILED: dask.scheduler_0 - [2020-11-13 07:36:40.532]wasb://dslhdisparkdehdistorage1.blob.core.windows.net/user/sshuser/.skein/application_1599254613788_0080/dask.scheduler.sh: No such file or directory.
java.io.FileNotFoundException: wasb://dslhdisparkdehdistorage1.blob.core.windows.net/user/sshuser/.skein/application_1599254613788_0080/dask.scheduler.sh: No such file or directory.
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatusInternal(NativeAzureFileSystem.java:2715)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2619)
    at org.apache.hadoop.yarn.util.FSDownload.verifyAndCopy(FSDownload.java:269)
    at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:67)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:414)
    at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:411)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:411)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.doDownloadCall(ContainerLocalizer.java:242)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:235)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer$FSDownloadWrapper.call(ContainerLocalizer.java:223)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

20/11/13 07:36:42 INFO skein.ApplicationMaster: Shutting down: Failure in service dask.scheduler, see logs for more information.
20/11/13 07:36:42 INFO skein.ApplicationMaster: Unregistering application with status FAILED
20/11/13 07:36:42 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
20/11/13 07:36:43 WARN azure.AzureFileSystemThreadPoolExecutor: Disabling threads for Delete operation as thread count 0 is <= 1
20/11/13 07:36:43 INFO azure.AzureFileSystemThreadPoolExecutor: Time taken for Delete operation is: 58 ms with threads: 0
20/11/13 07:36:43 INFO skein.ApplicationMaster: Deleted application directory wasb://tech-dsl-hdi-spark-dev-2020-06-15t09-47-36-041z@dslhdisparkdehdistorage1.blob.core.windows.net/user/sshuser/.skein/application_1599254613788_0080
20/11/13 07:36:43 INFO skein.ApplicationMaster: WebUI server shut down
20/11/13 07:36:43 INFO skein.ApplicationMaster: gRPC server shut down
20/11/13 07:36:43 INFO impl.MetricsSystemImpl: Stopping azure-file-system metrics system...
20/11/13 07:36:43 INFO impl.MetricsSinkAdapter: azurefs2 thread interrupted.
20/11/13 07:36:43 INFO impl.MetricsSystemImpl: azure-file-system metrics system stopped.
20/11/13 07:36:43 INFO impl.MetricsSystemImpl: azure-file-system metrics system shutdown complete.
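
One detail in the logs above may be worth a look, though it may or may not be related to the failure: the paths in the FileNotFoundException have no container part, while the "Deleted application directory" line does. A wasb URI has the form wasb://<container>@<account>.blob.core.windows.net/<path>. The sketch below only parses the two URIs copied from the logs with the standard library; the helper name `describe` is made up for illustration and is not part of dask-yarn or skein.

```python
from urllib.parse import urlsplit

# Both URIs are copied verbatim from the log output in this issue.
FAILED_PATH = (
    "wasb://dslhdisparkdehdistorage1.blob.core.windows.net"
    "/user/sshuser/.skein/application_1599254613788_0080/dask.worker.sh"
)
DELETED_DIR = (
    "wasb://tech-dsl-hdi-spark-dev-2020-06-15t09-47-36-041z"
    "@dslhdisparkdehdistorage1.blob.core.windows.net"
    "/user/sshuser/.skein/application_1599254613788_0080"
)


def describe(uri):
    """Split a wasb URI into container, storage account host, and path.

    Hypothetical helper: the container is the userinfo part of the URI
    (the text before '@'), which urlsplit exposes as `username`.
    """
    parts = urlsplit(uri)
    return {
        "container": parts.username,   # None when the URI has no '<container>@'
        "account_host": parts.hostname,
        "path": parts.path,
    }


failed = describe(FAILED_PATH)
deleted = describe(DELETED_DIR)

print("failed lookup  :", failed["container"], failed["account_host"])
print("deleted app dir:", deleted["container"], deleted["account_host"])
print("same account   :", failed["account_host"] == deleted["account_host"])
```

Both URIs point at the same storage account and the same .skein application directory; they differ only in whether the container is present.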

quasiben commented 3 years ago

Can the sshuser create a directory and write files to HDFS?

07:36:41.666]wasb://dslhdisparkdehdistorage1.blob.core.windows.net/user/sshuser/.skein/application_1599254613788_0080/.skein.pem: No such file or directory.
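
A rough way to check this from the head node, sketched in Python around the standard `hdfs dfs` shell commands (`-mkdir -p`, `-touchz`, `-ls`, `-rm -r -skipTrash`). The function names `build_write_check_commands` and `can_write` are made up for this example, and it assumes the `hdfs` command is on the PATH of the user being tested.

```python
import subprocess
import uuid


def build_write_check_commands(base_dir, hdfs_cmd=("hdfs", "dfs")):
    """Return the shell commands for a create/write/list/delete round trip.

    Hypothetical helper: uses a uniquely named probe directory under
    base_dir so that nothing existing is touched.
    """
    probe_dir = f"{base_dir.rstrip('/')}/.write_check_{uuid.uuid4().hex}"
    probe_file = f"{probe_dir}/probe.txt"
    return [
        [*hdfs_cmd, "-mkdir", "-p", probe_dir],             # create a directory
        [*hdfs_cmd, "-touchz", probe_file],                 # write an empty file
        [*hdfs_cmd, "-ls", probe_dir],                      # read it back
        [*hdfs_cmd, "-rm", "-r", "-skipTrash", probe_dir],  # clean up
    ]


def can_write(base_dir, run=subprocess.run):
    """Run each step in order; return False as soon as one fails.

    Note: if a step after -mkdir fails, the probe directory is left
    behind and has to be removed by hand.
    """
    for cmd in build_write_check_commands(base_dir):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            print("FAILED:", " ".join(cmd))
            print(result.stderr)
            return False
    return True
```

Running `can_write("wasb:///user/sshuser")` as sshuser exercises the same default filesystem that the `environment` path in the original post uses. The `run` parameter is there only so the function can be tried with a stand-in when no cluster is available.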

brijesh-6899 commented 3 years ago

Hi @quasiben, thanks for your response. I checked and sshuser is able to create a new directory on HDFS.