yamrzou closed this issue 4 years ago
Hmmm, interesting. All of the classes not found are definitely part of your Hadoop distribution (org/apache/hadoop/conf/Configuration, for example, is the standard configuration class). The difference in which class is reported missing is just non-deterministic behavior in Java's class loader.
It's odd that we're seeing JNI errors. Skein doesn't invoke the JNI explicitly, but Hadoop does try to load a native library (libhadoop) if available before falling back on a Java implementation. Perhaps this has something to do with it?
Since the driver starts fine and you're seeing errors in the application only, I suspect there are differences between your edge node environment and your worker node environment. Are the Hadoop libraries in a different location on your worker nodes than they are on the edge node? As per the Hadoop documentation, we set the application classpath based on the edge node environment (https://hadoop.apache.org/docs/r3.1.2/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html). If this classpath isn't valid on the worker nodes, you may get class loading issues like the ones seen above.
Since this isn't a dask-yarn specific issue, here's a smaller example script you can try to make debugging simpler:
import skein
spec = skein.ApplicationSpec.from_yaml("""
name: debug-skein
queue: root
master:
  script: echo "Things worked!"
""")
client = skein.Client()
client.submit(spec)
Running this will submit a small application, which should complete successfully (but in your case should fail with the same issues as above).
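To test the classpath theory directly on a worker node, one option is to check that every entry of the classpath actually resolves to something on that machine. The following is a minimal sketch, not part of skein or dask-yarn: the function name `missing_classpath_entries` is hypothetical, and it assumes the classpath string is the output of `hadoop classpath` captured on the edge node.

```python
import glob
import os


def missing_classpath_entries(classpath):
    """Return the classpath entries that match nothing on this machine.

    Hypothetical helper for illustration only. `classpath` is assumed to be
    a string of entries joined by the platform path separator (":" on Linux).
    """
    missing = []
    for entry in classpath.strip().split(os.pathsep):
        entry = entry.strip()
        if not entry:
            # Skip empty segments produced by leading/trailing separators.
            continue
        # Hadoop classpaths often contain wildcard entries such as /path/lib/*,
        # so glob is used instead of os.path.exists. Note that a wildcard
        # pointing at an empty directory is also reported as missing.
        if not glob.glob(entry):
            missing.append(entry)
    return missing
```

Running this on a worker node with the edge node's classpath should return an empty list if the two environments agree. Any entry it returns is a path the application would be told to load classes from but that does not exist on that worker.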
Thanks a lot for your input.
I checked hadoop classpath and hadoop envvars; both give the same output on the edge node and the worker nodes.
I suspect it might be related to libhadoop, as you said, but it might take me some time before I can test that. I will report back once done.
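Comparing two long classpath strings by eye is error-prone, so a small script can make the comparison explicit. This is only a sketch under assumptions: `classpath_diff` is a hypothetical helper, and the two inputs are assumed to be the raw output of `hadoop classpath` saved from the edge node and from a worker node.

```python
import os


def classpath_diff(edge, worker):
    """Return (only_on_edge, only_on_worker) as two lists of entries.

    Hypothetical helper for illustration only. Entries are compared as
    plain strings; order within each classpath is preserved in the result.
    """
    edge_entries = [e for e in edge.strip().split(os.pathsep) if e]
    worker_entries = [e for e in worker.strip().split(os.pathsep) if e]
    edge_set = set(edge_entries)
    worker_set = set(worker_entries)
    only_on_edge = [e for e in edge_entries if e not in worker_set]
    only_on_worker = [e for e in worker_entries if e not in edge_set]
    return only_on_edge, only_on_worker
```

Two empty lists mean the classpath strings list the same entries. That alone does not prove the paths exist on both machines, only that the configured values match.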
Hi,
I re-tested this on a newly created Hadoop cluster and it worked without problems. The issue was very likely due to a configuration mismatch between the edge node and the worker nodes, as that was fixed in the new cluster.
Closing the issue.
Starting a Dask YarnCluster fails because of ClassNotFoundException. Running the following code:
Yields:
The application logs output:
When using a skein client, I get the same exception with a different class not found:
The application logs output:
Any idea? Thank you.
Version information
3.5.3
0.7.0
0.8.0
3.1.0.3.0.0.0-1634, distribution HDP