1086-Maria-Big-Data / JobAdAnalytics


java.lang.ClassNotFoundException: cc.queries.entryLevel #77

Closed Justin-Orr closed 3 years ago

Justin-Orr commented 3 years ago

I am trying to do a dry run of the code on EMR by calling main and just creating the Spark session; nothing involving the dataset yet. However, EMR throws a ClassNotFoundException even though there are no spelling mistakes in the class name. There may also be another problem in the first few lines of the stack trace below:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/mnt/yarn/usercache/hadoop/filecache/200/__spark_libs__4083287866196558451.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/09/02 22:02:00 INFO SignalUtils: Registered signal handler for TERM
21/09/02 22:02:00 INFO SignalUtils: Registered signal handler for HUP
21/09/02 22:02:00 INFO SignalUtils: Registered signal handler for INT
21/09/02 22:02:01 INFO ApplicationMaster: Preparing Local resources
21/09/02 22:02:02 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1630526631334_0028_000002
21/09/02 22:02:02 INFO SecurityManager: Changing view acls to: yarn,hadoop
21/09/02 22:02:02 INFO SecurityManager: Changing modify acls to: yarn,hadoop
21/09/02 22:02:02 INFO SecurityManager: Changing view acls groups to:
21/09/02 22:02:02 INFO SecurityManager: Changing modify acls groups to:
21/09/02 22:02:02 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hadoop); groups with view permissions: Set(); users with modify permissions: Set(yarn, hadoop); groups with modify permissions: Set()
21/09/02 22:02:02 INFO ApplicationMaster: Starting the user application in a separate Thread
21/09/02 22:02:02 ERROR ApplicationMaster: Uncaught exception: java.lang.ClassNotFoundException: cc.queries.entryLevel
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:629)
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:394)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:254)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:764)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:762)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
21/09/02 22:02:02 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.lang.ClassNotFoundException: cc.queries.entryLevel)
21/09/02 22:02:02 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: java.lang.ClassNotFoundException: cc.queries.entryLevel)
21/09/02 22:02:02 INFO ApplicationMaster: Deleting staging directory hdfs://ip-172-31-66-101.ec2.internal:8020/user/hadoop/.sparkStaging/application_1630526631334_0028
21/09/02 22:02:02 INFO ShutdownHookManager: Shutdown hook called
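(Editor's note: the class passed to --class must resolve to an object with a main method inside the application JAR. A minimal sketch of such a dry-run entry point is shown below; it is hypothetical and not taken from the repository's actual cc.queries.entryLevel source.)

package cc.queries

import org.apache.spark.sql.SparkSession

// Hypothetical dry-run entry point: builds a SparkSession only, reads no data.
object entryLevel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("entryLevel dry run")
      .getOrCreate()

    println(s"Spark session created, version ${spark.version}")
    spark.stop()
  }
}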

vinceecws commented 3 years ago

Not sure which commit version you used when you ran it, but here's potentially the issue:

[Screenshot attached: Screen_Shot_2021-09-03_at_2.34.19_AM]
vinceecws commented 3 years ago

Resolved. The root cause was an incorrectly formatted spark-submit option, as below:

spark-submit --deploy-mode cluster --class cc.queries.entryLevel --jars s3://maria-1086/ArchiveSparkJars/archivespark-deps.jar, s3://maria-1086/ArchiveSparkJars/archivespark.jar --packages org.apache.hadoop:hadoop-aws:2.10.1 s3://maria-1086/Testing/Justin_Testing/jobadanalytics_2.11-0.3.jar

Specifically, it was caused by the whitespace after the comma separating the two arguments to the --jars option: --jars s3://maria-1086/ArchiveSparkJars/archivespark-deps.jar, s3://maria-1086/ArchiveSparkJars/archivespark.jar

This causes the argument parser to treat the first JAR, .../archivespark-deps.jar, as the only --jars dependency and to take the second JAR, .../archivespark.jar, as the application JAR. The actual application JAR, .../jobadanalytics_2.11-0.3.jar, is then effectively ignored, and since the class given to --class, cc.queries.entryLevel, exists only in that JAR, Spark throws java.lang.ClassNotFoundException.
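(Editor's note: the fix implied here is to remove that space so the --jars value is a single comma-separated token. The corrected invocation would presumably be:)

spark-submit --deploy-mode cluster \
  --class cc.queries.entryLevel \
  --jars s3://maria-1086/ArchiveSparkJars/archivespark-deps.jar,s3://maria-1086/ArchiveSparkJars/archivespark.jar \
  --packages org.apache.hadoop:hadoop-aws:2.10.1 \
  s3://maria-1086/Testing/Justin_Testing/jobadanalytics_2.11-0.3.jar

With a single --jars token, spark-submit ships both ArchiveSpark JARs as dependencies and treats the final positional argument as the application JAR, which is the one that actually contains cc.queries.entryLevel.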