SmartDataAnalytics / RdfProcessingToolkit

Command-line RDF processing toolkit for running sequences of SPARQL statements ad hoc on RDF datasets, streams of bindings, and streams of named graphs, with support for processing JSON, CSV and XML using function extensions
https://smartdataanalytics.github.io/RdfProcessingToolkit/

NoClassDefFoundError: org/apache/hadoop/shaded/org/apache/commons/configuration2/Configuration #43

Closed: TBoonX closed this issue 11 months ago

TBoonX commented 1 year ago
> java -Dspark.kryoserializer.buffer.max="2047" -jar ./rpt-1.9.7-rc9.jar sansa query mapping.rq > result.hs12.tiny.raw.ttl 

15:35:57 [INFO] [n.s.s.c.i.CmdSansaTarqlImpl:66] - 'spark.master' not set - defaulting to: local[*]
15:35:57 [WARN] [o.a.s.u.Utils:73] - Your hostname, coypuserver.coypu.org resolves to a loopback address: 127.0.1.1; using 159.69.72.186 instead (on interface enp7s0)
15:35:57 [WARN] [o.a.s.u.Utils:73] - Set SPARK_LOCAL_IP if you need to bind to another address
15:35:58 [INFO] [o.a.s.SparkContext:61] - Running Spark version 3.3.2
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/org/apache/commons/configuration2/Configuration
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:43)
    at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:41)
    at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:149)
    at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:265)
    at org.apache.spark.util.Utils$.$anonfun$getCurrentUserName$1(Utils.scala:2561)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2561)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:316)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2714)
    at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947)
    at net.sansa_stack.spark.cli.impl.CmdSansaQueryImpl.run(CmdSansaQueryImpl.java:50)
    at net.sansa_stack.spark.cli.cmd.CmdSansaQuery.call(CmdSansaQuery.java:45)
    at net.sansa_stack.spark.cli.cmd.CmdSansaQuery.call(CmdSansaQuery.java:13)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
    at picocli.CommandLine.access$1300(CommandLine.java:145)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2358)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2352)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2314)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2316)
    at picocli.CommandLine.execute(CommandLine.java:2078)
    at org.aksw.rdf_processing_toolkit.cli.cmd.CmdUtils.callCmd(CmdUtils.java:77)
    at org.aksw.rdf_processing_toolkit.cli.cmd.CmdUtils.callCmd(CmdUtils.java:40)
    at org.aksw.rdf_processing_toolkit.cli.cmd.CmdUtils.execCmd(CmdUtils.java:21)
    at org.aksw.rdf_processing_toolkit.cli.main.MainCliRdfProcessingToolkit.main(MainCliRdfProcessingToolkit.java:9)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.shaded.org.apache.commons.configuration2.Configuration
    at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)
    at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
    at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:522)
    ... 27 more
Aklakan commented 11 months ago

I can confirm this bug. This Hadoop stuff is so brittle to use as a dependency :(

Aklakan commented 11 months ago

The issue is that the sansa-bom actually declares several Hadoop modules at version 3.3.4, but the sansa-picocli-cmds module only depends on Spark, which transitively pulls in further Hadoop dependencies that have no version override in the BOM.
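
For illustration, the sketch below shows the kind of dependencyManagement override that was missing. The artifact choice (hadoop-client-api / hadoop-client-runtime, the client modules Spark 3.3.x typically pulls in) is an assumption for this sketch and not the actual content of the fix commit linked in the next comment:

```xml
<!-- Hypothetical sketch only: pin the transitively pulled-in Hadoop client
     modules to the same version the BOM already declares (3.3.4), so the
     shaded org.apache.hadoop.shaded.* classes end up on the classpath. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client-api</artifactId>
      <version>3.3.4</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client-runtime</artifactId>
      <version>3.3.4</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```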

Aklakan commented 11 months ago

The issue should now be fixed. The necessary changes only affected the sansa-bom file; see commit https://github.com/SANSA-Stack/SANSA-Stack/commit/7921fb93b992aa4be5124b11dae76fd88c1d2bd9 (and the commit before it).