JNKHunter opened this issue 4 years ago

Hello,

When running the example on a Spark cluster via `spark-submit`, the following error is encountered. Any ideas what might be causing this?
Hi, did you use the latest 0.7.1 template? Or can you just paste your POM file here? The idea of this Maven template was just to show how one can add the SANSA artifacts - basically it's more of a minor guide for inexperienced Maven users. But maybe you or we forgot something.
Also, can you describe how you created the Maven artifact? I guess `mvn package`, which triggers the Maven Shade plugin?
Hi Lorenz, thanks. I figured this was just a test dir for beginners.
I'm using the exact POM file from the develop branch, which uses version 0.7.2: https://github.com/SANSA-Stack/SANSA-Template-Maven-Spark/blob/48adae0cb02407fc727d704b928417ed0003c940/pom.xml
And you're correct, I'm using `mvn package` to create the jar.
Do you recommend switching to the 0.7.1 version?
Well, the latest version should work ... so, no need to go back, I think.
Let me check what's going wrong here. I've seen this issue before, but I thought it had been resolved already - at least it shouldn't happen with the ResourceTransformer in the Maven Shade plugin enabled - which is the case here.
By the way, I'll also reply to your mailing list question once I've found a good answer.
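For reference, a minimal sketch of the shade configuration being referred to - assuming the standard `ServicesResourceTransformer`, which merges the `META-INF/services` registry files that Jena relies on to discover its RDF parsers. The plugin version below is illustrative and may differ from the actual template:

```xml
<!-- Sketch of a Maven Shade setup for a fat jar; the version is illustrative -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Merges META-INF/services entries so Jena's reader registry survives shading -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Without such a transformer, each Jena module's service file overwrites the others in the fat jar, which typically manifests as "no reader for language" errors at runtime.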
I'm also having the same issue on Spark 2.2.1, Scala 2.11.8, JDK 1.8
Hi.
Do you really want to use such an old Spark version? Also, SANSA-Stack has been migrated into a single repository in the meantime: https://github.com/SANSA-Stack/SANSA-Stack There should be documentation on how to add it to your POM file, i.e. which Maven artifacts and repositories to use.
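For illustration, the POM additions would look roughly like the sketch below. The coordinates and repository URL are my best guess for the SANSA 0.7.x artifacts on the AKSW Maven repository - verify them against the current documentation:

```xml
<!-- Illustrative coordinates; check against the SANSA-Stack documentation -->
<repositories>
  <repository>
    <id>maven.aksw.internal</id>
    <url>https://maven.aksw.org/archiva/repository/internal</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>net.sansa-stack</groupId>
    <artifactId>sansa-rdf-spark_2.11</artifactId>
    <version>0.7.1</version>
  </dependency>
</dependencies>
```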
I have just switched to Spark 2.4.8. I also tried the example in https://github.com/SANSA-Stack/SANSA-Stack, but the problem still persists. I have now downgraded SANSA to sansa-rdf-spark-core v0.3.0, and it works - but then I can only read NT files.
Wait a second - what exactly do you want to do (loading which files), and what exactly are you doing to use SANSA? I mean, the Maven template is nothing more than a stub of the dependencies; you won't even need all of them if, for example, you just want to load the RDF data. And which file format do you want to load? The most efficient way is for sure N-Triples, as this format is splittable.
We want to use SANSA for loading RDF into Spark, as you speculated. I am aware that we only need sansa-rdf-spark for that task. Ah, so N-Triples is more suitable? We wanted to use TTL solely because the file size is smaller.
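For context, a minimal sketch of the kind of loading code in question, based on the SANSA 0.7.x RDF I/O API; the object name and input path are hypothetical:

```scala
import net.sansa_stack.rdf.spark.io._ // adds the rdf(...) extension method to SparkSession
import org.apache.jena.riot.Lang
import org.apache.spark.sql.SparkSession

object TripleReader {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SANSA triple reader")
      .getOrCreate()

    // hdfs:///data/sample.nt is a placeholder path; N-Triples is line-based,
    // so Spark can split a large file across many partitions
    val triples = spark.rdf(Lang.NTRIPLES)("hdfs:///data/sample.nt")

    println(s"Number of triples: ${triples.count()}")
    spark.stop()
  }
}
```

On the TTL question: Turtle is more compact on disk, but its prefix declarations and multi-line statements make it non-splittable, so a single large TTL file cannot be parallelised the way a line-based N-Triples file can.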
Just to update: I have tried many things. I couldn't fix it, but I found an obvious workaround that I didn't think of before: the `--jars` option of `spark-submit`. Basically, just download the necessary jars from http://archive.apache.org/dist/jena/binaries/ and load all of them when submitting the application. So now I can use SANSA 0.7.2 with Scala 2.12.10 and Spark 3.1.2.
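A sketch of what that workaround can look like; the Jena version, jar selection, main class, and paths are all placeholders:

```bash
# Download and unpack the Jena binaries (version is illustrative)
wget http://archive.apache.org/dist/jena/binaries/apache-jena-3.17.0.tar.gz
tar xzf apache-jena-3.17.0.tar.gz

# Pass the Jena jars explicitly so they reach the driver and executor classpaths
spark-submit \
  --class com.example.TripleReader \
  --master yarn \
  --jars apache-jena-3.17.0/lib/jena-core-3.17.0.jar,apache-jena-3.17.0/lib/jena-arq-3.17.0.jar,apache-jena-3.17.0/lib/jena-base-3.17.0.jar \
  target/my-app-1.0.0.jar hdfs:///data/sample.nt
```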