Closed IanEmmons closed 2 years ago
Hi!
Thanks for reaching out! These libraries are needed to run the unit tests, for example the ones that cover the Docker and Fuseki functionality. Does using the RMLMapper as shown here not work for you? https://github.com/RMLio/rmlmapper-java/blob/master/src/test/java/be/ugent/rml/readme/ReadmeTest.java
I have not tried your sample code yet. I can do that in a few days if it's important, but I have a large code base written against the Jena API, and I don't want to refactor it to the be.ugent.rml.store.* API that your sample code uses.
The key observation here is that my code uses Jena, and that causes a conflict with the RML Mapper dependencies, which seem to pull in multiple versions of Jena.
Further, if the dependencies I suggested eliminating are used only for unit testing, then they should be marked with the test scope so that they are used only when running your tests and are not pulled into my code.
@IanEmmons
I marked the Jetty, Jena, Fuseki and Docker dependencies with the test scope. com.github.rdfhdt:hdt-java cannot be changed, as it is necessary to output HDT files with the RMLMapper.
Can you give this pom.xml for the RMLMapper a try?
<project>
<modelVersion>4.0.0</modelVersion>
<groupId>be.ugent.rml</groupId>
<artifactId>rmlmapper</artifactId>
<name>RMLMapper</name>
<version>4.12.0</version>
<description>
The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources.
</description>
<url>https://github.com/RMLio/rmlmapper-java</url>
<licenses>
<license>
<name>The MIT License</name>
<url>https://raw.githubusercontent.com/RMLio/rmlmapper-java/master/LICENSE</url>
<distribution>repo</distribution>
</license>
</licenses>
<developers>
<developer>
<id>pheyvaer</id>
<name>Pieter Heyvaert</name>
<email>pieter.heyvaert@ugent.be</email>
</developer>
<developer>
<id>bjdmeest</id>
<name>Ben De Meester</name>
<email>ben.demeester@ugent.be</email>
</developer>
<developer>
<id>andimou</id>
<name>Anastasia Dimou</name>
<email>anastasia.dimou@ugent.be</email>
</developer>
</developers>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<junit.version>4.13.2</junit.version>
<maven.compiler.source>8</maven.compiler.source>
<maven.compiler.target>8</maven.compiler.target>
</properties>
<scm>
<connection>scm:git:ssh://git@github.com:RMLio/rmlmapper-java.git</connection>
<url>https://github.com/RMLio/rmlmapper-java</url>
</scm>
<repositories>
<repository>
<id>repo.maven.apache.org</id>
<url>https://repo.maven.apache.org/maven2/</url>
</repository>
<repository>
<id>jitpack.io</id>
<url>https://jitpack.io</url>
</repository>
</repositories>
<distributionManagement>
<snapshotRepository>
<id>ossrh</id>
<url>https://oss.sonatype.org/content/repositories/snapshots</url>
</snapshotRepository>
</distributionManagement>
<profiles>
<profile>
<id>release</id>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>3.1.0</version>
<configuration>
<source>8</source>
</configuration>
<executions>
<execution>
<id>attach-javadocs</id>
<goals>
<goal>jar</goal>
</goals>
<configuration>
<show>public</show>
<failOnError>false</failOnError>
<doclint>none</doclint>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-source-plugin</artifactId>
<version>2.2.1</version>
<executions>
<execution>
<id>attach-sources</id>
<goals>
<goal>jar-no-fork</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-gpg-plugin</artifactId>
<version>1.5</version>
<executions>
<execution>
<id>sign-artifacts</id>
<phase>verify</phase>
<goals>
<goal>sign</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</profile>
</profiles>
<dependencies>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
<version>1.2.3</version>
</dependency>
<dependency>
<groupId>commons-lang</groupId>
<artifactId>commons-lang</artifactId>
<version>2.6</version>
</dependency>
<dependency>
<groupId>commons-cli</groupId>
<artifactId>commons-cli</artifactId>
<version>1.4</version>
</dependency>
<dependency>
<groupId>org.eclipse.rdf4j</groupId>
<artifactId>rdf4j-runtime</artifactId>
<version>2.5.5</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.github.stefanbirkner</groupId>
<artifactId>system-rules</artifactId>
<version>1.19.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.jayway.jsonpath</groupId>
<artifactId>json-path</artifactId>
<version>2.6.0</version>
</dependency>
<dependency>
<groupId>javax.xml.parsers</groupId>
<artifactId>jaxp-api</artifactId>
<version>1.4.5</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>8.0.26</version>
</dependency>
<dependency>
<groupId>ch.vorburger.mariaDB4j</groupId>
<artifactId>mariaDB4j</artifactId>
<version>2.4.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>9.1-901-1.jdbc4</version>
</dependency>
<!-- jre version must be the same as the gitlab runner -->
<dependency>
<groupId>com.microsoft.sqlserver</groupId>
<artifactId>mssql-jdbc</artifactId>
<version>7.2.2.jre8</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.spotify</groupId>
<artifactId>docker-client</artifactId>
<version>8.16.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-core</artifactId>
<version>2.12.1</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>2.12.1</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-annotations</artifactId>
<version>2.12.1</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-server</artifactId>
<version>9.4.17.v20190418</version>
<scope>test</scope>
</dependency>
<!-- Keep these Jetty libraries on this version bc of compatibility w/Fuseki -->
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-security</artifactId>
<version>9.4.17.v20190418</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.jena</groupId>
<artifactId>apache-jena-libs</artifactId>
<type>pom</type>
<version>3.8.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.hp.hpl.jena</groupId>
<artifactId>arq</artifactId>
<version>2.8.8</version>
<scope>test</scope>
</dependency>
<!-- Keep this Fuseki library on this version bc of compatibility w/Jetty -->
<dependency>
<groupId>org.apache.jena</groupId>
<artifactId>jena-fuseki-embedded</artifactId>
<version>3.8.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.github.rdfhdt</groupId>
<artifactId>hdt-java</artifactId>
<version>v2.1.2</version>
</dependency>
<dependency>
<groupId>commons-validator</groupId>
<artifactId>commons-validator</artifactId>
<version>1.7</version>
</dependency>
<dependency>
<groupId>com.github.fnoio</groupId>
<artifactId>grel-functions-java</artifactId>
<version>v0.7.1</version>
</dependency>
<dependency>
<groupId>com.github.slugify</groupId>
<artifactId>slugify</artifactId>
<version>2.5</version>
</dependency>
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.14.2</version>
</dependency>
<!-- START spreadsheet dependencies -->
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-csv</artifactId>
<version>1.9.0</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>4.1.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.odftoolkit/simple-odf -->
<dependency>
<!--
This should be below Apache Jena dependencies
in the pom declaration order for the correct dependency mediation
Otherwise: java.lang.NoClassDefFoundError: org/apache/jena/shared/JenaException
and no 0.8-incubating version yet
https://issues.apache.org/jira/browse/ODFTOOLKIT-415?jql=project%20%3D%20ODFTOOLKIT%20AND%20fixVersion%20%3D%200.7-incubating
-->
<groupId>org.apache.odftoolkit</groupId>
<artifactId>simple-odf</artifactId>
<version>0.8.2-incubating</version>
<exclusions>
<exclusion>
<artifactId>tools</artifactId>
<groupId>com.sun</groupId>
</exclusion>
</exclusions>
</dependency>
<!-- END spreadsheet dependencies-->
</dependencies>
<build>
<sourceDirectory>src/main/java</sourceDirectory>
<testSourceDirectory>src/test/java</testSourceDirectory>
<finalName>${project.artifactId}-${project.version}-r${buildNumber}</finalName>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>${maven.compiler.source}</source>
<target>${maven.compiler.target}</target>
</configuration>
<version>3.8.1</version>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>3.2.1</version>
<configuration>
<shadedArtifactAttached>true</shadedArtifactAttached>
<shadedClassifierName>r${buildNumber}-all</shadedClassifierName>
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>be.ugent.rml.cli.Main</mainClass>
</transformer>
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
<transformer implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer"/>
<transformer implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer">
<addHeader>false</addHeader>
</transformer>
</transformers>
<filters>
<filter>
<artifact>*:*</artifact>
<excludes>
<!-- Some jars are signed but shading breaks that.
Don't include signing files.
-->
<exclude>META-INF/*.SF</exclude>
<exclude>META-INF/*.DSA</exclude>
<exclude>META-INF/*.RSA</exclude>
</excludes>
</filter>
</filters>
</configuration>
<executions>
<execution>
<id>make-assembly</id> <!-- this is used for inheritance merges -->
<phase>package</phase> <!-- bind to the packaging phase -->
<goals>
<goal>shade</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>3.0.0-M3</version>
<configuration>
<useSystemClassLoader>true</useSystemClassLoader>
<useManifestOnlyJar>false</useManifestOnlyJar>
<!-- <parallel>methods</parallel> -->
<!-- <threadCount>10</threadCount> -->
</configuration>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>buildnumber-maven-plugin</artifactId>
<version>1.4</version>
<executions>
<execution>
<phase>validate</phase>
<goals>
<goal>create</goal>
</goals>
</execution>
</executions>
<configuration>
<format>{0,number}</format>
<items>
<item>buildNumber0</item>
</items>
</configuration>
</plugin>
<plugin>
<groupId>org.sonatype.plugins</groupId>
<artifactId>nexus-staging-maven-plugin</artifactId>
<version>1.6.7</version>
<extensions>true</extensions>
<configuration>
<serverId>ossrh</serverId>
<nexusUrl>https://oss.sonatype.org/</nexusUrl>
<autoReleaseAfterClose>true</autoReleaseAfterClose>
</configuration>
</plugin>
</plugins>
</build>
</project>
I will give this a try as soon as I can. (It may take a couple days, because I have some upcoming customer meetings to prepare for.)
@IanEmmons Sure! Take your time :) I have already included this patch in the next release, because reducing the dependencies is always a good idea. It would be great if it solves your problem as well.
Super, thanks!
My application translates several kinds of data files into RDF using rmlmapper. Currently, this application forks a separate JVM process to translate each file, using the command-line interface of the "fat" rmlmapper jar. However, I am mapping hundreds of thousands of files, so I would like to avoid the overhead of starting a new JVM process for each file. Thus I am trying to use the non-fat rmlmapper jar as an in-process library. However, before I change a single line of code, if I simply add rmlmapper as a dependency in my Gradle script, I get the following exception in my unit tests:
I've shown the stack trace down to the point where JUnit is calling into my code. You can see that the static constructor of my test case has called the initialization code for Jena, which immediately fails before my code does anything. In the original version of my code, I did not explicitly call Jena's initialization, but rather simply created a default Model in my test case, causing Jena to call its own initialization code. I added the explicit initialization to eliminate several lines from the stack trace, but the top seven lines are the same either way. I have not been able to find the exact cause of this error, but it seems like an issue resolving all the dependencies in a way that does not cause conflicts. When I list the dependencies using Gradle, I note the following:
(I can upload the full dependency listing if you like, but it is very long and, I suspect, not that helpful in the end.)
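One way to check whether several copies of Jena really end up on the test classpath is to ask the class loader for every location that provides a given Jena class. The sketch below is only an illustration and is not part of rmlmapper or Jena: the class name ClasspathDuplicates and the method locationsOf are made up for this example, and org.apache.jena.shared.JenaException is used as the default probe simply because it is the class named in the NoClassDefFoundError mentioned in the pom.xml comment. It assumes the conflicting jars contain a class with the same fully qualified name; classes in the older com.hp.hpl.jena packages would need a probe of their own.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

public class ClasspathDuplicates {

    /**
     * Returns every classpath location that provides the given class.
     * More than one entry suggests that several jars ship the same class.
     */
    static List<URL> locationsOf(String className) throws IOException {
        // A class is stored as a resource, e.g. org/apache/jena/shared/JenaException.class
        String resource = className.replace('.', '/') + ".class";
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        // getResources lists all matches, not only the first one that would be loaded
        return Collections.list(loader.getResources(resource));
    }

    public static void main(String[] args) throws IOException {
        // The default probe class is an assumption; pass another class name as the first argument
        String className = args.length > 0 ? args[0] : "org.apache.jena.shared.JenaException";
        List<URL> hits = locationsOf(className);
        System.out.println(className + ": " + hits.size() + " location(s)");
        for (URL url : hits) {
            System.out.println("  " + url);
        }
    }
}
```

Calling locationsOf from a unit test, rather than running main separately, checks the same classpath that the failing tests see. Each printed URL contains the jar file name, which usually includes the version.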
What I would like to see is for rmlmapper to be refactored into a plain Java library designed to be called in-process, together with one or more other jars that implement some of the "bells and whistles" that are currently bundled in. In other words, the refactored library would not contain any of the following:
Further, this library should not depend on any other software that is not necessary for executing an RML script over an input file to produce RDF.