RMLio / rmlmapper-java

The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources
http://rml.io
MIT License

When using rmlmapper as an in-process library, Jena fails to initialize #127

Closed IanEmmons closed 2 years ago

IanEmmons commented 3 years ago

My application translates several kinds of data files into RDF using rmlmapper. Currently, this application forks a separate JVM process to translate each file, using the command-line interface of the "fat" rmlmapper jar. However, I am mapping hundreds of thousands of files, so I would like to avoid the overhead of starting a new JVM process for each file. Thus I am trying to use the non-fat rmlmapper jar as an in-process library. However, before I change a single line of code, if I simply add rmlmapper as a dependency in my Gradle script, I get the following exception in my unit tests:

java.lang.NoClassDefFoundError: org/apache/jena/sparql/lib/Metadata
   at org.apache.jena.tdb2.TDB2.<clinit>(TDB2.java:163)
   at org.apache.jena.tdb2.sys.InitTDB2.start(InitTDB2.java:28)
   at org.apache.jena.sys.JenaSystem.lambda$init$2(JenaSystem.java:117)
   at java.util.ArrayList.forEach(ArrayList.java:1259)
   at org.apache.jena.sys.JenaSystem.forEach(JenaSystem.java:192)
   at org.apache.jena.sys.JenaSystem.forEach(JenaSystem.java:169)
   at org.apache.jena.sys.JenaSystem.init(JenaSystem.java:115)
   at com.bbn.ix.sunrise.etl.FileSourceRmlTest.<clinit>(FileSourceRmlTest.java:12)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
   at org.junit.platform.commons.util.ReflectionUtils.newInstance(ReflectionUtils.java:513)
   at org.junit.jupiter.engine.execution.ConstructorInvocation.proceed(ConstructorInvocation.java:56)

I've shown the stack trace down to the point where JUnit is calling into my code. You can see that the static constructor of my test case has called the initialization code for Jena, which immediately fails before my code does anything. In the original version of my code, I did not explicitly call Jena's initialization, but rather simply created a default Model in my test case, causing Jena to call its own initialization code. I added the explicit initialization to eliminate several lines from the stack trace, but the top seven lines are the same either way. I have not been able to find the exact cause of this error, but it seems like an issue resolving all the dependencies in a way that does not cause conflicts. When I list the dependencies using Gradle, I note the following:

(I can upload the full dependency listing if you like, but it's very very long, and in the end I suspect not that helpful.)
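One way to narrow down a `NoClassDefFoundError` like the one above is to ask the class loader which jar, if any, supplies each class named in the stack trace. The following is only a minimal diagnostic sketch using the standard library; the class name `ClassOriginCheck` and the helper `describe` are hypothetical names chosen for illustration and are not part of rmlmapper or Jena. The class names it probes are taken from the stack trace.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of rmlmapper or Jena).
public class ClassOriginCheck {

    /** Returns where a class would be loaded from, or a short note if it cannot be found. */
    public static String describe(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> cls = Class.forName(className, false, ClassOriginCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader have no code source
            return src == null ? "bootstrap/JDK" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT FOUND (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace above
        String[] names = {
            "org.apache.jena.sparql.lib.Metadata",
            "org.apache.jena.tdb2.TDB2",
            "org.apache.jena.sys.JenaSystem"
        };
        for (String name : names) {
            System.out.println(name + " -> " + describe(name));
        }
    }
}
```

Run on the same classpath as the failing tests, this should show whether the classes resolve to jars from different Jena versions, or whether one of them is missing altogether.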

What I would like to see is for rmlmapper to be refactored into a plain Java library designed to be called in-process, together with one or more other jars that implement some of the "bells and whistles" that are currently bundled in. In other words, the refactored library would not contain any of the following:

Further, this library should not depend on any other software that is not necessary to executing an RML script over an input file to produce RDF.

DylanVanAssche commented 2 years ago

Hi!

Thanks for reaching out! These libraries are needed to run unit tests, such as those for the Docker and Fuseki functionality. Does using the RMLMapper as shown here not work? https://github.com/RMLio/rmlmapper-java/blob/master/src/test/java/be/ugent/rml/readme/ReadmeTest.java

IanEmmons commented 2 years ago

I have not tried your sample code yet — I can do that in a few days, if it's important, but I have a large code base written to the Jena API and I don't want to have to refactor it to the be.ugent.rml.store.* API you show in the sample code.

The key observation here is that my code uses Jena, and that causes a conflict with the RML Mapper dependencies, which seem to pull in multiple versions of Jena.

Further, if the dependencies I suggested eliminating are used only for unit testing, then they should be marked as testing scope so that they are not pulled into my code, but used only when running your tests.

DylanVanAssche commented 2 years ago

@IanEmmons

I marked the Jetty, Jena, Fuseki and Docker dependencies with the test scope. com.github.rdfhdt:hdt-java cannot be changed, as it is necessary to output HDT files with the RMLMapper. Could you try this pom.xml for the RMLMapper?

<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>be.ugent.rml</groupId>
    <artifactId>rmlmapper</artifactId>
    <name>RMLMapper</name>
    <version>4.12.0</version>
    <description>
        The RMLMapper executes RML rules to generate high quality Linked Data from multiple originally (semi-)structured data sources.
    </description>
    <url>https://github.com/RMLio/rmlmapper-java</url>
    <licenses>
        <license>
            <name>The MIT License</name>
            <url>https://raw.githubusercontent.com/RMLio/rmlmapper-java/master/LICENSE</url>
            <distribution>repo</distribution>
        </license>
    </licenses>
    <developers>
        <developer>
            <id>pheyvaer</id>
            <name>Pieter Heyvaert</name>
            <email>pieter.heyvaert@ugent.be</email>
        </developer>
        <developer>
            <id>bjdmeest</id>
            <name>Ben De Meester</name>
            <email>ben.demeester@ugent.be</email>
        </developer>
        <developer>
            <id>andimou</id>
            <name>Anastasia Dimou</name>
            <email>anastasia.dimou@ugent.be</email>
        </developer>
    </developers>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <junit.version>4.13.2</junit.version>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>

    <scm>
        <connection>scm:git:ssh://git@github.com:RMLio/rmlmapper-java.git</connection>
        <url>https://github.com/RMLio/rmlmapper-java</url>
    </scm>

    <repositories>
        <repository>
            <id>repo.maven.apache.org</id>
            <url>https://repo.maven.apache.org/maven2/</url>
        </repository>
        <repository>
            <id>jitpack.io</id>
            <url>https://jitpack.io</url>
        </repository>
    </repositories>

    <distributionManagement>
        <snapshotRepository>
            <id>ossrh</id>
            <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        </snapshotRepository>
    </distributionManagement>

    <profiles>
        <profile>
            <id>release</id>
            <build>
                <plugins>
                    <plugin>
                        <groupId>org.apache.maven.plugins</groupId>
                        <artifactId>maven-javadoc-plugin</artifactId>
                        <version>3.1.0</version>
                        <configuration>
                            <source>8</source>
                        </configuration>
                        <executions>
                            <execution>
                                <id>attach-javadocs</id>
                                <goals>
                                    <goal>jar</goal>
                                </goals>
                                <configuration>
                                    <show>public</show>
                                    <failOnError>false</failOnError>
                                    <doclint>none</doclint>
                                </configuration>
                            </execution>
                        </executions>
                    </plugin>
                    <plugin>
                        <groupId>org.apache.maven.plugins</groupId>
                        <artifactId>maven-source-plugin</artifactId>
                        <version>2.2.1</version>
                        <executions>
                            <execution>
                                <id>attach-sources</id>
                                <goals>
                                    <goal>jar-no-fork</goal>
                                </goals>
                            </execution>
                        </executions>
                    </plugin>
                    <plugin>
                        <groupId>org.apache.maven.plugins</groupId>
                        <artifactId>maven-gpg-plugin</artifactId>
                        <version>1.5</version>
                        <executions>
                            <execution>
                                <id>sign-artifacts</id>
                                <phase>verify</phase>
                                <goals>
                                    <goal>sign</goal>
                                </goals>
                            </execution>
                        </executions>
                    </plugin>
                </plugins>
            </build>
        </profile>
    </profiles>

    <dependencies>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.2.3</version>
        </dependency>
        <dependency>
            <groupId>commons-lang</groupId>
            <artifactId>commons-lang</artifactId>
            <version>2.6</version>
        </dependency>
        <dependency>
            <groupId>commons-cli</groupId>
            <artifactId>commons-cli</artifactId>
            <version>1.4</version>
        </dependency>
        <dependency>
            <groupId>org.eclipse.rdf4j</groupId>
            <artifactId>rdf4j-runtime</artifactId>
            <version>2.5.5</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>${junit.version}</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.github.stefanbirkner</groupId>
            <artifactId>system-rules</artifactId>
            <version>1.19.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.jayway.jsonpath</groupId>
            <artifactId>json-path</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>javax.xml.parsers</groupId>
            <artifactId>jaxp-api</artifactId>
            <version>1.4.5</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.26</version>
        </dependency>
        <dependency>
            <groupId>ch.vorburger.mariaDB4j</groupId>
            <artifactId>mariaDB4j</artifactId>
            <version>2.4.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>postgresql</groupId>
            <artifactId>postgresql</artifactId>
            <version>9.1-901-1.jdbc4</version>
        </dependency>
        <!-- jre version must be the same as the gitlab runner -->
        <dependency>
            <groupId>com.microsoft.sqlserver</groupId>
            <artifactId>mssql-jdbc</artifactId>
            <version>7.2.2.jre8</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.spotify</groupId>
            <artifactId>docker-client</artifactId>
            <version>8.16.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.12.1</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>2.12.1</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-annotations</artifactId>
            <version>2.12.1</version>
        </dependency>
        <dependency>
            <groupId>org.eclipse.jetty</groupId>
            <artifactId>jetty-server</artifactId>
            <version>9.4.17.v20190418</version>
            <scope>test</scope>
        </dependency>
        <!-- Keep these Jetty libraries on this version bc of compatibility w/Fuseki -->
        <dependency>
            <groupId>org.eclipse.jetty</groupId>
            <artifactId>jetty-security</artifactId>
            <version>9.4.17.v20190418</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.apache.jena</groupId>
            <artifactId>apache-jena-libs</artifactId>
            <type>pom</type>
            <version>3.8.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.hp.hpl.jena</groupId>
            <artifactId>arq</artifactId>
            <version>2.8.8</version>
            <scope>test</scope>
        </dependency>
        <!-- Keep this Fuseki library on this version bc of compatibility w/Jetty -->
        <dependency>
            <groupId>org.apache.jena</groupId>
            <artifactId>jena-fuseki-embedded</artifactId>
            <version>3.8.0</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>com.github.rdfhdt</groupId>
            <artifactId>hdt-java</artifactId>
            <version>v2.1.2</version>
        </dependency>
        <dependency>
            <groupId>commons-validator</groupId>
            <artifactId>commons-validator</artifactId>
            <version>1.7</version>
        </dependency>
        <dependency>
            <groupId>com.github.fnoio</groupId>
            <artifactId>grel-functions-java</artifactId>
            <version>v0.7.1</version>
        </dependency>
        <dependency>
            <groupId>com.github.slugify</groupId>
            <artifactId>slugify</artifactId>
            <version>2.5</version>
        </dependency>
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.14.2</version>
        </dependency>
        <!--        START spreadsheet dependencies -->
        <dependency>
            <groupId>org.apache.commons</groupId>
            <artifactId>commons-csv</artifactId>
            <version>1.9.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.poi</groupId>
            <artifactId>poi-ooxml</artifactId>
            <version>4.1.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.odftoolkit/simple-odf -->
        <dependency>
<!--
            This should be below Apache Jena dependencies
            in the pom declaration order for the correct dependency mediation
            Otherwise: java.lang.NoClassDefFoundError: org/apache/jena/shared/JenaException
            and no 0.8-incubating version yet
            https://issues.apache.org/jira/browse/ODFTOOLKIT-415?jql=project%20%3D%20ODFTOOLKIT%20AND%20fixVersion%20%3D%200.7-incubating
-->
            <groupId>org.apache.odftoolkit</groupId>
            <artifactId>simple-odf</artifactId>
            <version>0.8.2-incubating</version>
            <exclusions>
                <exclusion>
                    <artifactId>tools</artifactId>
                    <groupId>com.sun</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <!-- END spreadsheet dependencies-->
    </dependencies>

    <build>
        <sourceDirectory>src/main/java</sourceDirectory>
        <testSourceDirectory>src/test/java</testSourceDirectory>
        <finalName>${project.artifactId}-${project.version}-r${buildNumber}</finalName>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>${maven.compiler.source}</source>
                    <target>${maven.compiler.target}</target>
                </configuration>
                <version>3.8.1</version>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.1</version>
                <configuration>
                    <shadedArtifactAttached>true</shadedArtifactAttached>
                    <shadedClassifierName>r${buildNumber}-all</shadedClassifierName>
                    <transformers>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                            <mainClass>be.ugent.rml.cli.Main</mainClass>
                        </transformer>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheLicenseResourceTransformer"/>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ApacheNoticeResourceTransformer">
                            <addHeader>false</addHeader>
                        </transformer>
                    </transformers>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <!-- Some jars are signed but shading breaks that.
                                     Don't include signing files.
                                -->
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                    </filters>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>                        <!-- this is used for inheritance merges -->
                        <phase>package</phase>                        <!-- bind to the packaging phase -->
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>3.0.0-M3</version>
                <configuration>
                    <useSystemClassLoader>true</useSystemClassLoader>
                    <useManifestOnlyJar>false</useManifestOnlyJar>
                    <!-- <parallel>methods</parallel> -->
                    <!-- <threadCount>10</threadCount> -->
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.codehaus.mojo</groupId>
                <artifactId>buildnumber-maven-plugin</artifactId>
                <version>1.4</version>
                <executions>
                    <execution>
                        <phase>validate</phase>
                        <goals>
                            <goal>create</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <format>{0,number}</format>
                    <items>
                        <item>buildNumber0</item>
                    </items>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.sonatype.plugins</groupId>
                <artifactId>nexus-staging-maven-plugin</artifactId>
                <version>1.6.7</version>
                <extensions>true</extensions>
                <configuration>
                    <serverId>ossrh</serverId>
                    <nexusUrl>https://oss.sonatype.org/</nexusUrl>
                    <autoReleaseAfterClose>true</autoReleaseAfterClose>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

IanEmmons commented 2 years ago

I will give this a try as soon as I can. (It may take a couple days, because I have some upcoming customer meetings to prepare for.)

DylanVanAssche commented 2 years ago

@IanEmmons Sure! Take your time :) I have already included this patch in the next release, because reducing the dependencies is always a good idea. It would be great if it solves your problem as well.

IanEmmons commented 2 years ago

Super, thanks!