eclipse-rdf4j / rdf4j

Eclipse RDF4J: scalable RDF for Java
https://rdf4j.org/
BSD 3-Clause "New" or "Revised" License

Repository export operation does not provide any valid .trig #5093

Open MRCO-DURON opened 3 months ago

MRCO-DURON commented 3 months ago

Current Behavior

When I run: `curl "http://localhost:8080/rdf4j-workbench/repositories/repositoryName/export?Accept=application%2Ftrig" --compressed -o ./repositoryName.trig`

I get a file of only about 15k, even though my repository is 145G. My previous exports used to be 3G or more.

I have also tried using console.sh (/home/ubuntu/eclipse-rdf4j-4.3.2/bin/console.sh), and I get java.lang.OutOfMemoryError exceptions.

Expected Behavior

A valid export file, produced with either console.sh or the RDF4J API.

Steps To Reproduce

curl "http://localhost:8080/rdf4j-workbench/repositories/repositoryName/export?Accept=application%2Ftrig" --compressed -o ./repositoryName.trig

/home/ubuntu/eclipse-rdf4j-4.3.2/bin/console.sh

Version

4.3.2

Are you interested in contributing a solution yourself?

None

Anything else?

No response

hmottestad commented 3 months ago

Did this used to work before? Do you know which version it worked on before?

hmottestad commented 3 months ago

Btw, TriG isn't a particularly good format for exporting a lot of data, since the TriG writer needs to know a lot about your data to format it correctly.

Have you tried N-Quads? That should hopefully be a fully streaming data format.
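A minimal sketch of trying N-Quads against the server directly, using only the JDK's `java.net.http` client (Java 11+). It asks the RDF4J Server endpoint `/repositories/{id}/statements` for `application/n-quads` and writes the response body to a file as it arrives. The server URL is the one used in the console session later in this thread; the repository ID and output file name are placeholders. Whether the server streams N-Quads, or buffers it like the binary export in the stack trace, is exactly what this would test.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.nio.file.Paths;

class NQuadsDownload {

    /** Builds a GET request for all statements of a repository, asking for N-Quads. */
    static HttpRequest statementsRequest(String serverUrl, String repositoryId) {
        URI uri = URI.create(serverUrl + "/repositories/" + repositoryId + "/statements");
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/n-quads")
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder repository ID and output file; adjust to your setup.
        HttpRequest request = statementsRequest("http://127.0.0.1:8080/rdf4j-server", "repositoryName");
        // BodyHandlers.ofFile writes the body to disk as it arrives instead of holding it in memory.
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofFile(Paths.get("repositoryName.nq")));
        // On an error status the file contains the server's error page, not N-Quads.
        System.out.println("HTTP " + response.statusCode() + " -> " + response.body());
    }
}
```

This is the same request curl makes; the only differences from the original command are the server (rather than workbench) URL and the `Accept` header.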

MRCO-DURON commented 3 months ago

Sadly, I don't know which version last worked. However, I have three other lower environments running the same version without issues.

The only difference is the size of the repositories.

You're saying I can export/convert my current repository (.ttl) as N-Quads?

MRCO-DURON commented 3 months ago

Here is also the error I get when using the RDF4J console:

`root@ip-172-31-38-149:~# bash /home/ubuntu/eclipse-rdf4j-4.3.2/bin/console.sh
15:53:33.811 [main] DEBUG org.eclipse.rdf4j.common.platform.PlatformFactory - os.name = linux
15:53:33.814 [main] DEBUG org.eclipse.rdf4j.common.platform.PlatformFactory - Detected Posix platform
Connected to default data directory
RDF4J Console 4.3.2

Working dir: /home/ubuntu/eclipse-rdf4j-2.5.1/bin
Type 'help' for help.
connect http://127.0.0.1:8080/rdf4j-server
Disconnecting from default data directory
Connected to http://127.0.0.1:8080/rdf4j-server
open reponame
Opened repository 'reponame'
muchamiel> export /mnt/test.trig
Exception in thread "main" org.eclipse.rdf4j.repository.RepositoryException: <!doctype html>HTTP Status 500 – Internal Server Error

HTTP Status 500 – Internal Server Error


Type Exception Report

Message Handler processing failed; nested exception is java.lang.OutOfMemoryError

Description The server encountered an unexpected condition that prevented it from fulfilling the request.

Exception

org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1094)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:964)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
com.github.ziplet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:263)

Root Cause

java.lang.OutOfMemoryError
java.base/java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:125)
java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:119)
java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
java.base/java.io.DataOutputStream.write(DataOutputStream.java:107)
java.base/java.io.FilterOutputStream.write(FilterOutputStream.java:108)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.writeString(BinaryRDFWriter.java:346)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.writeLiteral(BinaryRDFWriter.java:322)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.writeValue(BinaryRDFWriter.java:293)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.assignId(BinaryRDFWriter.java:254)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.incValueFreq(BinaryRDFWriter.java:238)
org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.consumeStatement(BinaryRDFWriter.java:198)
org.eclipse.rdf4j.rio.helpers.AbstractRDFWriter.handleStatement(AbstractRDFWriter.java:109)
org.eclipse.rdf4j.repository.sail.SailRepositoryConnection.exportStatements(SailRepositoryConnection.java:382)
org.eclipse.rdf4j.http.server.repository.statements.ExportStatementsView.render(ExportStatementsView.java:95)
org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1405)
org.springframework.web.servlet.DispatcherServlet.processDispatchResult(DispatcherServlet.java:1149)
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1088)
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:964)
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
com.github.ziplet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:263)

Note The full stack trace of the root cause is available in the server logs.


Apache Tomcat/8.5.39 (Ubuntu)

at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.execute(SPARQLProtocolSession.java:1095)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.executeOK(SPARQLProtocolSession.java:1029)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendGraphQueryViaHttp(SPARQLProtocolSession.java:945)
at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getRDF(SPARQLProtocolSession.java:876)
at org.eclipse.rdf4j.http.client.RDF4JProtocolSession.getStatements(RDF4JProtocolSession.java:618)
at org.eclipse.rdf4j.repository.http.HTTPRepositoryConnection.exportStatements(HTTPRepositoryConnection.java:274)
at org.eclipse.rdf4j.repository.base.AbstractRepositoryConnection.export(AbstractRepositoryConnection.java:189)
at org.eclipse.rdf4j.console.command.Export.export(Export.java:140)
at org.eclipse.rdf4j.console.command.Export.execute(Export.java:94)
at org.eclipse.rdf4j.console.Console.executeCommand(Console.java:379)
at org.eclipse.rdf4j.console.Console.start(Console.java:336)`

hmottestad commented 3 months ago

Glad to know it's not a regression at least.

Any chance you can confirm that this is still an issue on RDF4J 5.0.1?

Other than that, it looks like something that should be streaming the output is actually writing it to a byte array output stream instead.
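The Root Cause in the trace is consistent with this: `ByteArrayOutputStream.hugeCapacity` throws a bare `OutOfMemoryError` when the requested buffer size overflows an `int`. A Java array is indexed by `int`, so a `ByteArrayOutputStream` can never hold more than `Integer.MAX_VALUE` (2,147,483,647) bytes, regardless of heap size. The small stdlib-only sketch below does that arithmetic for the sizes mentioned in this thread; the class and method names are made up for illustration.

```java
class ByteArrayLimit {

    // java.io.ByteArrayOutputStream keeps everything in one byte[], and Java arrays
    // are indexed by int, so the buffer can never exceed Integer.MAX_VALUE bytes.
    static final long MAX_BUFFER_BYTES = Integer.MAX_VALUE;

    /** True if an export of this many bytes could, at best, fit in a single in-memory buffer. */
    static boolean fitsInByteArray(long exportBytes) {
        return exportBytes <= MAX_BUFFER_BYTES;
    }

    public static void main(String[] args) {
        long[] sizes = {
            15L * 1024,          // roughly the size of the file the curl export produced
            3_000_000_000L,      // "3G", the size of earlier exports
            145_000_000_000L     // "145G", the size of the repository on disk
        };
        for (long size : sizes) {
            System.out.printf("%,d bytes fits in a byte array: %b%n", size, fitsInByteArray(size));
        }
    }
}
```

Raising `-Xmx` cannot move this limit. A streaming writer avoids it because each statement is written out and then dropped, so memory use does not grow with the size of the export.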

MRCO-DURON commented 3 months ago

I updated to 5.0.1 and it happens there too. Same behavior.

Here is my config.ttl:

`cat /var/lib/tomcat8/.RDF4J/server/repositories/myRepoName/config.ttl
@prefix ns: <http://www.openrdf.org/config/sail/native#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rep: <http://www.openrdf.org/config/repository#> .
@prefix sail: <http://www.openrdf.org/config/sail#> .
@prefix sb: <http://www.openrdf.org/config/sail/base#> .
@prefix sr: <http://www.openrdf.org/config/repository/sail#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#MyRepoName> a rep:Repository;
  rep:repositoryID "myRepoName";
  rep:repositoryImpl [
      rep:repositoryType "openrdf:SailRepository";
      sr:sailImpl [
          sail:sailType "openrdf:NativeStore";
          sb:evaluationStrategyFactory "org.eclipse.rdf4j.query.algebra.evaluation.impl.StrictEvaluationStrategyFactory";
          ns:tripleIndexes "spoc,posc"
        ]
    ];
  rdfs:label "Native store" .`
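Given the native store configuration above, a minimal sketch of a streaming N-Quads export through the RDF4J Java API that bypasses the server entirely. The console export in the trace goes through `HTTPRepositoryConnection` to the server-side `ExportStatementsView` that ran out of memory, so this opens the store files directly. It assumes `rdf4j-repository-sail`, `rdf4j-sail-nativerdf` and `rdf4j-rio-nquads` in a version matching the server are on the classpath, that the store's files live in the same directory as `config.ttl`, and that Tomcat is stopped first; the output file name is a placeholder.

```java
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.RDFWriter;
import org.eclipse.rdf4j.rio.Rio;
import org.eclipse.rdf4j.sail.nativerdf.NativeStore;

class LocalNQuadsExport {

    /** Streams every statement (all named graphs included) from conn to out as N-Quads. */
    static void exportAsNQuads(RepositoryConnection conn, OutputStream out) {
        RDFWriter writer = Rio.createWriter(RDFFormat.NQUADS, out);
        // export() calls startRDF(), then handleStatement() per statement, then endRDF(),
        // so statements go straight to the output stream instead of into a model.
        conn.export(writer);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the data files sit next to config.ttl, and Tomcat is stopped,
        // because a native store directory is locked by the process that has it open.
        File dataDir = new File("/var/lib/tomcat8/.RDF4J/server/repositories/myRepoName");
        // Same triple indexes as in config.ttl, so the existing indexes are reused.
        Repository repo = new SailRepository(new NativeStore(dataDir, "spoc,posc"));
        repo.init();
        try (RepositoryConnection conn = repo.getConnection();
             OutputStream out = new BufferedOutputStream(new FileOutputStream("myRepoName.nq"))) {
            exportAsNQuads(conn, out);
        } finally {
            repo.shutDown();
        }
    }
}
```

`exportAsNQuads` accepts any `RepositoryConnection`, so the same method works against an in-memory store for a quick check before pointing it at the 145G store.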

hmottestad commented 3 months ago

Thanks for checking. And just to be sure: is this also the case when using N-Quads?

MRCO-DURON commented 3 months ago

I tried exporting the current repo as .nq, but that did not work. Is there any process for this, such as converting TriG to N-Quads?
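If a complete TriG file already exists, a minimal sketch of converting it to N-Quads with RDF4J's Rio parser and writer, assuming `rdf4j-rio-trig` and `rdf4j-rio-nquads` are on the classpath. The parser hands each statement straight to the N-Quads writer, so the statements are not collected into a model first. The default file names are placeholders, and this cannot recover data missing from a truncated export.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.rio.RDFParser;
import org.eclipse.rdf4j.rio.RDFWriter;
import org.eclipse.rdf4j.rio.Rio;

class TrigToNQuads {

    /** Parses TriG from in and writes each statement to out as N-Quads as soon as it is parsed. */
    static void convert(InputStream in, OutputStream out) throws IOException {
        RDFParser parser = Rio.createParser(RDFFormat.TRIG);
        RDFWriter writer = Rio.createWriter(RDFFormat.NQUADS, out);
        // The parser calls startRDF(), handleStatement() per statement and endRDF()
        // on the writer, which flushes its output when endRDF() is reached.
        parser.setRDFHandler(writer);
        parser.parse(in, "");
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file names; pass your own paths as arguments.
        String src = args.length > 0 ? args[0] : "repositoryName.trig";
        String dst = args.length > 1 ? args[1] : "repositoryName.nq";
        try (InputStream in = new BufferedInputStream(new FileInputStream(src));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(dst))) {
            convert(in, out);
        }
    }
}
```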