MRCO-DURON opened 4 months ago
Did this used to work before? Do you know which version it worked on before?
Btw, TriG isn't a particularly good format for exporting a lot of data, since the TriG writer needs to know a lot about your data to format it correctly.
Have you tried with NQUADS? That should hopefully be a fully streaming data format.
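For what it's worth, N-Quads can also be pulled straight off the RDF4J REST API with content negotiation, streaming the response to disk so the dump never has to fit in memory on the client side. Below is a minimal sketch; the host, port, and repository name are placeholders for this deployment:

```python
# Sketch: stream a full repository export as N-Quads from the RDF4J REST API
# (GET /repositories/<id>/statements with an Accept header), writing the
# response to disk in fixed-size chunks. Endpoint and paths are placeholders.
import urllib.request

def stream_export(url: str, out_path: str, chunk_size: int = 1 << 20) -> int:
    """Stream `url` as N-Quads into `out_path`; returns the number of bytes written."""
    req = urllib.request.Request(url, headers={"Accept": "application/n-quads"})
    written = 0
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            written += len(chunk)
    return written

# Example (placeholder endpoint, adjust to your server/repository):
# stream_export("http://localhost:8080/rdf4j-server/repositories/reponame/statements",
#               "/mnt/export.nq")
```

Note this only keeps the *client* side streaming; if the server itself buffers the export, the OOM would still happen server-side.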
Sadly, I don't know what the last working version was. However, I have 3 other lower environments running the same version without issues.
The only difference is the size of the repositories.
You're saying I can export/convert my current repository (.ttl) as NQUADS?
Here is also an error I get when using eclipse-rdf4j-console console:
```
root@ip-172-31-38-149:~# bash /home/ubuntu/eclipse-rdf4j-4.3.2/bin/console.sh
15:53:33.811 [main] DEBUG org.eclipse.rdf4j.common.platform.PlatformFactory - os.name = linux
15:53:33.814 [main] DEBUG org.eclipse.rdf4j.common.platform.PlatformFactory - Detected Posix platform
Connected to default data directory
RDF4J Console 4.3.2
Working dir: /home/ubuntu/eclipse-rdf4j-2.5.1/bin
Type 'help' for help.
connect http://127.0.0.1:8080/rdf4j-server
Disconnecting from default data directory
Connected to http://127.0.0.1:8080/rdf4j-server
open reponame
Opened repository 'reponame'
muchamiel> export /mnt/test.trig
Exception in thread "main" org.eclipse.rdf4j.repository.RepositoryException: <!doctype html>
HTTP Status 500 – Internal Server Error

Type Exception Report

Message Handler processing failed; nested exception is java.lang.OutOfMemoryError

Description The server encountered an unexpected condition that prevented it from fulfilling the request.

Exception

org.springframework.web.util.NestedServletException: Handler processing failed; nested exception is java.lang.OutOfMemoryError
    org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1094)
    org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:964)
    org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
    org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
    org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
    org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    com.github.ziplet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:263)

Root Cause

java.lang.OutOfMemoryError
    java.base/java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:125)
    java.base/java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:119)
    java.base/java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:95)
    java.base/java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:156)
    java.base/java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
    java.base/java.io.DataOutputStream.write(DataOutputStream.java:107)
    java.base/java.io.FilterOutputStream.write(FilterOutputStream.java:108)
    org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.writeString(BinaryRDFWriter.java:346)
    org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.writeLiteral(BinaryRDFWriter.java:322)
    org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.writeValue(BinaryRDFWriter.java:293)
    org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.assignId(BinaryRDFWriter.java:254)
    org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.incValueFreq(BinaryRDFWriter.java:238)
    org.eclipse.rdf4j.rio.binary.BinaryRDFWriter.consumeStatement(BinaryRDFWriter.java:198)
    org.eclipse.rdf4j.rio.helpers.AbstractRDFWriter.handleStatement(AbstractRDFWriter.java:109)
    org.eclipse.rdf4j.repository.sail.SailRepositoryConnection.exportStatements(SailRepositoryConnection.java:382)
    org.eclipse.rdf4j.http.server.repository.statements.ExportStatementsView.render(ExportStatementsView.java:95)
    org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1405)
    org.springframework.web.servlet.DispatcherServlet.processDispatchResult(DispatcherServlet.java:1149)
    org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1088)
    org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:964)
    org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
    org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:635)
    org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
    org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
    com.github.ziplet.filter.compression.CompressingFilter.doFilter(CompressingFilter.java:263)

Note The full stack trace of the root cause is available in the server logs.

Apache Tomcat/8.5.39 (Ubuntu)

    at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.execute(SPARQLProtocolSession.java:1095)
    at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.executeOK(SPARQLProtocolSession.java:1029)
    at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.sendGraphQueryViaHttp(SPARQLProtocolSession.java:945)
    at org.eclipse.rdf4j.http.client.SPARQLProtocolSession.getRDF(SPARQLProtocolSession.java:876)
    at org.eclipse.rdf4j.http.client.RDF4JProtocolSession.getStatements(RDF4JProtocolSession.java:618)
    at org.eclipse.rdf4j.repository.http.HTTPRepositoryConnection.exportStatements(HTTPRepositoryConnection.java:274)
    at org.eclipse.rdf4j.repository.base.AbstractRepositoryConnection.export(AbstractRepositoryConnection.java:189)
    at org.eclipse.rdf4j.console.command.Export.export(Export.java:140)
    at org.eclipse.rdf4j.console.command.Export.execute(Export.java:94)
    at org.eclipse.rdf4j.console.Console.executeCommand(Console.java:379)
    at org.eclipse.rdf4j.console.Console.start(Console.java:336)
```
Glad to know it's not a regression at least.
Any chance you can confirm that this is still an issue on RDF4J 5.0.1?
Other than that it looks like there is something that should be streaming the output but is actually writing it to a byte array output stream instead.
I updated the files for it, and it happens with 5.0.1 too. Same behavior.
Here is my config.ttl:
```
cat /var/lib/tomcat8/.RDF4J/server/repositories/myRepoName/config.ttl
@prefix ns: <http://www.openrdf.org/config/sail/native#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rep: <http://www.openrdf.org/config/repository#> .
@prefix sail: <http://www.openrdf.org/config/sail#> .
@prefix sb: <http://www.openrdf.org/config/sail/base#> .
@prefix sr: <http://www.openrdf.org/config/repository/sail#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#MyRepoName> a rep:Repository;
  rep:repositoryID "myRepoName";
  rep:repositoryImpl [
    rep:repositoryType "openrdf:SailRepository";
    sr:sailImpl [
      sail:sailType "openrdf:NativeStore";
      sb:evaluationStrategyFactory "org.eclipse.rdf4j.query.algebra.evaluation.impl.StrictEvaluationStrategyFactory";
      ns:tripleIndexes "spoc,posc"
    ]
  ];
  rdfs:label "Native store" .
```
Thanks for checking. And just to be sure: is this also the case when using NQUADS?
I tried exporting the current repo as .nq, but that did not work either. Is there any process for this? TriG to NQ?
Current Behavior
When I do:
```
curl "http://localhost:8080/rdf4j-workbench/repositories/repositoryName/export?Accept=application%2Ftrig" --compressed -o ./repositoryName.trig
```
I get only a 15K file, while my repo is 145G, and my previous exports used to be 3G or more.
I have also tried using console.sh (/home/ubuntu/eclipse-rdf4j-4.3.2/bin/console.sh) and I get java.lang.OutOfMemoryError exceptions.
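Not a fix for the underlying server-side buffering, but a common stopgap is to raise the Tomcat JVM heap so the buffered export fits. On Ubuntu's tomcat8 packaging the JVM options usually live in /etc/default/tomcat8; the path and heap size below are assumptions about this deployment, adjust to taste:

```shell
# Assumed Ubuntu tomcat8 packaging; path, existing options, and heap size
# are placeholders for this deployment.
# In /etc/default/tomcat8:
JAVA_OPTS="-Djava.awt.headless=true -Xmx8g"
# Then restart Tomcat to pick up the new heap:
# sudo systemctl restart tomcat8
```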
Expected Behavior
Valid file exports, using either console.sh or the RDF4J API.
Steps To Reproduce
```
curl "http://localhost:8080/rdf4j-workbench/repositories/repositoryName/export?Accept=application%2Ftrig" --compressed -o ./repositoryName.trig
/home/ubuntu/eclipse-rdf4j-4.3.2/bin/console.sh
```
Version
4.3.2
Are you interested in contributing a solution yourself?
None
Anything else?
No response