fadmaa / grefine-rdf-extension

An extension to Google Refine that enables graphical mapping of Google Refine project data to an RDF skeleton and then exporting it in RDF format
http://refine.deri.ie

RDF export should stream #101

Open armisael opened 9 years ago

armisael commented 9 years ago

I'm generating a large RDF export using OpenRefine and the RDF-extension, and I'm getting an OutOfMemoryError. Looking at the full stack trace (below), it seems to me that RdfExporter.buildModel is loading the whole graph into memory. I'm not familiar with openRDF, so I'm asking: is it possible to change the exporter to work in a streaming fashion? We don't really need to process the data twice, once to build the model and once to generate the triples, do we?

java.lang.OutOfMemoryError: Java heap space
    at org.openrdf.sail.memory.model.MemStatementList.growArray(MemStatementList.java:143)
    at org.openrdf.sail.memory.model.MemStatementList.add(MemStatementList.java:67)
    at org.openrdf.sail.memory.MemoryStore.addStatement(MemoryStore.java:595)
    at org.openrdf.sail.memory.MemoryStoreConnection.addStatementInternal(MemoryStoreConnection.java:418)
    at org.openrdf.sail.memory.MemoryStoreConnection.addStatementInternal(MemoryStoreConnection.java:379)
    at org.openrdf.sail.helpers.SailConnectionBase.addStatement(SailConnectionBase.java:331)
    at org.openrdf.repository.sail.SailRepositoryConnection.addWithoutCommit(SailRepositoryConnection.java:236)
    at org.openrdf.repository.base.RepositoryConnectionBase.addWithoutCommit(RepositoryConnectionBase.java:591)
    at org.openrdf.repository.base.RepositoryConnectionBase.add(RepositoryConnectionBase.java:486)
    at org.deri.grefine.rdf.ResourceNode.addLinks(ResourceNode.java:100)
    at org.deri.grefine.rdf.ResourceNode.createNode(ResourceNode.java:119)
    at org.deri.grefine.rdf.exporters.RdfExporter$1.visit(RdfExporter.java:110)
    at com.google.refine.browsing.util.ConjunctiveFilteredRows.visitRow(ConjunctiveFilteredRows.java:76)
    at com.google.refine.browsing.util.ConjunctiveFilteredRows.accept(ConjunctiveFilteredRows.java:65)
    at org.deri.grefine.rdf.exporters.RdfExporter.buildModel(RdfExporter.java:123)
    at org.deri.grefine.rdf.exporters.RdfExporter.buildModel(RdfExporter.java:115)
    at org.deri.grefine.rdf.exporters.RdfExporter.export(RdfExporter.java:85)
    at com.google.refine.commands.project.ExportRowsCommand.doPost(ExportRowsCommand.java:101)
    at com.google.refine.RefineServlet.service(RefineServlet.java:177)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1166)
    at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:81)
    at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:155)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1157)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:388)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:418)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
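
To make the request concrete, here is a minimal sketch of what a single-pass, streaming export could look like. It is not the extension's code: `TripleSink`, `NTriplesSink`, `exportRows`, and the example URIs are hypothetical names, and it uses only the Java standard library. The idea is that each row's triples are written to the output as soon as the row is visited, so memory use does not grow with the number of rows. In openRDF, the Rio `RDFWriter` interface (`startRDF`, `handleStatement`, `endRDF`) has a similar callback shape and might serve as the sink, though I have not checked how much of RdfExporter depends on the in-memory repository.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

public class StreamingExportSketch {

    /** Hypothetical callback that receives one triple at a time. */
    interface TripleSink {
        void triple(String subject, String predicate, String object) throws IOException;
    }

    /** Writes each triple as one N-Triples line as soon as it arrives. */
    static final class NTriplesSink implements TripleSink {
        private final Writer out;
        private long count = 0;

        NTriplesSink(Writer out) {
            this.out = out;
        }

        @Override
        public void triple(String s, String p, String o) throws IOException {
            // Nothing is retained after the write, so memory stays flat.
            out.write("<" + s + "> <" + p + "> \"" + escape(o) + "\" .\n");
            count++;
        }

        long count() {
            return count;
        }

        /** Escapes the characters that must not appear raw in a literal. */
        private static String escape(String literal) {
            return literal.replace("\\", "\\\\")
                          .replace("\"", "\\\"")
                          .replace("\n", "\\n")
                          .replace("\r", "\\r");
        }
    }

    /**
     * Visits every row exactly once and pushes its triple to the sink.
     * Assumption: each row is {id, name}; a real mapping would come from
     * the RDF skeleton instead.
     */
    static void exportRows(Iterable<String[]> rows, String baseUri,
                           String predicate, TripleSink sink) throws IOException {
        for (String[] row : rows) {
            sink.triple(baseUri + row[0], predicate, row[1]);
        }
    }

    /** Runs the sketch on two made-up rows and returns the serialized output. */
    static String demo() throws IOException {
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "Alice"},
                new String[] {"2", "Bob"});
        StringWriter out = new StringWriter();
        exportRows(rows, "http://example.org/row/",
                   "http://xmlns.com/foaf/0.1/name", new NTriplesSink(out));
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(demo());
    }
}
```

In the real exporter the `Writer` would be the HTTP response writer rather than a `StringWriter`. One caveat with this approach: formats that group statements by subject or need all namespaces up front (for example pretty-printed RDF/XML) may not stream as cleanly as a line-based format.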