VIVO-719: Create a way to Export and Import the entire RDF store #710

Jim Blake (Migrated from VIVO-719) said:

Chris Barnes is trying to produce a large sample data set, starting by exporting the UF data. When he tries to export the RDF, it aborts.

Christopher Barnes said:

I got a notification of activity. Does the note mean that this fix will come in v 1.7? Thx - Chris

Christopher Barnes said:

Attached a screenshot of the EXPORT RDF settings with which the export fails to complete.

Jim Blake said:

It means that it's going into the wish list for 1.7. From there, only time will tell...

Jim Blake said:

Able to reproduce this error on my laptop with Weill Cornell data.

==> catalina.out <==
Exception in thread "ajp-bio-4009-AsyncTimeout" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.ConcurrentLinkedQueue.iterator(
    at$
    at
Exception in thread "http-bio-4080-exec-10" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.mysql.jdbc.Buffer.(
    at com.mysql.jdbc.MysqlIO.nextRow(
    at com.mysql.jdbc.MysqlIO.readSingleRowSet(
    at com.mysql.jdbc.MysqlIO.getResultSet(
    at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(
    at com.mysql.jdbc.MysqlIO.readAllResults(
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(
    at com.mysql.jdbc.ConnectionImpl.execSQL(
    at com.mysql.jdbc.ConnectionImpl.execSQL(
    at com.mysql.jdbc.StatementImpl.executeQuery(
    at com.mchange.v2.c3p0.impl.NewProxyStatement.executeQuery(
    at com.hp.hpl.jena.sdb.sql.SDBConnection.execQuery(
    at com.hp.hpl.jena.sdb.compiler.SDB_QC.exec(
    at com.hp.hpl.jena.sdb.compiler.OpSQL.exec(
    at com.hp.hpl.jena.sdb.engine.QueryEngineSDB.eval(
    at com.hp.hpl.jena.sparql.engine.QueryEngineBase.evaluate(
    at com.hp.hpl.jena.sparql.engine.QueryEngineBase.createPlan(
    at com.hp.hpl.jena.sparql.engine.QueryEngineBase.getPlan(
    at com.hp.hpl.jena.sdb.engine.QueryEngineSDB$QueryEngineFactorySDB.create(
    at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.getPlan(
    at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.startQueryIterator(
    at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct(
    at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct(
    at edu.cornell.mannlib.vitro.webapp.rdfservice.impl.jena.RDFServiceJena.getRDFResultStream(
    at edu.cornell.mannlib.vitro.webapp.rdfservice.impl.jena.RDFServiceJena.sparqlConstructQuery(
    at edu.cornell.mannlib.vitro.webapp.rdfservice.impl.logging.LoggingRDFService.sparqlConstructQuery(
    at edu.cornell.mannlib.vitro.webapp.controller.api.sparqlquery.SparqlQueryApiConstructExecutor.getRawResultStream(
    at edu.cornell.mannlib.vitro.webapp.controller.api.sparqlquery.SparqlQueryApiRdfProducer.executeAndFormat(
    at edu.cornell.mannlib.vitro.webapp.controller.api.sparqlquery.SparqlQueryApiConstructExecutor.executeAndFormat(
    at edu.cornell.mannlib.vitro.webapp.controller.admin.SparqlQueryController.respondToQuery(
    at edu.cornell.mannlib.vitro.webapp.controller.admin.SparqlQueryController.doGet(
    at javax.servlet.http.HttpServlet.service(

Jim Blake said:

From the preceding stack trace:

It's dying while executing the CONSTRUCT query, and with good reason: it's trying to read the entire ABOX into memory.

Jim Blake said:

The plan: to avoid holding the entire data model in memory, use a SELECT query instead of a CONSTRUCT query, and re-format the results in a stream as they arrive. Only offer exports as N-Triples: it writes one triple per line, whereas other formats group multiple triples together, and that defeats the stream approach.
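
A minimal sketch of that streaming idea, not the actual VIVO code: each row of a "SELECT ?s ?p ?o" result is written out as one N-Triples line as soon as it arrives, so no model is ever built in memory. The Term and Row classes, the class name, and the writeAll method are hypothetical stand-ins for whatever the SPARQL library's result set provides, and IRIs are assumed to be already valid (only literals are escaped).

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;

/** Sketch: stream SELECT-style rows out as N-Triples, one line per row. */
final class NTriplesStreamSketch {

    enum Kind { IRI, BLANK, LITERAL }

    /** Hypothetical stand-in for one RDF term bound in a result row. */
    static final class Term {
        final Kind kind;
        final String value;    // IRI, blank-node label, or literal lexical form
        final String datatype; // datatype IRI for typed literals, else null
        final String lang;     // language tag for tagged literals, else null

        private Term(Kind kind, String value, String datatype, String lang) {
            this.kind = kind;
            this.value = value;
            this.datatype = datatype;
            this.lang = lang;
        }

        static Term iri(String iri) { return new Term(Kind.IRI, iri, null, null); }
        static Term blank(String label) { return new Term(Kind.BLANK, label, null, null); }
        static Term literal(String lexical, String datatype, String lang) {
            return new Term(Kind.LITERAL, lexical, datatype, lang);
        }
    }

    /** Hypothetical stand-in for one row of "SELECT ?s ?p ?o WHERE { ... }". */
    static final class Row {
        final Term s, p, o;
        Row(Term s, Term p, Term o) { this.s = s; this.p = p; this.o = o; }
    }

    /** Escape the characters that must not appear raw inside an N-Triples literal. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Render a single term in N-Triples syntax. */
    static String format(Term t) {
        switch (t.kind) {
            case IRI:
                return "<" + t.value + ">";
            case BLANK:
                return "_:" + t.value;
            default: {
                String quoted = "\"" + escape(t.value) + "\"";
                if (t.lang != null) return quoted + "@" + t.lang;
                if (t.datatype != null) return quoted + "^^<" + t.datatype + ">";
                return quoted;
            }
        }
    }

    /** Write each row as it is pulled from the iterator; only one row is held at a time. */
    static long writeAll(Iterator<Row> rows, Writer out) throws IOException {
        long count = 0;
        while (rows.hasNext()) {
            Row r = rows.next();
            out.write(format(r.s));
            out.write(' ');
            out.write(format(r.p));
            out.write(' ');
            out.write(format(r.o));
            out.write(" .\n");
            count++;
        }
        out.flush();
        return count;
    }
}
```

In a real export the iterator would wrap the query's result set and the Writer would wrap the servlet response stream, so memory use stays flat regardless of how many triples the graph holds.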

Offer the choice of named graphs, with clarifying labels for those graphs that we recognize: "Inferred ABOX", "Declared TBOX", etc. Can we recognize the ontologies by their graph names?
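
One possible way to attach those labels, sketched under assumptions: a lookup table from graph name to label, falling back to the bare graph name for anything unrecognized. The class and method names are hypothetical, and the graph URIs in the table are assumed to be the names a typical VIVO installation uses; they should be checked against the actual store.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: give recognized named graphs a human-readable label. */
final class GraphLabelSketch {

    // Assumed graph names; verify against the installation before relying on them.
    private static final Map<String, String> KNOWN = new LinkedHashMap<>();
    static {
        KNOWN.put("http://vitro.mannlib.cornell.edu/default/vitro-kb-2", "Declared ABOX");
        KNOWN.put("http://vitro.mannlib.cornell.edu/default/vitro-kb-inf", "Inferred ABOX");
        KNOWN.put("http://vitro.mannlib.cornell.edu/default/asserted-tbox", "Declared TBOX");
        KNOWN.put("http://vitro.mannlib.cornell.edu/default/inferred-tbox", "Inferred TBOX");
    }

    /** Return "Label (graph name)" for recognized graphs, or the graph name alone otherwise. */
    static String labelFor(String graphUri) {
        String label = KNOWN.get(graphUri);
        return label == null ? graphUri : label + " (" + graphUri + ")";
    }
}
```

Graphs that hold ontologies would simply fall through to the bare name here; whether they can be recognized by name is the open question above.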

Jim Blake said:

Questions from existing code: Why is the extension .owl used for RDF/XML from the TBox, but .rdf is used for RDF/XML otherwise?

Jim Blake said:

Able to export the entire data store for Weill Cornell data: 10.5 million triples. Don't know whether we can import that data: the import ran for 48 hours before I stopped it manually.