chenejac / VIVOTestMigrationJIRAClosed

VIVO-719: Create a way to Export and Import the entire RDF store #711

Closed: chenejac closed this issue 6 years ago

chenejac commented 10 years ago

Jim Blake (Migrated from VIVO-719) said:

Chris Barnes is trying to produce a large sample data set, starting by exporting the UF data. When he tries to export the RDF, the export aborts.

chenejac commented 10 years ago

Christopher Barnes said:

I got a notification of activity. Does the note mean that this fix will come in v 1.7? Thx - Chris

chenejac commented 10 years ago

Christopher Barnes said:

Attached a screenshot of the settings I used in Export RDF; with these settings, the export fails to complete.

chenejac commented 10 years ago

Jim Blake said:

It means that it's going into the wish list for 1.7. From there, only time will tell...

chenejac commented 10 years ago

Jim Blake said:

Able to reproduce this error on my laptop with Weill Cornell data:

==> catalina.out <==
Exception in thread "ajp-bio-4009-AsyncTimeout" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.ConcurrentLinkedQueue.iterator(ConcurrentLinkedQueue.java:667)
    at org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:156)
    at java.lang.Thread.run(Thread.java:744)
Exception in thread "http-bio-4080-exec-10" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at com.mysql.jdbc.Buffer.<init>(Buffer.java:59)
    at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1469)
    at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:2924)
    at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:477)
    at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:2619)
    at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:1788)
    at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2209)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2619)
    at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2569)
    at com.mysql.jdbc.StatementImpl.executeQuery(StatementImpl.java:1510)
    at com.mchange.v2.c3p0.impl.NewProxyStatement.executeQuery(NewProxyStatement.java:397)
    at com.hp.hpl.jena.sdb.sql.SDBConnection.execQuery(SDBConnection.java:119)
    at com.hp.hpl.jena.sdb.compiler.SDB_QC.exec(SDB_QC.java:60)
    at com.hp.hpl.jena.sdb.compiler.OpSQL.exec(OpSQL.java:53)
    at com.hp.hpl.jena.sdb.engine.QueryEngineSDB.eval(QueryEngineSDB.java:129)
    at com.hp.hpl.jena.sparql.engine.QueryEngineBase.evaluate(QueryEngineBase.java:138)
    at com.hp.hpl.jena.sparql.engine.QueryEngineBase.createPlan(QueryEngineBase.java:109)
    at com.hp.hpl.jena.sparql.engine.QueryEngineBase.getPlan(QueryEngineBase.java:97)
    at com.hp.hpl.jena.sdb.engine.QueryEngineSDB$QueryEngineFactorySDB.create(QueryEngineSDB.java:154)
    at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.getPlan(QueryExecutionBase.java:266)
    at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.startQueryIterator(QueryExecutionBase.java:243)
    at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct(QueryExecutionBase.java:110)
    at com.hp.hpl.jena.sparql.engine.QueryExecutionBase.execConstruct(QueryExecutionBase.java:100)
    at edu.cornell.mannlib.vitro.webapp.rdfservice.impl.jena.RDFServiceJena.getRDFResultStream(RDFServiceJena.java:310)
    at edu.cornell.mannlib.vitro.webapp.rdfservice.impl.jena.RDFServiceJena.sparqlConstructQuery(RDFServiceJena.java:329)
    at edu.cornell.mannlib.vitro.webapp.rdfservice.impl.logging.LoggingRDFService.sparqlConstructQuery(LoggingRDFService.java:42)
    at edu.cornell.mannlib.vitro.webapp.controller.api.sparqlquery.SparqlQueryApiConstructExecutor.getRawResultStream(SparqlQueryApiConstructExecutor.java:28)
    at edu.cornell.mannlib.vitro.webapp.controller.api.sparqlquery.SparqlQueryApiRdfProducer.executeAndFormat(SparqlQueryApiRdfProducer.java:54)
    at edu.cornell.mannlib.vitro.webapp.controller.api.sparqlquery.SparqlQueryApiConstructExecutor.executeAndFormat(SparqlQueryApiConstructExecutor.java:16)
    at edu.cornell.mannlib.vitro.webapp.controller.admin.SparqlQueryController.respondToQuery(SparqlQueryController.java:115)
    at edu.cornell.mannlib.vitro.webapp.controller.admin.SparqlQueryController.doGet(SparqlQueryController.java:98)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:620)

chenejac commented 10 years ago

Jim Blake said:

From the preceding stack trace:

It's dying here: RDFServiceJena.java:310, and with good reason. It's trying to read the entire ABOX into memory.

chenejac commented 10 years ago

Jim Blake said:

The plan: to avoid holding the entire data model in memory, use a SELECT query instead of a CONSTRUCT query, and reformat the results into a stream as they arrive. Only offer exports as N-Triples, since the other formats write multiple triples as a unit (for example, grouping them by subject), and that defeats the streaming approach.
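A minimal sketch of the streaming side of that plan, in Java. It assumes the SELECT query binds `?s ?p ?o` and that each solution can be pulled one at a time from an iterator; `NTriplesStreamer`, `Term` and `Row` are hypothetical stand-ins, not VIVO or Jena classes. Each row becomes one N-Triples line as soon as it arrives, so nothing is accumulated.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Iterator;

// Hypothetical helper: formats SELECT ?s ?p ?o rows as N-Triples, one line per row.
final class NTriplesStreamer {

    // Hypothetical term model: an IRI, a blank node, or a literal.
    enum Kind { IRI, BNODE, LITERAL }

    // datatype and lang are null when absent; lang wins if both are set.
    record Term(Kind kind, String value, String datatype, String lang) {
        static Term iri(String v) { return new Term(Kind.IRI, v, null, null); }
        static Term bnode(String label) { return new Term(Kind.BNODE, label, null, null); }
        static Term literal(String lex, String datatype, String lang) {
            return new Term(Kind.LITERAL, lex, datatype, lang);
        }
    }

    // One SELECT solution for ?s ?p ?o.
    record Row(Term s, Term p, Term o) {}

    private NTriplesStreamer() {}

    // Escape a literal's lexical form: backslash, double quote, LF and CR.
    static String escape(String lex) {
        StringBuilder sb = new StringBuilder(lex.length());
        for (int i = 0; i < lex.length(); i++) {
            char c = lex.charAt(i);
            switch (c) {
                case '\\' -> sb.append("\\\\");
                case '"' -> sb.append("\\\"");
                case '\n' -> sb.append("\\n");
                case '\r' -> sb.append("\\r");
                default -> sb.append(c);
            }
        }
        return sb.toString();
    }

    // Render one term in N-Triples syntax (IRIs are not escaped in this sketch).
    static String format(Term t) {
        return switch (t.kind()) {
            case IRI -> "<" + t.value() + ">";
            case BNODE -> "_:" + t.value();
            case LITERAL -> {
                String quoted = "\"" + escape(t.value()) + "\"";
                if (t.lang() != null) {
                    yield quoted + "@" + t.lang();
                }
                if (t.datatype() != null) {
                    yield quoted + "^^<" + t.datatype() + ">";
                }
                yield quoted;
            }
        };
    }

    // Write each row as soon as it is pulled from the iterator. Nothing is kept,
    // so memory use does not grow with the number of triples exported.
    static long stream(Iterator<Row> rows, Writer out) {
        long count = 0;
        try {
            while (rows.hasNext()) {
                Row r = rows.next();
                out.write(format(r.s()) + " " + format(r.p()) + " " + format(r.o()) + " .\n");
                count++;
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

This is the property the plan relies on: N-Triples puts one complete triple on each line, so no state has to carry over from one row to the next. A production exporter would also need to escape unusual characters inside IRIs, which this sketch skips.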

Offer the choice of named graphs, with clarifying labels for those graphs that we recognize: "Inferred ABOX", "Declared TBOX", etc. Can we recognize the ontologies by their graph names?
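One way to attach those labels, sketched in Java: look each graph URI up in a table of graphs we recognize, and fall back to the raw URI otherwise. The four URIs below are assumed to be VIVO's default graph names and should be checked against the actual store; `GraphLabels` is a hypothetical helper. It does not answer the ontology question: ontologies would only be recognized if their graph URIs were added to the table.

```java
import java.util.Map;

// Hypothetical helper: human-readable labels for named graphs offered in an export screen.
final class GraphLabels {

    // ASSUMPTION: VIVO's default graph URIs; verify against your own store, e.g. with
    // SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }
    private static final Map<String, String> KNOWN = Map.of(
        "http://vitro.mannlib.cornell.edu/default/vitro-kb-2", "Declared ABOX",
        "http://vitro.mannlib.cornell.edu/default/vitro-kb-inf", "Inferred ABOX",
        "http://vitro.mannlib.cornell.edu/default/asserted-tbox", "Declared TBOX",
        "http://vitro.mannlib.cornell.edu/default/inferred-tbox", "Inferred TBOX");

    private GraphLabels() {}

    // Recognized graphs get a label followed by their URI; any other graph
    // is shown by its raw URI.
    static String label(String graphUri) {
        String known = KNOWN.get(graphUri);
        return known == null ? graphUri : known + " (" + graphUri + ")";
    }
}
```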

http://stackoverflow.com/questions/23380911/using-sesame-to-process-a-sparql-xml-stream-output-from-vitruoso
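The linked question uses Sesame to process a streamed SPARQL XML result. As a sketch of the same idea without Sesame, the JDK's built-in StAX pull parser (`javax.xml.stream`) can read the W3C SPARQL Query Results XML Format one `<result>` at a time and hand each result to a callback. `SparqlXmlStreamReader` and its `Binding` record are hypothetical names; the sketch ignores `<head>` and ASK (`<boolean>`) results.

```java
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.XMLConstants;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: streams SELECT results from a SPARQL XML results document.
final class SparqlXmlStreamReader {

    // kind is the element name used by the results format: "uri", "bnode" or "literal".
    record Binding(String kind, String value, String datatype, String lang) {}

    private SparqlXmlStreamReader() {}

    // Pull-parse the document and pass each <result> to the callback as soon as it
    // closes; only one result is held in memory at a time. Returns the result count.
    static long read(Reader in, Consumer<Map<String, Binding>> onResult) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Results documents need no DTD; disabling it also blocks external entities.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader xml = factory.createXMLStreamReader(in);
        Map<String, Binding> current = null;
        String bindingName = null;
        long count = 0;
        try {
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    switch (xml.getLocalName()) {
                        case "result" -> current = new LinkedHashMap<>();
                        case "binding" -> bindingName = xml.getAttributeValue(null, "name");
                        case "uri", "bnode" -> {
                            String kind = xml.getLocalName(); // read before getElementText() moves the cursor
                            current.put(bindingName, new Binding(kind, xml.getElementText(), null, null));
                        }
                        case "literal" -> {
                            // Attributes must be read before getElementText() advances past the element.
                            String datatype = xml.getAttributeValue(null, "datatype");
                            String lang = xml.getAttributeValue(XMLConstants.XML_NS_URI, "lang");
                            current.put(bindingName, new Binding("literal", xml.getElementText(), datatype, lang));
                        }
                        default -> { } // sparql, head, variable, results: nothing to do
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && "result".equals(xml.getLocalName())) {
                    onResult.accept(current);
                    current = null;
                    count++;
                }
            }
        } finally {
            xml.close();
        }
        return count;
    }
}
```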

chenejac commented 10 years ago

Jim Blake said:

A question about the existing code: why is the .owl extension used for RDF/XML exported from the TBox, while .rdf is used for RDF/XML otherwise?

chenejac commented 10 years ago

Jim Blake said:

http://jena.apache.org/documentation/sdb/configuration.html

chenejac commented 10 years ago

Jim Blake said:

Able to export the entire data store for the Weill Cornell data: 10.5 million triples. Don't know whether we can import that data: the import ran for 48 hours and was then stopped manually.