Closed (cuddihyge closed 3 years ago)
After a lot of work, it seems we're probably just as well sticking with Fuseki for now.
Results are checked in to documentationFiles/performance-to-2M-fuseki-blaze-virt.xlsx
After taking a couple of weeks off from this one, I'm re-opening it.
I found a possible flaw in my test. New results are coming out differently. There are many variables to consider, so I need to re-sort this out.
My early guess is that PerformanceTest.java was allowing some internal SemTK data structures (ImportSpecHandler) to be re-used. This was allowing it to skip a COUNT query that is very expensive in Fuseki.
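A minimal sketch of the suspected flaw, in Python for brevity rather than the project's Java. `FakeImportHandler` and `run_benchmark` are hypothetical stand-ins, not SemTK's ImportSpecHandler or PerformanceTest.java; the counter simply stands in for the expensive COUNT query. It shows how reusing one handler across batches runs that setup cost once instead of once per batch, which would make the measured times look better than a real ingestion.

```python
class FakeImportHandler:
    """Hypothetical stand-in for a reusable import structure."""

    setup_calls = 0  # how many times the simulated COUNT query ran

    def __init__(self):
        # Assumption: constructing a handler triggers the expensive query.
        FakeImportHandler.setup_calls += 1

    def ingest(self, rows):
        # Pretend to load the rows; return how many were handled.
        return len(rows)


def run_benchmark(batches, reuse_handler):
    """Ingest every batch and return how many expensive setups happened."""
    FakeImportHandler.setup_calls = 0
    shared = FakeImportHandler() if reuse_handler else None
    for rows in batches:
        handler = shared if reuse_handler else FakeImportHandler()
        handler.ingest(rows)
    return FakeImportHandler.setup_calls


batches = [["a", "b"], ["c"], ["d", "e", "f"]]
print("setups with reuse:", run_benchmark(batches, reuse_handler=True))
print("setups without reuse:", run_benchmark(batches, reuse_handler=False))
```

If the real test behaves like the reuse case, the per-batch cost of the COUNT query never shows up in the timings.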
It will take 1-2 days to re-run all tests to 2,000,000 triples.
If the COUNT query is the problem,
I've re-run enough results to trust my original results.
Moving this back to in progress AGAIN. I think the bug from Greg has exposed the type of query Fuseki is really bad at (graph traversal queries like subclassOf*), and this type of query had slipped through my performance test. I'll re-run the performance tests; my guess is they will show that Blazegraph is much better at real-life data ingestion, where it needs to do a lookup on something like REQUIREMENT, which might be subclassed.
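For reference, the kind of traversal query in question might look like the sketch below, which builds a SPARQL 1.1 query using the `rdfs:subClassOf*` property path. The helper name `subclass_lookup_query` and the example class URI are assumptions for illustration, not taken from SemTK or the actual test data.

```python
def subclass_lookup_query(class_uri):
    """Build a SPARQL query that finds instances of a class or any subclass.

    The '*' after rdfs:subClassOf is a property path meaning "zero or more
    hops", so the store has to walk the class hierarchy to answer it.
    """
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?instance WHERE {\n"
        f"  ?cls rdfs:subClassOf* <{class_uri}> .\n"
        "  ?instance rdf:type ?cls .\n"
        "}"
    )


# Hypothetical URI standing in for a class like REQUIREMENT.
query = subclass_lookup_query("http://example.org/ontology#REQUIREMENT")
print(query)
```

A performance test that only looks up exact class matches would never issue a query of this shape, which may be how it slipped through.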
Blazegraph is a higher-performance open-source graph database. It is GPL, so we don't want to distribute it, but we could provide instructions for users.
For this task: (1) download and run Blazegraph, (2) wire up enough SemTK code to ingest and run queries, (3) kick the tires with basic performance tests.
Decide whether it is worth completing: a) upload OWL functions through SemTK, b) CONSTRUCT query results, c) other odds and ends.