blazegraph / tinkerpop3

Blazegraph Tinkerpop3 Implementation
GNU General Public License v2.0

Slowness #3

Open pietermartin opened 8 years ago

pietermartin commented 8 years ago
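    @Test
    public void testBlazeGraph() throws Exception {
        // Time opening the journal plus inserting 10,000 vertices in normal
        // (incremental update) mode. journalFile is a field of the test class.
        StopWatch stopWatch = new StopWatch();
        stopWatch.start();
        final BlazeGraphEmbedded g = BlazeGraphFactory.open(journalFile);
        for (int i = 0; i < 10000; i++) {
            g.addVertex(T.label, "Person", "name", "xxxxxx");
        }
        g.tx().commit();
        stopWatch.stop();
        System.out.println(stopWatch.toString());
        // Time a count traversal over the vertices just inserted.
        stopWatch.reset();
        stopWatch.start();
        Assert.assertEquals(10000, g.traversal().V().hasLabel("Person").count().next().intValue());
        stopWatch.stop();
        System.out.println(stopWatch.toString());
        // Release the journal once the test is done.
        g.close();
    }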

Output: 0:03:40.845 for the insert and 0:00:00.164 for the select.

The select is fine, but the insert is far too slow. The time seems to be spent in com.bigdata.relation.accesspath.BlockingBuffer$BlockingIterator._hasNext().

The same code on Sqlg (Postgres) inserts in 1.2 seconds. I did not test Neo4j, but since it is embedded it is generally much faster than Sqlg.

I would expect sub-second insert times for an embedded graph.

beebs-systap commented 8 years ago

Adding @mikepersonick. Try it with the bulk load API: https://github.com/blazegraph/tinkerpop3#bulk-load-api.

pietermartin commented 8 years ago

OK, much faster: 0:00:02.066 for the insert in batch mode. Still slow, though.

For comparison, Neo4j takes 0:00:00.243 in normal mode, Sqlg takes 0:00:00.269 in normal mode, and Sqlg in batch mode takes 0:00:00.130 (using the Postgres COPY command underneath).
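To put these timings on a common scale, here is a small sketch that converts the StopWatch-style strings quoted in this thread ("H:mm:ss.SSS") into an average cost per inserted vertex. It assumes every run inserted the same 10,000 vertices as the test above, which the thread only states explicitly for the Blazegraph runs; `InsertCost` and its methods are hypothetical helpers, not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns the "H:mm:ss.SSS" timings from this thread into
// per-vertex costs, ASSUMING each run inserted 10,000 vertices.
final class InsertCost {

    /** Parses "H:mm:ss.SSS" (for example "0:03:40.845") into milliseconds. */
    static long parseMillis(String hms) {
        String[] parts = hms.split(":");
        long hours = Long.parseLong(parts[0]);
        long minutes = Long.parseLong(parts[1]);
        String[] secParts = parts[2].split("\\.");
        long seconds = Long.parseLong(secParts[0]);
        long millis = Long.parseLong(secParts[1]);
        return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis;
    }

    /** Average cost per inserted vertex, in microseconds. */
    static double microsPerVertex(String hms, int vertices) {
        return parseMillis(hms) * 1000.0 / vertices;
    }

    public static void main(String[] args) {
        // Timings as reported in the thread, in the order they were given.
        Map<String, String> timings = new LinkedHashMap<>();
        timings.put("Blazegraph, normal mode", "0:03:40.845");
        timings.put("Blazegraph, bulk load", "0:00:02.066");
        timings.put("Neo4j, normal mode", "0:00:00.243");
        timings.put("Sqlg, normal mode", "0:00:00.269");
        timings.put("Sqlg, batch mode (COPY)", "0:00:00.130");
        for (Map.Entry<String, String> e : timings.entrySet()) {
            System.out.printf("%-26s %10.1f us/vertex%n",
                    e.getKey(), microsPerVertex(e.getValue(), 10_000));
        }
    }
}
```

Under that assumption, normal mode costs about 22,084.5 us (roughly 22 ms) per vertex and bulk load about 206.6 us, so bulk load is roughly 107 times faster per vertex, yet still about 8.5 times slower than Neo4j's 24.3 us in normal mode.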

I reckon there is some locking going on, as an embedded store really should write faster.

beebs-systap commented 8 years ago

OK, makes sense. That difference is part of the TinkerPop3 implementation's design, per @mikepersonick:

Incremental update does strict checking on vertex and edge id re-use and enforcement of property key cardinality. Both of these validations require a read against the database indices. Blazegraph benefits greatly from buffering and batch inserting statements into the database indices. Buffering and batch insert are defeated by interleaving reads and removes into the loading process, which is what happens with the normal validation steps in incremental update mode.
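The effect described in the quote (reads interleaved with writes defeat buffering) can be illustrated with a toy model. This Java sketch is hypothetical and is not Blazegraph code: `BufferedStoreModel` buffers writes and, so that every read sees pending writes, flushes the buffer before each read. Validating each insert with a read, as incremental update mode does, forces one flush per vertex, whereas a load with no per-write reads flushes only when the buffer fills.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model, NOT Blazegraph internals: a store that buffers
// writes and must flush them to its index before it can answer a read.
final class BufferedStoreModel {

    private final Set<String> index = new HashSet<>();     // stand-in for the on-disk index
    private final List<String> buffer = new ArrayList<>(); // pending, unflushed writes
    private final int bufferCapacity;
    private int flushes = 0;

    BufferedStoreModel(int bufferCapacity) {
        this.bufferCapacity = bufferCapacity;
    }

    /** A read must see buffered writes, so any pending writes are flushed first. */
    boolean contains(String id) {
        flush();
        return index.contains(id);
    }

    /** Buffers a write; flushes only when the buffer is full. */
    void write(String id) {
        buffer.add(id);
        if (buffer.size() >= bufferCapacity) {
            flush();
        }
    }

    /** Moves pending writes into the index; counts only non-empty flushes. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        index.addAll(buffer);
        buffer.clear();
        flushes++;
    }

    /** Incremental mode: check for id re-use (a read) before every write. */
    static int incrementalLoad(int n, int capacity) {
        BufferedStoreModel store = new BufferedStoreModel(capacity);
        for (int i = 0; i < n; i++) {
            String id = "v" + i;
            if (store.contains(id)) {
                throw new IllegalStateException("id re-use: " + id);
            }
            store.write(id);
        }
        store.flush();
        return store.flushes;
    }

    /** Bulk mode: no per-write validation, so writes batch up in the buffer. */
    static int bulkLoad(int n, int capacity) {
        BufferedStoreModel store = new BufferedStoreModel(capacity);
        for (int i = 0; i < n; i++) {
            store.write("v" + i);
        }
        store.flush();
        return store.flushes;
    }

    public static void main(String[] args) {
        System.out.println("incremental flushes: " + incrementalLoad(10000, 1000)); // 10000
        System.out.println("bulk flushes: " + bulkLoad(10000, 1000));               // 10
    }
}
```

With 10,000 inserts and a buffer of 1,000, the incremental path flushes 10,000 times and the bulk path 10 times. The model only counts flushes; it says nothing about how expensive a real index flush is in Blazegraph.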

We would be interested in relative numbers at much larger data scales, both for loading and for traversal workloads. We'll do some testing ourselves as well. Please pass along any test results you have.

pietermartin commented 8 years ago

Incremental update does strict checking on vertex and edge id re-use

I don't really follow that part. At least in the code above, the user does not specify any ids, so shouldn't there be nothing to check?