JanusGraph / janusgraph

JanusGraph: an open-source, distributed graph database
https://janusgraph.org
Other
5.32k stars 1.17k forks source link

graph creation with default id authority wait time is too slow for use in unit tests with multiple repeated graph creations #1859

Open stellarspot opened 5 years ago

stellarspot commented 5 years ago

For confirmed bugs, please report:

I use inmemory JanusGraph storage to run my tests. Creation of JanusGraph is very slow and it seems because it waits for some period of time for new id blocks.

The sample below compares creating inmemory JanusGraph with automatic and custom id generation. It shows the strange behavior that the time of JanusGraph creation does not depend on properties ids.authority.wait-time, ids.renew-timeout, ids.block-size, and ids.renew-percentage but does depend on order of JanusGraph instances creation.

See the code example below.

If I first run creating vertices with generated ids and then custom ids I got: [generated ids] vertices: 1000, elapsed time: 1255(ms), graph creation time: 593 [custom ids] vertices: 1000, elapsed time: 255(ms), graph creation time: 4

The method with generated ids takes about 1 sec and the graph creation takes about 600ms.

If I first run creating vertices with custom ids I got: [custom ids] vertices: 1000, elapsed time: 1125(ms), graph creation time: 590 [generated ids] vertices: 1000, elapsed time: 401(ms), graph creation time: 3

The method with custom ids takes about 1sec and graph creation takes about 600ms.

If I run only one of them then each takes about 1 sec and graph creation takes about 600ms.

import
org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.T;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphTransaction;
import org.janusgraph.graphdb.database.StandardJanusGraph;
import org.janusgraph.graphdb.idmanagement.IDManager;

public class JanusGraphSample {

     private static final String LABEL = "SampleLabel";
     private static final String KEY = "SampleKey";
     private static final String VALUE = "SampleValue";
     private static final int N = 1000;

     public static void main(String[] args) {
        // Try to run these methods in reverse order
         addVerticesWithGeneratedIds();
         addVerticesWithCustomIds();
     }

     private static JanusGraph getJanusGraph(boolean customIds) {

         JanusGraphFactory.Builder builder = JanusGraphFactory.build()
                 .set("storage.backend", "inmemory")
                 .set("ids.authority.wait-time", "5")
                 .set("ids.renew-timeout", "50")
                 .set("ids.block-size", "1000000000")
                 .set("ids.renew-percentage", "0.3");

         if (customIds) {
             builder = builder.set("graph.set-vertex-id", "true");
         }

         return builder.open();
     }

     static long verticesCount(JanusGraph graph) {
         try (JanusGraphTransaction tx = graph.newTransaction()) {
             GraphTraversalSource g = tx.traversal();
             return g.V().count().next();
         }
     }

     public static void addVerticesWithGeneratedIds() {

         long time = System.currentTimeMillis();
         try (JanusGraph graph = getJanusGraph(false)) {
             long creationTime = System.currentTimeMillis();

             try (JanusGraphTransaction tx = graph.newTransaction()) {
                 GraphTraversalSource g = tx.traversal();

                 for (int i = 1; i <= N; i++) {
                     String value = String.format("%s-%d", VALUE, i);
                     g.addV(LABEL).property(KEY, value).next();
                 }
                 tx.commit();
             }
             long elapsedTime = System.currentTimeMillis();
             long vertices = verticesCount(graph);
             System.out.printf("[generated ids] vertices: %d, elapsed
time: %d(ms), graph creation time: %d%n", vertices, elapsedTime - time,
creationTime - time);
         }
     }

     public static void addVerticesWithCustomIds() {

         long time = System.currentTimeMillis();

         try (JanusGraph graph = getJanusGraph(true)) {
             long creationTime = System.currentTimeMillis();

             IDManager idManager = ((StandardJanusGraph)
graph).getIDManager();

             try (JanusGraphTransaction tx = graph.newTransaction()) {
                 GraphTraversalSource g = tx.traversal();

                 for (int i = 1; i <= N; i++) {
                     long id = idManager.toVertexId(i);
                     String value = String.format("%s-%d", VALUE, i);
                     g.addV(LABEL).property(T.id, id).property(KEY,
value).next();

                 }
                 tx.commit();
             }

             long elapsedTime = System.currentTimeMillis();
             long vertices = verticesCount(graph);
             System.out.printf("[custom ids] vertices: %d, elapsed time:
%d(ms), graph creation time: %d%n", vertices, elapsedTime - time,
creationTime - time);
         }
     }
} 
FlorianHockmann commented 5 years ago

What is the bug here exactly? The fact that it takes a second to open a graph instance is not really a bug.

I'm not an expert on the in-memory backend, but I think that the behaviour you're seeing is simply because JanusGraph has to initialize the in-memory backend and create a graph for it when you create the graph object the first time. The second call to getJanusGraph in your application should then just get a connection to that existing backend and is therefore faster. Maybe someone with more knowledge about the in-memory backend can correct me here if I'm wrong (@mad @dk-github @farodin91).

stellarspot commented 5 years ago

There is one question: should the second call to JanusGraphFactory.build(). ... .open() create new clean JanusGraph storage?

And one issue: the program sets properties ids.authority.wait-time, ids.renew-timeout, ids.block-size, and ids.renew-percentage but it seems they do not have effect, the id blocks waiting time is not decreased.

dk-github commented 5 years ago

Hi Alexander,

At the first glance, none of the problems you report are specifically connected to inmemory backend as such - i.e. neither graph creation nor id authority logic is changed when you are using inmemory backend.

So the answer to your question should be - yes, each open() should create a new instance, unless you have specified the same graph name in the configuration explicitly.

The fact that the second time the opening happens quicker than the first time is probably due to some static initialisation happening first time.

In fact I remember having to make some fix having to do with id authority, to address significant slowness of our tests involving creation of a new graph on each execution. I will dig through my old commits later in the day and come back to you in this thread if I remember what it was.

Thanks, Dmitry

On Wed, 16 Oct 2019 at 16:49, Alexander Scherbatiy notifications@github.com wrote:

There is one question: should the second call to JanusGraphFactory.build(). ... .open() create new clean JanusGraph storage?

And one issue: the program sets properties ids.authority.wait-time, ids.renew-timeout, ids.block-size, and ids.renew-percentage but it seems they do not have effect, the id blocks waiting time is not decreased.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/JanusGraph/janusgraph/issues/1859?email_source=notifications&email_token=ABNBTWUNK3SUD3U5MIJ3RLLQO4ZXZA5CNFSM4JBIBNRKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEBM7TTY#issuecomment-542767567, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABNBTWSFL32ISFXLXYVILC3QO4ZXZANCNFSM4JBIBNRA .

dk-github commented 5 years ago

@stellarspot

First of all, if you could modify your test to create and dispose of a graph for say 10 times in a row, and measure the time it takes on each iteration - these timings should be comparable for all but the first run, and thus we can remove the effects of initial class loading/JIT/statics etc from consideration.

Second, if you find that even for the second and consequent runs graph creation takes longer than you would like it to - try setting "cluster.max-partitions" to smth like 2 (instead of the default of 32) in the configuration, and see how the timings change. According to some old comments I found, this can shave some time off each initialization, and have saved a considerable amount off the test suite run time in one project using Janusgraph and a lot of tests with inmemory backend.

If you want to assert hat the graphs created each time are different, you can always add a vertex in each one, and then assert that it is not there in the next one.

Please post back here with your results or any other concerns.

stellarspot commented 5 years ago

I have updated the test to print first results (elapsed time, graph creation time, and vertices insertion time), and print average time of subsequent results:

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphTransaction;
import org.janusgraph.graphdb.database.StandardJanusGraph;
import org.janusgraph.graphdb.idmanagement.IDManager;

public class JanusGraphSample {

    // --- Parameters for test ---
    private static final int VERTICES = 10;
    private static final int ITERATIONS = 20;
    private static final boolean USE_CUSTOM_IDS = true;
    // --- ------------------- ---

    private static final boolean DEBUG = false;

    private static final String LABEL = "SampleLabel";
    private static final String KEY = "SampleKey";
    private static final String VALUE = "SampleValue";

    public static void main(String[] args) {

        MeasuredTime[] times = new MeasuredTime[ITERATIONS];
        for (int i = 0; i < ITERATIONS; i++) {
            MeasuredTime measuredTime = testJanusGraph(USE_CUSTOM_IDS);
            times[i] = measuredTime;
        }

        printStatistics(times);
    }

    private static JanusGraph getJanusGraph(boolean customIds) {

        JanusGraphFactory.Builder builder = JanusGraphFactory.build()
                .set("storage.backend", "inmemory")
                .set("ids.authority.wait-time", "5")
                .set("ids.renew-timeout", "50")
                .set("ids.block-size", "1000000000")
                //.set("cluster.max-partitions", "2")
                .set("ids.renew-percentage", "0.3");

        if (customIds) {
            builder = builder.set("graph.set-vertex-id", "true");
        }

        return builder.open();
    }

    static long verticesCount(JanusGraph graph) {
        try (JanusGraphTransaction tx = graph.newTransaction()) {
            GraphTraversalSource g = tx.traversal();
            return g.V().count().next();
        }
    }

    static class MeasuredTime {
        final long elapsedTime;
        final long creationTime;
        final long insertTime;

        public MeasuredTime(long elapsedTime, long creationTime, long insertTime) {
            this.elapsedTime = elapsedTime;
            this.creationTime = creationTime;
            this.insertTime = insertTime;
        }

        @Override
        public String toString() {
            return String.format(
                    "elapsed time:  %d(ms), graph creation time: %d, vertices insertion time: %d(ms)",
                    elapsedTime, creationTime, insertTime);
        }
    }

    public static MeasuredTime testJanusGraph(boolean customIds) {

        long time = System.currentTimeMillis();
        try (JanusGraph graph = getJanusGraph(customIds)) {

            IDManager idManager = null;

            if (customIds) {
                idManager = ((StandardJanusGraph) graph).getIDManager();
            }

            long creationTime = System.currentTimeMillis();

            try (JanusGraphTransaction tx = graph.newTransaction()) {
                GraphTraversalSource g = tx.traversal();

                for (int i = 1; i <= VERTICES; i++) {
                    String value = String.format("%s-%d", VALUE, i);
                    GraphTraversal<Vertex, Vertex> traversal = g
                            .addV(LABEL)
                            .property(KEY, value);

                    if (customIds) {
                        long id = idManager.toVertexId(i);
                        traversal = traversal.property(T.id, id);
                    }

                    traversal.next();
                }
                tx.commit();
            }
            long elapsedTime = System.currentTimeMillis();
            long insertTime = elapsedTime - creationTime;
            elapsedTime = elapsedTime - time;
            creationTime = creationTime - time;

            MeasuredTime measuredTime = new MeasuredTime(elapsedTime, creationTime, insertTime);
            if (DEBUG) {
                printDebugMessage(graph, measuredTime);
            }

            return measuredTime;
        }
    }

    private static void printStatistics(MeasuredTime[] times) {

        System.out.printf("Vertices: %d, customIds %b%n", VERTICES, USE_CUSTOM_IDS);
        System.out.printf("First call ");
        System.out.printf("%s%n", times[0]);

        double totalElapsedTime = 0;
        double totalCreationTime = 0;
        double totalInsertTime = 0;

        for (int i = 1; i < times.length; i++) {
            MeasuredTime measuredTime = times[i];
            totalElapsedTime += measuredTime.elapsedTime;
            totalCreationTime += measuredTime.creationTime;
            totalInsertTime += measuredTime.insertTime;
        }

        // exclude first item
        int size = times.length - 1;

        System.out.printf("Average time elapsed: %.2f(ms), creation: %.2f(ms), vertices insertion: %.2f(ms)%n",
                totalElapsedTime / size, totalCreationTime / size, totalInsertTime / size);

    }

    private static void printDebugMessage(JanusGraph graph, MeasuredTime measuredTime) {
        long vertices = verticesCount(graph);
        System.out.printf("[custom ids] vertices: %d," +
                        " elapsed time:  %d(ms)," +
                        " graph creation time: %d(ms)" +
                        " vertices insertion time: %d(ms)%n",
                vertices, measuredTime.elapsedTime, measuredTime.creationTime, measuredTime.insertTime);
    }
}
stellarspot commented 5 years ago

Here are some results:

Vertices: 10, customIds false
First call elapsed time:  977(ms), graph creation time: 587, vertices insertion time: 390(ms)
Average time elapsed: 258.00(ms), creation: 11.11(ms), vertices insertion: 246.89(ms)

Vertices: 10, customIds true
First call elapsed time:  827(ms), graph creation time: 573, vertices insertion time: 254(ms)
Average time elapsed: 164.67(ms), creation: 12.56(ms), vertices insertion: 152.11(ms)

Vertices: 100, customIds false
First call elapsed time:  1114(ms), graph creation time: 558, vertices insertion time: 556(ms)
Average time elapsed: 324.22(ms), creation: 8.00(ms), vertices insertion: 316.22(ms)

Vertices: 100, customIds true
First call elapsed time:  904(ms), graph creation time: 580, vertices insertion time: 324(ms)
Average time elapsed: 173.56(ms), creation: 6.22(ms), vertices insertion: 167.33(ms)

Vertices: 1000, customIds false
First call elapsed time:  1246(ms), graph creation time: 549, vertices insertion time: 697(ms)
Average time elapsed: 311.00(ms), creation: 2.89(ms), vertices insertion: 308.11(ms)

Vertices: 1000, customIds true
First call elapsed time:  1101(ms), graph creation time: 546, vertices insertion time: 555(ms)
Average time elapsed: 193.78(ms), creation: 2.22(ms), vertices insertion: 191.56(ms)

What I see from these results that graph creation time is about 550ms at first time and about 7ms in average in subsequent calls. This should be fine for my tests.

The problem seems in vertices insertion where the version with custom Ids generation is faster than with automatically generated ones but not enough. For example average time for 10 vertices insertion with custom ids is 150ms. It seems long time for 10 vertices insertion. Insertion of 1000 vertices takes about in average 190ms. So difference between 10 vertices insertion and 1000 is about 40ms.

stellarspot commented 5 years ago

Using option cluster.max-partitions=2 gives result for 10 vertices insertion:

Vertices: 10, customIds false
First call elapsed time:  849(ms), graph creation time: 574, vertices insertion time: 275(ms)
Average time elapsed: 184.00(ms), creation: 10.89(ms), vertices insertion: 173.11(ms)

Vertices: 10, customIds true
First call elapsed time:  823(ms), graph creation time: 570, vertices insertion time: 253(ms)
Average time elapsed: 160.89(ms), creation: 11.44(ms), vertices insertion: 149.44(ms)

It seems does not much differ if the cluster.max-partitions option is not set.

stellarspot commented 5 years ago

I believe that summary can be changed for the issue as it does not depend on that inmemory storage is used.

mad commented 5 years ago

When graph is populated it needs create some schema metadata for vertices, edges, properties. All of these entities needs own id. You can configure your schema before any insertion using ManagementSystem api it will reduce timeout for id block gathering

JanusGraphManagement mgmt = graph.openManagement();
mgmt.makeVertexLabel(LABEL).make();
mgmt.makePropertyKey(KEY).dataType(String.class).make();
mgmt.commit();

One of way to decrease timing for ids acquisition https://github.com/mad/janusgraph/commit/342a1814b26a8b63f9337f88145c0dcb8f628a09 it makes sense only for local storage - just remove additional await for id block acquisition

dk-github commented 5 years ago

@mad, looks like we were both looking at it at the same time :)

@stellarspot I came to the same conclusion as @mad - namely that a) to obtain correct timings, you need to explicitly initialize your schema, otherwise it happens on first vertex creation, and it looks as if vertex creation is slower than it actually is b) when you do that, you can see that most time is actually spent in schema creation, and within that - in waiting for the ids c) in case of using it for tests, we know the local id authority will be the only one allocating blocks, so we can simply set ids.authority.wait-time to zero, and it becomes much faster

I have modified your program to illustrate, the results I got on my PC look like this:

Vertices: 1000, customIds true First call elapsed time: 1418(ms), graph creation time: 942, vertices insertion time: 476(ms) Average time elapsed: 197.68(ms), creation: 112.11(ms), vertices insertion: 85.58(ms), per vertex: 0.09(ms)

package org.janusgraph;

import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.structure.T;
import org.apache.tinkerpop.gremlin.structure.Vertex;
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;
import org.janusgraph.core.JanusGraphTransaction;
import org.janusgraph.core.schema.JanusGraphManagement;
import org.janusgraph.graphdb.database.StandardJanusGraph;
import org.janusgraph.graphdb.idmanagement.IDManager;

public class test {

    // --- Parameters for test ---
    private static final int VERTICES = 1000;
    private static final int ITERATIONS = 20;
    private static final boolean USE_CUSTOM_IDS = true;
    // --- ------------------- ---

    private static final boolean DEBUG = false;

    private static final String LABEL = "SampleLabel";
    private static final String KEY = "SampleKey";
    private static final String VALUE = "SampleValue";

    public static void main(String[] args) {

        MeasuredTime[] times = new MeasuredTime[ITERATIONS];
        for (int i = 0; i < ITERATIONS; i++) {
            MeasuredTime measuredTime = testJanusGraph(USE_CUSTOM_IDS);
            times[i] = measuredTime;
        }

        printStatistics(times);
    }

    private static JanusGraph getJanusGraph(boolean customIds) {

        JanusGraphFactory.Builder builder = JanusGraphFactory.build()
            .set("storage.backend", "inmemory")
            .set("ids.authority.wait-time", "0")
        ;

        if (customIds) {
            builder = builder.set("graph.set-vertex-id", "true");
        }

        return builder.open();
    }

    static long verticesCount(JanusGraph graph) {
        try (JanusGraphTransaction tx = graph.newTransaction()) {
            GraphTraversalSource g = tx.traversal();
            return g.V().count().next();
        }
    }

    static class MeasuredTime {
        final long elapsedTime;
        final long creationTime;
        final long insertTime;

        public MeasuredTime(long elapsedTime, long creationTime, long insertTime) {
            this.elapsedTime = elapsedTime;
            this.creationTime = creationTime;
            this.insertTime = insertTime;
        }

        @Override
        public String toString() {
            return String.format(
                "elapsed time:  %d(ms), graph creation time: %d, vertices insertion time: %d(ms)",
                elapsedTime, creationTime, insertTime);
        }
    }

    public static MeasuredTime testJanusGraph(boolean customIds) {

        long time = System.currentTimeMillis();
        try (JanusGraph graph = getJanusGraph(customIds)) {

            IDManager idManager = null;

            if (customIds) {
                idManager = ((StandardJanusGraph) graph).getIDManager();
            }

            makeSchema(graph);

            long creationTime = System.currentTimeMillis();

            try (JanusGraphTransaction tx = graph.newTransaction()) {
                GraphTraversalSource g = tx.traversal();

                for (int i = 1; i <= VERTICES; i++) {
                    String value = String.format("%s-%d", VALUE, i);
                    GraphTraversal<Vertex, Vertex> traversal = g
                        .addV(LABEL)
                        .property(KEY, value);

                    if (customIds) {
                        long id = idManager.toVertexId(i);
                        traversal = traversal.property(T.id, id);
                    }

                    traversal.next();
                }
                tx.commit();
            }
            long elapsedTime = System.currentTimeMillis();
            long insertTime = elapsedTime - creationTime;
            elapsedTime = elapsedTime - time;
            creationTime = creationTime - time;

            MeasuredTime measuredTime = new MeasuredTime(elapsedTime, creationTime, insertTime);
            if (DEBUG) {
                printDebugMessage(graph, measuredTime);
            }

            return measuredTime;
        }
    }

    private static void makeSchema(JanusGraph graph) {
        JanusGraphManagement mgmt = graph.openManagement();
        mgmt.makeVertexLabel(LABEL).make();
        mgmt.makePropertyKey(KEY).dataType(String.class).make();
        mgmt.commit();
    }

    private static void printStatistics(MeasuredTime[] times) {

        System.out.printf("Vertices: %d, customIds %b%n", VERTICES, USE_CUSTOM_IDS);
        System.out.printf("First call ");
        System.out.printf("%s%n", times[0]);

        double totalElapsedTime = 0;
        double totalCreationTime = 0;
        double totalInsertTime = 0;

        for (int i = 1; i < times.length; i++) {
            MeasuredTime measuredTime = times[i];
            totalElapsedTime += measuredTime.elapsedTime;
            totalCreationTime += measuredTime.creationTime;
            totalInsertTime += measuredTime.insertTime;
        }

        // exclude first item
        int size = times.length - 1;

        System.out.printf("Average time elapsed: %.2f(ms), creation: %.2f(ms), vertices insertion: %.2f(ms), per vertex: %.2f(ms)%n",
            totalElapsedTime / size, totalCreationTime / size, totalInsertTime / size, totalInsertTime / size / VERTICES);

    }

    private static void printDebugMessage(JanusGraph graph, MeasuredTime measuredTime) {
        long vertices = verticesCount(graph);
        System.out.printf("[custom ids] vertices: %d," +
                " elapsed time:  %d(ms)," +
                " graph creation time: %d(ms)" +
                " vertices insertion time: %d(ms)%n",
            vertices, measuredTime.elapsedTime, measuredTime.creationTime, measuredTime.insertTime);
    }
}
dk-github commented 5 years ago

To summarize:

JanusGraphFactory.Builder builder = JanusGraphFactory.build()
            .set("storage.backend", "inmemory")
            .set("ids.authority.wait-time", "0")
        ;
stellarspot commented 5 years ago

To summarize:

I would add one more thing that adding 10 vertices is also slow and even with option ids.authority.wait-time=0 the example above with parameters:

    // --- Parameters for test ---
    private static final int VERTICES = 10;
    private static final int ITERATIONS = 10;
    private static final boolean USE_CUSTOM_IDS = false;
    // --- ------------------- ---

and JanusGraph:

    private static JanusGraph getJanusGraph(boolean customIds) {
        JanusGraphFactory.Builder builder = JanusGraphFactory.build()
                .set("storage.backend", "inmemory")
                .set("ids.authority.wait-time", "0");
        return builder.open();
    }

gives results:

Vertices: 10, customIds false
First call elapsed time:  837(ms), graph creation time: 539, vertices insertion time: 298(ms)
Average time elapsed: 160.56(ms), creation: 9.67(ms), vertices insertion: 150.89(ms)

It looks slow adding 10 vertices for 150ms in average.

stellarspot commented 5 years ago

I have tests in my project and using graph with ids.authority.wait-time=0 option set it throws exception:

    public static JanusGraph getInMemoryJanusGraph(boolean customIds) {
        JanusGraphFactory.Builder builder = JanusGraphFactory.build()
                .set("storage.backend", "inmemory")
                .set("ids.authority.wait-time", "0");

        return builder.open();
    }
[Test worker] INFO org.janusgraph.diskstorage.configuration.builder.ReadConfigurationBuilder - Set default timestamp provider MICRO
[Test worker] INFO org.janusgraph.graphdb.idmanagement.UniqueInstanceIdRetriever - Generated unique-instance-id=7f00010117137-eternity1
[Test worker] INFO org.janusgraph.diskstorage.Backend - Initiated backend operations thread pool of size 24
[Test worker] INFO org.janusgraph.diskstorage.log.kcvs.KCVSLog - Loaded unidentified ReadMarker start time 2019-10-21T09:00:28.039Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@5be6ea3
[JanusGraphID(0)(4)[0]] WARN org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority - Temporary storage exception while acquiring id block - retrying in PT0S: {}
org.janusgraph.diskstorage.TemporaryBackendException: Wrote claim for id block [1, 51) in PT0.001S => too slow, threshold is: PT0S
    at org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority.getIDBlock(ConsistentKeyIDAuthority.java:268)
    at org.janusgraph.graphdb.database.idassigner.StandardIDPool$IDBlockGetter.call(StandardIDPool.java:288)
    at org.janusgraph.graphdb.database.idassigner.StandardIDPool$IDBlockGetter.call(StandardIDPool.java:255)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
dk-github commented 5 years ago

To summarize:

I would add one more thing that adding 10 vertices is also slow and even with option ids.authority.wait-time=0 the example above with parameters: ...

It looks slow adding 10 vertices for 150ms in average.

Yes, as illustrated by the modified code I have posted above - most of the 150ms is taken by creating schema. After the schema is created, adding vertices is cheap.

In your original example, schema was created on the fly when you have added first vertex, and so in your timings it is included in "vertices insertion".

In my modified example, schema is created upfront, and is not counted as "vertices creation", which illustrates that once your graph is created and schema is set up, you can create lots of vertices pretty fast - in my example it was adding 1000 vertices at an average rate of 0.09ms per vertex, and this drops further as the number of vertices you insert grows.

But it is true that if you need to create a lot of graphs from scratch and in each one you only need to create 10 vertices, then creating the graph and the schema is where you lose most of the time.

One possible solution to this would be to create graph+schema at setup stage, reuse the same graph for a series of tests (clearing it between each test), and then close at the end. This way, you only pay the price of creating the graph once per a group of test.

At run time, your application is likely to create the graph just once, so it is usually not an issue.

dk-github commented 5 years ago

I have tests in my project and using graph with ids.authority.wait-time=0 option set it throws exception:

org.janusgraph.diskstorage.idmanagement.ConsistentKeyIDAuthority - Temporary storage  exception while acquiring id block - retrying in PT0S: {}
org.janusgraph.diskstorage.TemporaryBackendException: Wrote claim for id block [1, 51) in PT0.001S => too slow, threshold is: PT0S

hmm, it worked consistently for me every time I tried it locally. Try setting it to 1 instead of 0 - I have just tried it and it still provides a good speed-up compared to the default setting.

stellarspot commented 5 years ago

Sorry, I missed the sample with the makeSchema() method. I will check how to reuse it in my tests.

The ConsistentKeyIDAuthority I see not in the provided sample but in my larger project where I have request to JanusGraph first to check if the vertex is created and if not then create it. This exception is repeated on my project even with ids.authority.wait-time set to 15. For this value set to 50 I do not see the exception.

stellarspot commented 5 years ago

When graph is populated it needs create some schema metadata for vertices, edges, properties. All of these entities needs own id.

Just a question. Is it possible to assign custom ids for vertices, edges, and properties from metadata scheme? It would be useful if I assign custom ids for ordinary vertices and edges anyway.

What is the right way to assign custom ids for ordinary properties?

dk-github commented 5 years ago

@stellarspot, no it is not possible to assign custom ids for schema elements, unless you want to bypass the schema management APIs and go to a lower level. But crucially, it is unclear why would you want to do this, and why it is important?

As discussed above, in the "normal" use case a program would only create the graph and initialize the schema once, at the start, and then use that graph for the life time of its process. So a couple of seconds spent at the start to set things up are usually not a problem, especially if you are using a complex backend backed by a remote database etc. We have demonstrated above that, once the graph and the schema is initialised, subsequent graph manipulations such as adding vertices are pretty fast.

In previous discussion, we have assumed that your problem arises when you run a large set of unit or integration tests, where each test creates its own graph and tears it down at the end. In such case, indeed spending 1 second to initialise each test, times say 500 tests, becomes a bit too much. So we have discussed a number of options which allow you to speed up this scenario:

If you have another use case where spending a second on graph initialisation represents a problem - please can you elaborate?

stellarspot commented 5 years ago

Yes, I will try to reuse existed graph where schema has been already created.

I was just wondering if it is allowed to user to provide custom ids for vertices, edges, and properties why he can't set a custom id generator which would be used for schema id generation as well. If there are no reasonable use cases for this it is ok.

dk-github commented 5 years ago

@mad do you want to raise a PR for your proposed change to automatically skip waiting for non-distributed store managers?

porunov commented 4 years ago

@stellarspot @mad @dk-github @FlorianHockmann As I understand the issue was not a bug but rather an improvement request. Is it correct? @mad merged #1884 into master branch, so there will be no delay for id allocation for non distributed stores. Now, I am not sure what exactly should be implemented to resolve this issue. Should we just close this issue and split it into multiple smaller issues (if there are any)?