apache / incubator-hugegraph

A graph database that supports 100+ billion data, with high performance and scalability (includes OLTP Engine, REST-API and Backends)
https://hugegraph.apache.org
Apache License 2.0

Duplicate results on outside() and not(between()) API #1586

Open Yicheng-Wang opened 3 years ago

Yicheng-Wang commented 3 years ago

Bug Type

gremlin (results are not as expected)

Before submit

Environment

Expected & Actual behavior

Expected behavior

We executed the query "g.V().has('length', not(between(423,-23)))" and expected to get the vertices whose 'length' property is less than the first argument, 423, or greater than or equal to the second, -23.
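For reference, the two predicates in this report can be spelled out in plain Java. This is only a minimal sketch: the class `RangePredicateSketch` and its helpers are hypothetical and are not part of HugeGraph or TinkerPop. They mirror the documented TinkerPop semantics, where `between(first, second)` means `first <= x < second` and `outside(first, second)` means `x < first || x > second`.

```java
import java.util.function.IntPredicate;

// Hypothetical helper class for illustration; not HugeGraph or TinkerPop code.
class RangePredicateSketch {

    // Mirrors TinkerPop's P.between(first, second): first <= x < second.
    static IntPredicate between(int first, int second) {
        return x -> x >= first && x < second;
    }

    // Mirrors TinkerPop's P.outside(first, second): x < first OR x > second.
    static IntPredicate outside(int first, int second) {
        return x -> x < first || x > second;
    }

    public static void main(String[] args) {
        // The four 'length' values used in the reproduction.
        int[] lengths = {546, 12368578, 1, 47568};
        IntPredicate notBetween = between(423, -23).negate();
        IntPredicate outsideRange = outside(423, -23);
        for (int length : lengths) {
            System.out.println(length
                    + " notBetween=" + notBetween.test(length)
                    + " outside=" + outsideRange.test(length));
        }
    }
}
```

With the bounds given as (423, -23), all four lengths satisfy both predicates, so each of the four vertices is a legitimate match and should be returned exactly once.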

Actual behavior

Any vertex that satisfies this condition may be returned, but no vertex should be returned more than once. However, we found that some of the matching vertices appear more than once in the results, which does not happen in other Gremlin-based graph databases for the same query. The same bug occurs with the outside() API.

Example to reproduce

        hugegraph.schema().propertyKey("length").asInt().ifNotExist().create();
        hugegraph.schema().vertexLabel("rope").properties("length").nullableKeys("length").create();
        hugegraph.schema().indexLabel("ropebylength").onV("rope").by("length").shard().ifNotExist().create();

        GraphManager graph = hugegraph.graph();

        Vertex rope1 = new Vertex("rope").property("length", 546);
        Vertex rope2 = new Vertex("rope").property("length", 12368578);
        Vertex rope3 = new Vertex("rope").property("length", 1);
        Vertex rope4 = new Vertex("rope").property("length", 47568);

        graph.addVertices(Arrays.asList(rope1, rope2, rope3, rope4));

        GremlinManager gremlin = hugegraph.gremlin();

        String query0 = "g.V().has('length', not(between(423,-23)))";
        System.out.println("query0 : " + query0);
        try {
            ResultSet hugeResult = gremlin.gremlin(query0).execute();
            Iterator<Result> huresult = hugeResult.iterator();
            huresult.forEachRemaining(result -> {
                Object object = result.getObject();
                System.out.println(object);
            });
        } catch (Exception e) {
            e.printStackTrace();
        }

        String query1 = "g.V().has('rope', 'length', outside(423,-23))";
        System.out.println("query1 : " + query1);
        try {
            ResultSet hugeResult = gremlin.gremlin(query1).execute();
            Iterator<Result> huresult = hugeResult.iterator();
            huresult.forEachRemaining(result -> {
                Object object = result.getObject();
                System.out.println(object);
            });
        } catch (Exception e) {
            e.printStackTrace();
        }

Vertex/Edge example

Taking "g.V().has('length', not(between(423,-23)))" as an example, the results returned by HugeGraph are as follows:

{id=495172332315213824, label=rope, properties={length=1}}
{id=495172332311019520, label=rope, properties={length=546}}
{id=495172332315213825, label=rope, properties={length=47568}}
{id=495172332311019521, label=rope, properties={length=12368578}}
{id=495172332315213824, label=rope, properties={length=1}}
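In the listing above, the only duplicated vertex is the one with length=1, which is also the only value that satisfies both branches of the condition (less than 423 and greater than -23). One plausible explanation, which is an assumption and is not confirmed anywhere in this thread, is that the OR condition is evaluated as two independent range scans whose results are concatenated without merging. The sketch below illustrates that pattern with a toy `TreeMap` index. The class `RangeUnionSketch`, its methods, and the vertex names are hypothetical and do not reflect HugeGraph internals.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is not how HugeGraph is known to work.
class RangeUnionSketch {

    // Toy sorted index: 'length' value -> vertex name (names are made up).
    static final TreeMap<Integer, String> INDEX = new TreeMap<>(Map.of(
            546, "rope1",
            12368578, "rope2",
            1, "rope3",
            47568, "rope4"));

    // Evaluates "x < first OR x > second" as two separate range scans
    // and concatenates the hits without removing overlaps.
    static List<String> naiveOutside(int first, int second) {
        List<String> hits = new ArrayList<>(INDEX.headMap(first, false).values());
        hits.addAll(INDEX.tailMap(second, false).values());
        return hits;
    }

    // Same scans, but duplicates are dropped while keeping first-seen order.
    static List<String> dedupedOutside(int first, int second) {
        return new ArrayList<>(new LinkedHashSet<>(naiveOutside(first, second)));
    }

    public static void main(String[] args) {
        System.out.println("naive:   " + naiveOutside(423, -23));
        System.out.println("deduped: " + dedupedOutside(423, -23));
    }
}
```

In this toy model the naive version returns five hits for four vertices, because the vertex with length 1 falls in both scans, while the deduplicated version returns each vertex once. That matches the effect of appending dedup() to the query.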

Adding "dedup()" to the query removes the duplicate:

{id=495171275698733058, label=rope, properties={length=1}}
{id=495171275698733056, label=rope, properties={length=546}}
{id=495171275698733059, label=rope, properties={length=47568}}
{id=495171275698733057, label=rope, properties={length=12368578}}

However, other Gremlin-based graph databases, e.g., JanusGraph, return duplicate-free results without "dedup()":

result{object=v[4216] class=org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex}
result{object=v[8312] class=org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex}
result{object=v[12408] class=org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex}
result{object=v[4272] class=org.apache.tinkerpop.gremlin.structure.util.detached.DetachedVertex}

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

imbajin commented 3 years ago

Got it. Could you try g.V().has('length', not(between(423,-23))).dedup() first?

You can also paste the returned vertices in the Vertex/Edge example section.

Yicheng-Wang commented 3 years ago

Thanks for your response. .dedup() works, and I have now pasted the returned vertices in the Vertex/Edge example section. Will this be fixed in the future?

imbajin commented 3 years ago

Yep, once our test/dev team confirms and reproduces it, we'll add it to the dev schedule.

An issue with the bug tag will not be closed until it is fixed.

javeme commented 2 years ago

@Yicheng-Wang We can improve this case in the future, and you are welcome to contribute code: https://github.com/hugegraph/hugegraph/blob/master/CONTRIBUTING.md