apache / incubator-hugegraph

A graph database that supports 100+ billion data records, with high performance and scalability (includes OLTP engine, REST API & backends)
https://hugegraph.apache.org
Apache License 2.0

[Question] The number of edges I queried is inconsistent with the number of edges I imported #2187

Open LiJie20190102 opened 1 year ago

LiJie20190102 commented 1 year ago

Problem Type (问题类型)

others (please edit later)

Before submit

Environment (环境信息)

Your Question (问题描述)

I imported 65608366 vertices and 1806067135 edges. When I counted the edges with hugegraph-computer or Gremlin, the result was correct.

However, when I used "hugeClient.traverser().iteratorEdges(shard, 500)" to count the edges in each shard and then summed the per-shard counts, I got extra edges (1806312225). I don't know why the numbers are inconsistent. Can't "hugeClient.traverser().iteratorEdges" be used to obtain the total number of edges?

hugegraph-computer log: (screenshot attached)

gremlin result: (screenshot attached)

"hugeClient.traverser().iteratorEdges(shard, 500)" detail:
Step 1: query all shard ranges (http://x.x.x.x:8065/graphs/hugegraph/traversers/edges/shards?split_size=1048576).
Step 2: use "hugeClient.traverser().iteratorEdges" to count the edges in each shard, then sum the counts.
Result: the edge count is 1806312225, not 1806067135.
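The two-step procedure above can be sketched as a self-contained simulation (not HugeGraph client code: `shardRanges` stands in for the `/traversers/edges/shards` call, and `sumPerShard` for iterating each shard with `iteratorEdges` and summing):

```java
import java.util.*;

// Simulation of the reported counting procedure (not HugeGraph internals):
// Step 1: split an id space into fixed-size ranges, standing in for
//         GET /graphs/hugegraph/traversers/edges/shards?split_size=...
// Step 2: count the edges falling in each range, standing in for
//         hugeClient.traverser().iteratorEdges(shard, 500), then sum.
public class ShardCountSketch {

    // Step 1: split [0, maxId) into ranges of size splitSize.
    static List<long[]> shardRanges(long maxId, long splitSize) {
        List<long[]> shards = new ArrayList<>();
        for (long start = 0; start < maxId; start += splitSize) {
            shards.add(new long[]{start, Math.min(start + splitSize, maxId)});
        }
        return shards;
    }

    // Step 2: count edges whose id falls in each [start, end), sum per shard.
    static long sumPerShard(long[] edgeIds, List<long[]> shards) {
        long total = 0;
        for (long[] s : shards) {
            for (long id : edgeIds) {
                if (id >= s[0] && id < s[1]) total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long[] edgeIds = {0, 3, 5, 8, 9};
        List<long[]> shards = shardRanges(10, 4); // [0,4), [4,8), [8,10)
        System.out.println(sumPerShard(edgeIds, shards)); // prints 5
    }
}
```

With disjoint, gap-free ranges the per-shard sum necessarily equals the total, so the mismatch reported above suggests the real shard ranges do not partition the edges cleanly.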

Vertex/Edge example (问题点 / 边数据举例)

No response

Schema [VertexLabel, EdgeLabel, IndexLabel] (元数据结构)

{
    "vertexlabels": [
        {
            "id": 1,
            "name": "person",
            "id_strategy": "CUSTOMIZE_NUMBER",
            "primary_keys": [],
            "nullable_keys": [],
            "index_labels": [
                "personByAge"
            ],
            "properties": [
                "id"
            ],
            "status": "CREATED",
            "ttl": 0,
            "enable_label_index": true,
            "user_data": {
                "~create_time": "2023-03-13 09:52:29.084"
            }
        }
    ]
}

{
    "edgelabels": [
        {
            "id": 1,
            "name": "friend",
            "source_label": "person",
            "target_label": "person",
            "frequency": "SINGLE",
            "sort_keys": [],
            "nullable_keys": [],
            "index_labels": [],
            "properties": [],
            "status": "CREATED",
            "ttl": 0,
            "enable_label_index": true,
            "user_data": {
                "~create_time": "2023-03-13 09:52:30.760"
            }
        }
    ]
}
imbajin commented 1 year ago

Thanks a lot for the details, could u tell us how to reproduce it with the minimum data?

LiJie20190102 commented 1 year ago

> Thanks a lot for the details, could u tell us how to reproduce it with the minimum data?

Sorry, I don't know yet. With 65608366 vertices, the vertex count still comes out correct.

LiJie20190102 commented 1 year ago

@coderzc @imbajin Hello, are you dealing with this issue? I think this issue is quite important. Thank you for helping me with it.

imbajin commented 1 year ago

> @coderzc @imbajin Hello, are you dealing with this issue? I think this issue is more important. Thank you for helping me with it

We need to know how to reproduce it first, thanks.

LiJie20190102 commented 1 year ago

> @coderzc @imbajin Hello, are you dealing with this issue? I think this issue is more important. Thank you for helping me with it

> we need to know how to reproduce it first, thanks

The problem scenario is as follows:

  1. First, import 65608366 vertices and 1806067135 edges;

(screenshot attached)

  2. Then use "hugeClient.traverser().iteratorEdges(shard, 500)" to count the edges in each shard and sum the counts; the result is 1806312225, not 1806067135.

LiJie20190102 commented 1 year ago

https://blog.csdn.net/penriver/article/details/115124350. We conducted the test based on this article, and our vertex and edge counts match those in the article. Please help with this, thank you. @coderzc @imbajin

imbajin commented 1 year ago

> blog.csdn.net/penriver/article/details/115124350

OK, got it, thanks for the feedback. You could also try count(-1) in a Gremlin query.

LiJie20190102 commented 1 year ago

When I used count(-1), there were some exceptions:

(WeChat Work screenshot attached)

imbajin commented 1 year ago

> When I used 'count (-1)', there were some exceptions

Use the async way to execute Gremlin instead; refer to async-gremlin.

LiJie20190102 commented 1 year ago

When I use count(-1), I am unable to get the correct result: it displays 0. (screenshots attached)

At the same time, when I use count(), I get the correct result: (screenshot attached)

javeme commented 1 year ago

Please note that 'count(-1)' may mean .limit(-1).count()
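Reading the shorthand that way, the two queries would be (a sketch, assuming -1 is treated as "unbounded" in limit/range, as TinkerPop-style traversals conventionally do):

```groovy
// count(-1) read as: scan with no limit, then count all edges
g.E().limit(-1).count()

// contrast with a bounded scan, which counts at most 500 edges
g.E().limit(500).count()
```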

LiJie20190102 commented 1 year ago

> please note the 'count (-1)' may mean .limit(-1).count()

The result is: (screenshot attached)

LiJie20190102 commented 1 year ago

@javeme @imbajin @coderzc Hello, do you have any relevant conclusions?

javeme commented 1 year ago

> https://blog.csdn.net/penriver/article/details/115124350. We conducted the test based on this article, and the number of edges and vertices is consistent with the article. Please help with this, thank you . @coderzc @imbajin

@LiJie20190102 do you mean that with the RocksDB backend, the count from iteratorEdges() differs from g.E().count(), i.e. count(iteratorEdges()) != g.E().count()?

LiJie20190102 commented 1 year ago

> https://blog.csdn.net/penriver/article/details/115124350. We conducted the test based on this article, and the number of edges and vertices is consistent with the article. Please help with this, thank you . @coderzc @imbajin

> @LiJie20190102 do you mean the counts of iteratorEdges() and g.E().count() with the backend rocksdb: count(iteratorEdges()) != g.E().count()

yeah

LiJie20190102 commented 1 year ago

@javeme @imbajin @coderzc We are planning to use hugegraph in the production environment, but we are currently experiencing this issue. Please help us solve it as soon as possible. Thank you all

imbajin commented 1 year ago

> @javeme @imbajin @coderzc We are planning to use hugegraph in the production environment, but we are currently experiencing this issue. Please help us solve it as soon as possible. Thank you all

We welcome you to use HugeGraph. The shard imprecision may be caused by some empty holes (gaps) in the shard ranges, but we need a way to reproduce it for confirmation, and we lack the time/priority for now.
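To illustrate the kind of imprecision meant here (a hypothetical, self-contained example, not HugeGraph internals): if two shard ranges happened to overlap, summing per-shard counts would count boundary edges twice, while de-duplicating by edge id recovers the true total.

```java
import java.util.*;

// Hypothetical illustration (not HugeGraph code): when two shard ranges
// overlap, summing per-shard counts double-counts edges in the overlap,
// while de-duplicating by edge id gives the true total.
public class ShardOverlapDemo {

    // Naive per-shard sum: count edges whose id falls in each [start, end).
    static long naiveSum(long[] edgeIds, long[][] shards) {
        long total = 0;
        for (long[] s : shards)
            for (long id : edgeIds)
                if (id >= s[0] && id < s[1]) total++;
        return total;
    }

    // De-duplicated count: collect edge ids into a set before counting.
    static long distinctCount(long[] edgeIds, long[][] shards) {
        Set<Long> seen = new HashSet<>();
        for (long[] s : shards)
            for (long id : edgeIds)
                if (id >= s[0] && id < s[1]) seen.add(id);
        return seen.size();
    }

    public static void main(String[] args) {
        long[] edgeIds = {1, 2, 3, 4, 5, 6};
        long[][] shards = {{1, 4}, {3, 7}}; // overlap: id 3 lies in both ranges
        System.out.println(naiveSum(edgeIds, shards));      // prints 7
        System.out.println(distinctCount(edgeIds, shards)); // prints 6
    }
}
```

A mismatch like 1806312225 vs 1806067135 would be consistent with some edges being visited more than once across shards; if the ranges returned by the shards API were strictly disjoint, the two numbers would have to match.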

In addition, because this case is relatively niche, it can only be addressed as scheduling allows. If urgent troubleshooting/special support is needed, you can reply "support" in the WeChat official account.

BTW, another good way to get high-priority support is to join our dev community (one for all, and all for one).