apache / incubator-hugegraph

A graph database that supports more than 100+ billion data, high performance and scalability (Include OLTP Engine & REST-API & Backends)
https://hugegraph.apache.org
Apache License 2.0
2.62k stars 518 forks

[Question] Gremlin high-concurrency query performance optimization #1777

Closed JackyYangPassion closed 2 years ago

JackyYangPassion commented 2 years ago

Problem Type

performance (performance optimization)

Before submit

Environment

Your Question

Concurrent query performance is poor.

Vertex/Edge example

POST http://ip:port/gremlin
{
    "gremlin": "<Gremlin query statement>",
    "bindings": {},
    "language": "gremlin-groovy",
    "aliases": {}

}

Gremlin query statement:
g.V('703').bothE('phone')
  .as('b')
  .has('edge_first_time',between('2021-06-01 00:00:00','2022-02-16 00:00:03'))
  .otherV()
  .simplePath()
  .hasId('154')
  .select('b')
  .by(valueMap('edge_code'))
  .dedup()

Before optimization, each query took about 1 s under concurrent load.

Checking the execution time with profile():
g.V('703').bothE('phone')
  .as('b')
  .has('edge_first_time',between('2021-06-01 00:00:00','2022-02-16 00:00:03'))
  .otherV()
  .simplePath()
  .hasId('154')
  .select('b')
  .by(valueMap('edge_code'))
  .dedup().profile()
The total execution time (dur) is 27 ms, but the complete HTTP response took about 1 s (query time as recorded by Postman).
`"result": {
        "data": [
            {
                "dur": 27.087717,
                "metrics": [
                    {
                        "dur": 2.047777,
                        "counts": {
                            "traverserCount": 1,
                            "elementCount": 1
                        },
                        "name": "HugeGraphStep(vertex,[703])",
                        "annotations": {
                            "percentDur": 7.559799151770524
                        },
                        "id": "10.0.0()"
                    },
                    {
                        "dur": 21.606062,
                        "counts": {
                            "traverserCount": 184,
                            "elementCount": 184
                        },
                        "name": "HugeVertexStep(IN,[phone],Edge,[edge_first_time.and(gte(2021-06-01 00:00:00), lt(2022-02-16 00:00:03))])@[b]",
                        "annotations": {
                            "percentDur": 79.7633185550484
                        },
                        "id": "11.0.0()"
                    },
                    {
                        "dur": 0.777839,
                        "counts": {
                            "traverserCount": 184,
                            "elementCount": 184
                        },
                        "name": "EdgeOtherVertexStep",
                        "annotations": {
                            "percentDur": 2.8715561374182994
                        },
                        "id": "3.0.0()"
                    },
                    {
                        "dur": 0.476471,
                        "counts": {
                            "traverserCount": 184,
                            "elementCount": 184
                        },
                        "name": "PathFilterStep(simple)",
                        "annotations": {
                            "percentDur": 1.7589928305881224
                        },
                        "id": "4.0.0()"
                    },
                    {
                        "dur": 0.255006,
                        "counts": {
                            "traverserCount": 1,
                            "elementCount": 1
                        },
                        "name": "HasStep([~id.eq(154)])",
                        "annotations": {
                            "percentDur": 0.9414082404951292
                        },
                        "id": "5.0.0()"
                    },
                    {
                        "dur": 1.5937,
                        "counts": {
                            "traverserCount": 1,
                            "elementCount": 1
                        },
                        "name": "SelectOneStep(last,b,[PropertyMapStep([edge_code],value), ProfileStep])",
                        "annotations": {
                            "percentDur": 5.883478478455752
                        },
                        "id": "6.0.0()",
                        "metrics": [
                            {
                                "dur": 0.151621,
                                "counts": {
                                    "traverserCount": 1,
                                    "elementCount": 1
                                },
                                "name": "PropertyMapStep([edge_code],value)",
                                "id": "0.1.0(6.0.0())"
                            }
                        ]
                    },
                    {
                        "dur": 0.330862,
                        "counts": {
                            "traverserCount": 1,
                            "elementCount": 1
                        },
                        "name": "DedupGlobalStep",
                        "annotations": {
                            "percentDur": 1.221446606223773
                        },
                        "id": "7.0.0()"
                    }
                ]
            }
        ],
        "meta": {}
    }
}`
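
To make the gap between the profiled time and the HTTP time concrete, here is a minimal sketch that reads the numbers from the profile() output above. The step names are shortened for readability, and the 1000 ms HTTP time is an assumed round figure standing in for the "about 1 s" seen in Postman; the helper names are hypothetical.

```python
# "dur" values (ms) copied from the profile() output above; step names shortened.
profile = {
    "dur": 27.087717,
    "metrics": [
        {"name": "HugeGraphStep(vertex,[703])", "dur": 2.047777},
        {"name": "HugeVertexStep(IN,[phone],Edge,...)", "dur": 21.606062},
        {"name": "EdgeOtherVertexStep", "dur": 0.777839},
        {"name": "PathFilterStep(simple)", "dur": 0.476471},
        {"name": "HasStep([~id.eq(154)])", "dur": 0.255006},
        {"name": "SelectOneStep(last,b,...)", "dur": 1.5937},
        {"name": "DedupGlobalStep", "dur": 0.330862},
    ],
}

# Assumption: round number for the ~1 s end-to-end time measured in Postman.
HTTP_MS = 1000.0


def slowest_step(result):
    """Return the top-level step with the largest 'dur'."""
    return max(result["metrics"], key=lambda m: m["dur"])


def unexplained_ms(http_ms, result):
    """Time spent outside the profiled traversal (compile, network, serialization...)."""
    return http_ms - result["dur"]


step = slowest_step(profile)
print(f"slowest step: {step['name']} ({step['dur']:.1f} ms)")
print(f"outside the traversal: {unexplained_ms(HTTP_MS, profile):.1f} ms")
```

Under these numbers, almost all of the response time lies outside the steps that profile() measures, so tuning the traversal itself would not be expected to change the 1 s figure much.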

After optimization (#1126), using precompiled Gremlin:
{
    "gremlin": "hugegraph.traversal().V(a_id).bothE(edge_type).as('b').has('edge_first_time',between(start_time,end_time)).otherV().simplePath().hasId(b_id).select('b').by(valueMap('edge_code')).dedup()",
    "bindings": {"a_id":"703","edge_type":"phone","start_time":"2021-06-01 00:00:00","end_time":"2022-02-16 00:00:03","b_id":"154"},
    "language": "gremlin-groovy",
    "aliases": {}
}
Under high concurrency, the overall query latency dropped to under 10 ms, which meets expectations.
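
A client-side sketch of the parameterized request above: the script text stays constant and only the bindings change per call. This assumes the speedup comes from the server reusing the compiled script when the script text is identical, which this thread does not confirm in detail; `build_request` is a hypothetical helper, not part of any HugeGraph client.

```python
import json

# Constant script text, copied from the request above. Values are passed via bindings.
SCRIPT = (
    "hugegraph.traversal().V(a_id).bothE(edge_type).as('b')"
    ".has('edge_first_time',between(start_time,end_time))"
    ".otherV().simplePath().hasId(b_id)"
    ".select('b').by(valueMap('edge_code')).dedup()"
)


def build_request(a_id, edge_type, start_time, end_time, b_id):
    """Build the JSON body for POST /gremlin with all values in 'bindings'."""
    return {
        "gremlin": SCRIPT,
        "bindings": {
            "a_id": a_id,
            "edge_type": edge_type,
            "start_time": start_time,
            "end_time": end_time,
            "b_id": b_id,
        },
        "language": "gremlin-groovy",
        "aliases": {},
    }


first = build_request("703", "phone", "2021-06-01 00:00:00", "2022-02-16 00:00:03", "154")
# Hypothetical second query with swapped vertex ids: same script text, different bindings.
second = build_request("154", "phone", "2021-06-01 00:00:00", "2022-02-16 00:00:03", "703")

print(json.dumps(first, indent=4))
print(first["gremlin"] == second["gremlin"])
```

By contrast, inlining the ids and dates into the script, as in the original query, produces a different script string for every combination of values.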

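A few questions:
1. Gremlin precompilation takes more than 900 ms. Is that expected? Can the time spent on precompilation be monitored in metrics?
2. Does the time shown by profile exclude Gremlin precompilation, covering only the actual query computation of each step?
3. In hugegraph.traversal().V(a_id).bothE(edge_type), how can the edge_type variable hold multiple values? For example, if three kinds of outgoing edges 'a', 'b', 'c' are needed, how should the bindings be written?
With "bindings": {"a_id":"703","edge_type":"a,b,c"}, the edge type is parsed as the single label "a,b,c" rather than the three labels a, b, c.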

Schema [VertexLabel, EdgeLabel, IndexLabel]

No response

javeme commented 2 years ago

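@JackyYangPassion Answers to each question:

  1. Gremlin precompilation takes more than 900 ms. Is that expected? Can the time spent on precompilation be monitored in metrics?

Gremlin precompilation usually takes on the order of hundreds of milliseconds, mainly to compile the statement into a Java bytecode class. 900 ms is somewhat high; in my Postman tests, precompilation usually takes about 200 ms to 300 ms. Adding the precompilation time to metrics should be feasible, but it may require changes to the TinkerPop framework code. If you are interested, you could try contributing this to the TinkerPop community, and I can help review the code.

  2. Does the time shown by profile exclude Gremlin precompilation, covering only the actual query computation of each step?

Correct, profile() does not include the precompilation time. More precisely, if the script contains multiple statements, it does not include the execution time of the other statements either.

  3. How can the edge_type variable hold multiple values?

You can pass a list parameter []. See the following example: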

{
    "gremlin": "hugegraph.traversal().V(a_id).outE(edge_type as String[])",
    "bindings": {"a_id": 1, "edge_type": ["edge-label1", "edge-label2"]},
    "language": "gremlin-groovy",
    "aliases": {}
}

Here, edge_type as String[] is equivalent to edge_type.toArray(new String[0]).
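
On the client side, the list binding can be built as sketched below. `labels_binding` is a hypothetical helper that turns a comma-separated string such as "a,b,c" into a JSON list, so that the server receives three labels instead of the single label "a,b,c"; the a_id value is taken from the question.

```python
import json


def labels_binding(labels):
    """Accept 'a,b,c' or an iterable of labels; return a list of label strings."""
    if isinstance(labels, str):
        return [part.strip() for part in labels.split(",") if part.strip()]
    return [str(label) for label in labels]


body = {
    # Script text from the example above; the cast turns the bound list into String[].
    "gremlin": "hugegraph.traversal().V(a_id).outE(edge_type as String[])",
    "bindings": {"a_id": "703", "edge_type": labels_binding("a,b,c")},
    "language": "gremlin-groovy",
    "aliases": {},
}

print(json.dumps(body["bindings"]))
# prints {"a_id": "703", "edge_type": ["a", "b", "c"]}
```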

github-actions[bot] commented 2 years ago

Due to the lack of activity, the current issue is marked as stale and will be closed after 20 days, any update will remove the stale label