apache / incubator-hugegraph

A graph database that supports more than 100+ billion data, high performance and scalability (Include OLTP Engine & REST-API & Backends)
https://hugegraph.apache.org
Apache License 2.0
2.62k stars 518 forks source link

[Question] 日志中每天都有多条GremlinGroovyScriptEngine 相关的WARN #1518

Closed chencjiajy closed 3 years ago

chencjiajy commented 3 years ago

Environment ( 环境信息 - 必填 )

Your Question ( 问题描述 )

最近同事反映服务经常超时(1s),我去看下日志文件hugegraph-server.log 中每天都有多条关于gremlin 查询慢的警告,类似如下截图: 企业微信截图_4e5e1b1a-7c66-4f59-ae1c-fb7f0e4ae3c0

将这些报错的语句拎出来看执行计划,主要耗时在获取出边这里,测试时也是很快

这种查询慢告警有什么解决办法?

执行计划 ( 点击展开 ) { "id": "3963dd4e-eb17-4f4b-8b57-f7527be8bd10", -"data": [ -{ "dur": 18.297321, -"metrics": [ -{ "dur": 2.209809, -"counts": { "traverserCount": 1, "elementCount": 1 }, "name": "HugeGraphStep(Vertex,[~label.eq(Product), pid.eq(a1Ooxqnm)])", -"annotations": { "percentDur": 12.077227043237642 }, "id": "15.0.0()" }, -{ "dur": 11.4045, -"counts": { "traverserCount": 5, "elementCount": 5 }, "name": "HugeVertexStep(OUT,[PRODUCT_SKILL],vertex)", -"annotations": { "percentDur": 62.32879665826489 }, "id": "13.0.0()" }, -{ "dur": 4.365168, "counts": { }, "name": "AndStep([[HasStep([type.eq(Enum)]), ProfileStep], [HasStep([ability.eq(executable)]), ProfileStep], [HugeVertexStep(OUT,[PID_ENUM],Vertex,[name.eq(睡眠)]), ProfileStep], [HasStep([white_list.eq(0)]), ProfileStep]])@[skill]", -"annotations": { "percentDur": 23.856869538442268 }, "id": "3.0.0()", -"metrics": [ -{ "dur": 0.087935, "counts": { }, "name": "HasStep([type.eq(Enum)])", "id": "0.1.0(3.0.0())" }, -{ "dur": 0.003447, "counts": { }, "name": "HasStep([ability.eq(executable)])", "id": "0.1.1(3.0.0())" }, -{ "dur": 4.13142, "counts": { }, "name": "HugeVertexStep(OUT,[PID_ENUM],Vertex,[name.eq(睡眠)])", "id": "2.1.2(3.0.0())" } ] }, -{ "dur": 0.08111, "counts": { }, "name": "HugeVertexStep(OUT,[PID_ENUM],Vertex,[name.eq(睡眠)])@[enum]", -"annotations": { "percentDur": 0.4432889383096028 }, "id": "14.0.0()" }, -{ "dur": 0.027422, "counts": { }, "name": "SelectOneStep(last,skill)", -"annotations": { "percentDur": 0.149868934364763 }, "id": "6.0.0()" }, -{ "dur": 0.01582, "counts": { }, "name": "NoOpBarrierStep(2500)", -"annotations": { "percentDur": 0.08646074471776496 }, "id": "12.0.0()" }, -{ "dur": 0.013806, "counts": { }, "name": "CoalesceStep([[HugeVertexStep(OUT,[PID_SKILL_ALIAS],vertex), ProfileStep, PropertiesStep([name],value), ProfileStep], [ConstantStep, ProfileStep]])@[skill_nickname]", -"annotations": { "percentDur": 0.07545366887316454 }, "id": "7.0.0()" }, -{ "dur": 0.088455, "counts": { }, "name": "SelectStep(last,[skill, enum, skill_nickname],[[PropertyMapStep([name, type, code1],[[UnfoldStep, ProfileStep]],value), ProfileStep], [PropertyMapStep([[UnfoldStep, ProfileStep]],value), ProfileStep], [FoldStep, ProfileStep]])", -"annotations": { "percentDur": 0.4834314269285651 }, "id": "8.0.0()" }, -{ "dur": 0.091231, "counts": { }, "name": "NoOpBarrierStep(2500)", -"annotations": { "percentDur": 0.49860304686134105 }, "id": "11.0.0()" } ] } ], "type": "OTHER", "duration": "29", "showNum": 1, "message": "", +"graph": { … } } Real-time Success. Duration 0.029s.

注:我们的服务主要是封装gremlin查询语句

imbajin commented 3 years ago

慢警告就是 slow log 的输出, 你说的解决办法不知道是啥意思

  1. 如果是不希望看到 gremlin 的慢日志输出, 可以修改 conf/log4j2.xml 里的日志级别之类的
  2. 如果你是说如何加速语句... 这个可以尝试单独拆开 issue, 另外可以使用 profile() 之类的内置语法看看耗时

另外可以尝试升级一下 server 整体版本(数据不兼容), 可能可以有不少的性能和功能上提升优化

chencjiajy commented 3 years ago

我想问的解决办法是指:怎么可以避免服务调用查询语句变慢, 我贴了一个日志中报慢查询的语句的profile()执行计划,以我目前的水平看来,语句上没有优化空间了,单独测日志中的告警语句平均耗时都是几十毫秒。

请问升级server版本到0.11.2, 具体性能对比有官方评测数据吗?没有找到相关对比文档

imbajin commented 3 years ago

有测试性能报告的, 整理后下个月应该就发布了.

当然你可以自己简单实测对比一下最科学, 毕竟不同的测试用例和场景也有不少区别

至于避免服务调用语句变慢, 这是性能分析的范畴了, 可以尝试更大的缓存, 以及 server 端的读写分离, 负载均衡之类的操作. 以及你升级到0.11之后可以有自定义的 CustomPath API (详见文档 traversal), 如果有能使用 rest-api 替代的会性能好很多