deepflowio / deepflow-app

GNU Affero General Public License v3.0

[app] optimize query #251

Closed: taloric closed this 2 months ago

taloric commented 3 months ago

Optimize the queries in trace_l7_flows:

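  1. Make worker_numbers configurable, to avoid wasting scheduling capacity on machines with fewer than 10 cores.
  2. Remove the _id to _id_str conversion logic.
  3. Reduce calls to construct_from_dataframe: build the query conditions for tcp_seq/syscall/x_request_id from dataframe column lists instead (in theory this is faster than iterating row by row, though the gain may not be noticeable with small data volumes).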
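Change 1 can be sketched roughly as follows. This is a minimal illustration, not the deepflow-app code: it assumes the old worker count was a fixed 10 (inferred from the "fewer than 10 cores" remark), the names `DEFAULT_WORKER_NUMBERS`, `resolve_worker_numbers`, and `run_queries` are hypothetical, and a thread pool stands in for whatever executor the app actually uses.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional

# Hypothetical default; assumes the previously hard-coded value was 10.
DEFAULT_WORKER_NUMBERS = 10


def resolve_worker_numbers(configured: Optional[int] = None) -> int:
    """Use the configured worker count if set; otherwise cap the default at the CPU count."""
    if configured is not None and configured > 0:
        return configured
    cpu_count = os.cpu_count() or 1  # os.cpu_count() may return None
    return min(DEFAULT_WORKER_NUMBERS, cpu_count)


def run_queries(queries: List[Callable[[], object]], configured: Optional[int] = None) -> list:
    """Run zero-argument query callables on a pool sized by resolve_worker_numbers()."""
    workers = resolve_worker_numbers(configured)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input callables
        return list(pool.map(lambda query: query(), queries))
```

With no configured value on a 4-core machine, `resolve_worker_numbers()` returns 4 rather than 10, so no idle workers are scheduled.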
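
| Version | Total Request Sent | Request/second | Avg Resp Time |
|---|---|---|---|
| before (v6.5.8) | 395 | 1.29 | 2,625 ms |
| after - 2 | 477 | 1.56 | 2,028 ms |

Test results after additionally applying change 3 (no significant difference in throughput, but average latency improves):

| Version | Total Request Sent | Request/second | Avg Resp Time |
|---|---|---|---|
| after - 3 | 478 | 1.56 | 1,985 ms |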

Another test set: spans=79, 23 iterations.

| Version | Total Request Sent | Request/second | Avg Resp Time |
|---|---|---|---|
| before (v6.5.8) | 75 | 0.24 | 17,792 ms |
| after - 2&3 | 133 | 0.43 | 10,006 ms |
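To put the tables in relative terms, the percentage change between the v6.5.8 baseline and the final measurement can be computed directly from the reported values. This is only arithmetic on the numbers above; since Request/second is rounded to two decimals, the throughput percentages are approximate.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, in percent (negative means a decrease)."""
    return (after - before) / before * 100


# (before, after) values copied from the tables in this comment
comparisons = {
    "first set, Request/second (v6.5.8 -> after - 3)": (1.29, 1.56),
    "first set, Avg Resp Time ms (v6.5.8 -> after - 3)": (2625, 1985),
    "spans=79, Request/second (v6.5.8 -> after - 2&3)": (0.24, 0.43),
    "spans=79, Avg Resp Time ms (v6.5.8 -> after - 2&3)": (17792, 10006),
}

for label, (before, after) in comparisons.items():
    print(f"{label}: {pct_change(before, after):+.1f}%")
```

This prints +20.9% and -24.4% for the first set, and +79.2% and -43.8% for spans=79, so the relative gain is larger on the bigger trace.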
taloric commented 3 months ago

When reading data from a dataframe, the difference may not be very noticeable with small data volumes, so I wrote a simple benchmark: https://gist.github.com/taloric/916309861e18e97945fe554f15639523

It reads dataframe column data with three methods:

  1. for index in df.index: df.at[index, 'column']
  2. for row in df.itertuples(): getattr(row, 'column')
  3. df['column'].tolist()

With 100,000 samples, the three methods take:

1. 6.277893781661987s
2. 0.38400983810424805s
3. 0.035607337951660156s
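A minimal re-creation of that comparison might look like the following. It is not the linked gist: the column name `value`, the integer test data, the helper names, and the use of `time.perf_counter()` for timing are all assumptions, and absolute timings will vary by machine and pandas version.

```python
import time

import pandas as pd


def read_with_at(df: pd.DataFrame, column: str) -> list:
    # Method 1: one scalar lookup per index label
    return [df.at[index, column] for index in df.index]


def read_with_itertuples(df: pd.DataFrame, column: str) -> list:
    # Method 2: iterate rows as namedtuples and pick the field by name
    return [getattr(row, column) for row in df.itertuples()]


def read_with_tolist(df: pd.DataFrame, column: str) -> list:
    # Method 3: convert the whole column in a single call
    return df[column].tolist()


def benchmark(n: int = 100_000, column: str = "value") -> dict:
    """Return {method name: (seconds, values)} for each read method."""
    df = pd.DataFrame({column: range(n)})
    results = {}
    for read in (read_with_at, read_with_itertuples, read_with_tolist):
        start = time.perf_counter()
        values = read(df, column)
        results[read.__name__] = (time.perf_counter() - start, values)
    return results


if __name__ == "__main__":
    for name, (seconds, _) in benchmark().items():
        print(f"{name}: {seconds:.4f}s")
```

The ordering should match the numbers above (`tolist()` fastest, `df.at` slowest), because only the last method avoids per-row Python-level lookups.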
taloric commented 3 months ago
  4. Change the _id = xx OR _id = xxx ... condition to _id IN (xx), grouped by the second-level timestamp obtained from _id >> 32. On the spans=18 dataset, the difference is not significant:

| Version | Total Request Sent | Request/second | Avg Resp Time |
|---|---|---|---|
| after - 4 | 474 | 1.55 | 2,040 ms |

On the spans=79 dataset, the difference is not significant either:

| Version | Total Request Sent | Request/second | Avg Resp Time |
|---|---|---|---|
| after - 4 | 97 | 0.32 | 13,908 ms |
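The _id IN (...) rewrite from change 4 could be built roughly as below. This is a sketch under assumptions: it follows the description that the upper 32 bits of _id hold a second-level timestamp, but the exact SQL text and the names `group_ids_by_second` and `build_id_filter` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_ids_by_second(ids: Iterable[int]) -> Dict[int, List[int]]:
    """Bucket _id values by the second-level timestamp in their upper 32 bits."""
    groups: Dict[int, List[int]] = defaultdict(list)
    for _id in ids:
        groups[_id >> 32].append(_id)
    return dict(groups)


def build_id_filter(ids: Iterable[int]) -> str:
    """Build one `_id IN (...)` clause per second instead of a chain of `_id = x OR ...`."""
    groups = group_ids_by_second(ids)
    clauses = []
    for second in sorted(groups):
        id_list = ", ".join(str(_id) for _id in sorted(set(groups[second])))
        clauses.append(f"_id IN ({id_list})")
    return " OR ".join(clauses)


ids = [(5 << 32) + 2, (5 << 32) + 1, (7 << 32) + 3]
print(build_id_filter(ids))
# _id IN (21474836481, 21474836482) OR _id IN (30064771075)
```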
taloric commented 3 months ago
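  5. Derive min_time and max_time from the results for all _ids to narrow the query time range. In testing the difference is small, but it is not a regression either.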

Also, I tested _id IN (xxx, yyy) and found that it is not converted into a query on the second in which each _id falls (whereas _id = xxx is).

Dataset: spans=79

| Version | Total Request Sent | Request/second | Avg Resp Time |
|---|---|---|---|
| after - 5 | 99 | 0.32 | 13,661 ms |
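Change 5 amounts to taking the smallest and largest second encoded in the _id values and using them as the query's time bounds. A minimal sketch, assuming the same _id >> 32 encoding; the column name `time` in the generated condition and the function names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def time_range_from_ids(ids: Iterable[int]) -> Optional[Tuple[int, int]]:
    """Return (min_time, max_time) in seconds from the upper 32 bits of each _id, or None if empty."""
    seconds = [_id >> 32 for _id in ids]
    if not seconds:
        return None
    return min(seconds), max(seconds)


def build_time_filter(ids: Iterable[int]) -> str:
    """Narrow the query to [min_time, max_time]; an empty id list adds no condition."""
    time_range = time_range_from_ids(ids)
    if time_range is None:
        return ""
    min_time, max_time = time_range
    return f"time >= {min_time} AND time <= {max_time}"
```

Both bounds are inclusive, so spans whose _id falls in the last second of the range are still matched.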
taloric commented 2 months ago
  6. Remove auto_instance_0_node_type, auto_instance_0_icon_id, auto_instance_1_node_type, and auto_instance_1_icon_id; none of them is used, either in the code logic or in the actual response.