apache / skywalking

APM, Application Performance Monitoring System
https://skywalking.apache.org/
Apache License 2.0
23.87k stars 6.52k forks source link

[Fluentd Setup]Some special Postgresql slow sql was not shown in Show statements #10485

Closed qitongfire closed 1 year ago

qitongfire commented 1 year ago

Search before asking

Apache SkyWalking Component

OAP server (apache/skywalking)

What happened

Our project wasing using skywalking to monitor Postgresql slow and found a problem: with "pg_sleep" as (select from pg_sleep(1.4)) select from "pg_sleep", This sql was not shown in Show statements, But this sql "with pg_sleep as (select from pg_sleep(1.4)) select from pg_sleep" display normally. I debuged running Skywalking oap, It could get a message("with pg_sleep as (select from pg_sleep(1.4)) select from pg_sleep"), But it is not stored in the Elasticsearch(We use elasticsearch as the store). That should be a problem.

NOTE:The Skywalking OAP did not show any problems

What you expected to happen

I expect this type of sql to display normally in the Slow statements(ex sql: with "pg_sleep" as (select from pg_sleep(1.4)) select from "pg_sleep"),

Skywalking version: 9.3.0, fluent-bit version: 2.0.9

fluent-bit log: [{"date":1677836101.832795,"body":{"json":{"json":"{\"statement\":\" 1404.043 ms statement: with \"pg_sleep\" as (select from pg_sleep(1.4)) select from \"pg_sleep\";\",\"query_time\":1404,\"service\":\"postgresql::postgres:5432\",\"id\":\"34bacb9e-aea6-6246-49e5-14ace3d4848c\",\"layer\":\"POSTGRESQL\",\"time\":1677836101000}"}},"tags":{"data":[{"value":"SLOW_SQL","key":"LOG_KIND"}]},"service":"postgresql::postgres:5432","layer":"POSTGRESQL"}] [2023/03/03 17:36:44] [ info] [output:http:http.1] 10.255.13.133:12800, HTTP status=200

How to reproduce

Skywalking monitor Postgresql, Enabling Postgresql slow execute the following sql

sql: with "pg_sleep" as (select from pg_sleep(1.4)) select from "pg_sleep"

Anything else

No response

Are you willing to submit PR?

Code of Conduct

wu-sheng commented 1 year ago

fluent-bit log: [{"date":1677836101.832795,"body":{"json":{"json":"{"statement":" 1404.043 ms statement: with "pg_sleep" as (select from pg_sleep(1.4)) select from "pg_sleep";","query_time":1404,"service":"postgresql::postgres:5432","id":"34bacb9e-aea6-6246-49e5-14ace3d4848c","layer":"POSTGRESQL","time":1677836101000}"}},"tags":{"data":[{"value":"SLOW_SQL","key":"LOG_KIND"}]},"service":"postgresql::postgres:5432","layer":"POSTGRESQL"}] [2023/03/03 17:36:44] [ info] [output:http:http.1] 10.255.13.133:12800, HTTP status=200

This seems not a legal JSON. " should be represented as \" in the JSON AFAIK. So, first of all, this could be expected, rather than a bug.

Maybe need to enhance fluentd config? @yswdqz

qitongfire commented 1 year ago

fluent-bit log: [{"date":1677836101.832795,"body":{"json":{"json":"{"statement":" 1404.043 ms statement: with "pg_sleep" as (select from pg_sleep(1.4)) select from "pg_sleep";","query_time":1404,"service":"postgresql::postgres:5432","id":"34bacb9e-aea6-6246-49e5-14ace3d4848c","layer":"POSTGRESQL","time":1677836101000}"}},"tags":{"data":[{"value":"SLOW_SQL","key":"LOG_KIND"}]},"service":"postgresql::postgres:5432","layer":"POSTGRESQL"}] [2023/03/03 17:36:44] [ info] [output:http:http.1] 10.255.13.133:12800, HTTP status=200

This seems not a legal JSON. " should be represented as \" in the JSON AFAIK. So, first of all, this could be expected, rather than a bug.

Maybe need to enhance fluentd config? @yswdqz

This is log in flunt-bit: [{"date":1677838462.133755,"body":{"json":{"json":"{\"statement\":\" 1408.590 ms statement: with \"pg_sleep\" as (select from pg_sleep(1.4)) select from \"pg_sleep\";\",\"query_time\":1408,\"service\":\"postgresql::postgres:5432\",\"id\":\"4545983c-b9c4-ebae-4f7e-9e9ca35918de\",\"layer\":\"POSTGRESQL\",\"time\":1677838462000}"}},"tags":{"data":[{"value":"SLOW_SQL","key":"LOG_KIND"}]},"service":"postgresql::postgres:5432","layer":"POSTGRESQL"}] image

Skywalking oap can get messages normally and Flunt-bit and Skywalking OAP are normal(not show any problems)

wu-sheng commented 1 year ago

If you want to address issues, you could debug the log receiver by yourself.

qitongfire commented 1 year ago

If you want to address issues, you could debug the log receiver by yourself.

I debuged running Skywalking oap, It could get a slow sql json(json in the screenshot above), But the slow sql is not stored in the Elasticsearch. The problem is that Skywalking oap handle slow sql json

wu-sheng commented 1 year ago

This description doesn't help. We never process logs based on the content(SQL text).

qitongfire commented 1 year ago

This description doesn't help. We never process logs based on the content(SQL text).

In the Skyealking OAP,this step is normal, as follows: gYlvdmYgwg

but execute to this method(closure.call()), The code didn't continue to go down: QWlsx2pVNy

Normal logic should execute FilterSpec.extractor, But my debugging did not: n3o2htxrrb

my sql: with "pg_sleep" as (select from pg_sleep(1.4)) select from "pg_sleep";

flunt-bit log: image

You can try the sql I provided

wu-sheng commented 1 year ago

@kezhenxu94 Any guess from LAL engine perspective?

yswdqz commented 1 year ago

I have tested local, and the log of fluent-bit is :

[{"date":1678088131.451024,"tags":{"data":[{"key":"LOG_KIND","value":"SLOW_SQL"}]},"body":{"json":{"json":"{\"id\":\"cb9c1a5b-691e-fb2f-457a-9c72a392d9ed\",\"layer\":\"POSTGRESQL\",\"statement\":\"with \"pg_sleep\" as (select * from pg_sleep(1.4)) select * from \"pg_sleep\"\",\"time\":1678088131000,\"query_time\":1401,\"service\":\"postgresql::postgres:5432\"}"}},"layer":"POSTGRESQL","service":"postgresql::postgres:5432"}]

But your log is image And this seems to have some syntax problems

qitongfire commented 1 year ago

I have tested local, and the log of fluent-bit is :

[{"date":1678088131.451024,"tags":{"data":[{"key":"LOG_KIND","value":"SLOW_SQL"}]},"body":{"json":{"json":"{\"id\":\"cb9c1a5b-691e-fb2f-457a-9c72a392d9ed\",\"layer\":\"POSTGRESQL\",\"statement\":\"with \"pg_sleep\" as (select * from pg_sleep(1.4)) select * from \"pg_sleep\"\",\"time\":1678088131000,\"query_time\":1401,\"service\":\"postgresql::postgres:5432\"}"}},"layer":"POSTGRESQL","service":"postgresql::postgres:5432"}]

But your log is image And this seems to have some syntax problems

It has nothing to do with what you're saying,this sql with pg_sleep as (select from pg_sleep(1.4)) select from pg_sleep can display in Slow statements

image

qitongfire commented 1 year ago

Please help explain,

I have tested local, and the log of fluent-bit is :

[{"date":1678088131.451024,"tags":{"data":[{"key":"LOG_KIND","value":"SLOW_SQL"}]},"body":{"json":{"json":"{\"id\":\"cb9c1a5b-691e-fb2f-457a-9c72a392d9ed\",\"layer\":\"POSTGRESQL\",\"statement\":\"with \"pg_sleep\" as (select * from pg_sleep(1.4)) select * from \"pg_sleep\"\",\"time\":1678088131000,\"query_time\":1401,\"service\":\"postgresql::postgres:5432\"}"}},"layer":"POSTGRESQL","service":"postgresql::postgres:5432"}]

But your log is image And this seems to have some syntax problems

It has nothing to do with what you're saying,this sql with pg_sleep as (select from pg_sleep(1.4)) select from pg_sleep can display in Slow statements

image

Please help explain,Thanks