darold / pgbadger

A fast PostgreSQL Log Analyzer
http://pgbadger.darold.net/
PostgreSQL License

CloudNativePG log format parsing #760

Closed cod-r closed 1 year ago

cod-r commented 1 year ago

Hi,

We are running PostgreSQL in Kubernetes via the CloudNativePG operator. The logs have the following format, which pgBadger doesn't detect:

{
    "level": "info",
    "ts": 1669213642.9619803,
    "logger": "postgres",
    "msg": "record",
    "logging_pod": "mydb-pg-cluster-1",
    "record": {
        "log_time": "2022-11-23 14:27:22.961 UTC",
        "user_name": "mydb",
        "database_name": "mydb",
        "process_id": "11282",
        "connection_from": "10.42.8.57:34832",
        "session_id": "637e27c8.2c12",
        "session_line_num": "8534",
        "command_tag": "BIND",
        "session_start_time": "2022-11-23 14:01:44 UTC",
        "virtual_transaction_id": "3/3493",
        "transaction_id": "0",
        "error_severity": "LOG",
        "sql_state_code": "00000",
        "message": "duration: 0.081 ms  bind S_1385/C_3919: select mytable0_.id as id1_5_0_, mytable0_.created_by as created_2_5_0_, mytable0_.created_date as created_3_5_0_, mytable0_.updated_by as updated_4_5_0_, mytable0_.updated_date as updated_5_5_0_, mytable0_.version as version6_5_0_, mytable0_.can_book as can_book7_5_0_, mytable0_.name as name8_5_0_, mytable0_.send_email as send_ema9_5_0_ from mytable mytable0_ where mytable0_.id=$1",
        "detail": "parameters: $1 = 'a-parameter'",
        "application_name": "PostgreSQL JDBC Driver",
        "backend_type": "client backend",
        "query_id": "0"
    }
}

The above is a single log line (pretty-printed here for readability).

Commands I tried:

pgbadger -f json mylogfile
pgbadger -f jsonlog mylogfile

pgBadger version 11.7

Can pgBadger support this log format?

If not, can you give me some tips on reformatting it to be compatible with pgBadger? I'm thinking about reformatting the logs with jq before passing them to pgBadger.
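For reference, a minimal sketch of that pre-processing idea (a hypothetical helper, not part of pgBadger or CloudNativePG; the rough jq equivalent would be `jq -c 'select(.record != null) | .record'`):

```python
#!/usr/bin/env python3
"""Unwrap CloudNativePG JSON log lines into plain PostgreSQL jsonlog entries.

Hypothetical helper script: it reads the operator's one-object-per-line logs
on stdin and writes only the nested "record" object (the jsonlog entry that
PostgreSQL itself produced) to stdout, one compact JSON object per line.
"""
import json
import sys


def unwrap(lines):
    """Yield each nested "record" object as a compact, single-line JSON string."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON noise in the stream
        record = entry.get("record")
        if isinstance(record, dict):  # operator messages carry no "record" key
            yield json.dumps(record, separators=(",", ":"))


if __name__ == "__main__" and not sys.stdin.isatty():
    for out_line in unwrap(sys.stdin):
        print(out_line)
```

The idea would then be something like `kubectl logs mydb-pg-cluster-1 | python3 unwrap_cnpg_logs.py > postgresql.log` followed by `pgbadger -f jsonlog postgresql.log` (the script name and the exact invocation are assumptions on my part, not tested against a real cluster yet).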

Thank you!

darold commented 1 year ago

Same answer as in #757: a good idea would be to sponsor pgBadger, which would help CloudNativePG users use it with their logs. In other terms, yes, pgBadger can support that, but it is at a very low priority on my todo list.

cod-r commented 1 year ago

Thanks for the quick reply.

Since CloudNativePG is an independent, community-driven open source project, it doesn't endorse any company. The project will soon be part of the Cloud Native Computing Foundation (CNCF).

I'm in no way affiliated with the project; I'm just using it for the reasons above, as I feel safer using community-driven projects. Right now I'm migrating from the Zalando Postgres Operator and ran into the issue reported here.

darold commented 1 year ago

The point is that pgBadger fully supports the PostgreSQL community jsonlog format, and it seems that some projects, community-driven or not, have chosen to define their own jsonlog format. I can understand that, but if those projects have an interest in letting pgBadger parse their users' logs, they should think about contributing to the pgBadger project. Contributions can take the form of pull requests or of sponsoring the development, so that the work does not come out of my spare time.

I understand that you would love to see pgBadger parse your logs, and I have nothing against adding CloudNativePG log parsing to pgBadger. This is why the issue is kept open and flagged as a feature request. On my side, I will implement this and contribute it to the project only if I have to parse such logs for my own work or I'm paid to do it, which is not the case yet.

gbartolini commented 1 year ago

@cod-r I think the issue here is that neither CloudNativePG nor pgBadger has to change, IMHO. What is missing is a log management/transformation step between the CloudNativePG and pgBadger components in your Kubernetes environment, and that is doable with cloud-native log management tools like Fluentd and, maybe, a container image for pgBadger. As I suggested in the CloudNativePG thread, integration between applications is a key principle in Kubernetes, and I strongly advise you to work in that direction, without requiring a change in the existing apps.

P.S.: hi @darold!

darold commented 1 year ago

Hi, I agree. It looks like the record { ... } JSON part is the jsonlog reported by PG (except the multiline), so if a script or an application can generate the right output, pgBadger will support it natively. Greetings @gbartolini :-)

phisco commented 1 year ago

Hi, I agree. It looks like the record { ... } JSON part is the jsonlog reported by PG (except the multiline)

Hi @darold, could you expand on the "except the multiline" bit?

phisco commented 1 year ago

And to add to the above, another main difference is that we also parse pgaudit logs and output them in JSON format, which I don't think either pgaudit or Postgres supports, for the time being at least.

darold commented 1 year ago

Hi @phisco, the JSON record must be on a single line for each log entry. For the example above, that would be:

{"log_time": "2022-11-23 14:27:22.961 UTC","user_name": "mydb","database_name": "mydb","process_id": "11282","connection_from": "10.42.8.57:34832","session_id": "637e27c8.2c12","session_line_num": "8534","command_tag": "BIND","session_start_time": "2022-11-23 14:01:44 UTC","virtual_transaction_id": "3/3493","transaction_id": "0","error_severity": "LOG","sql_state_code": "00000","message": "duration: 0.081 ms bind S_1385/C_3919: select mytable0_.id as id1_5_0_, mytable0_.created_by as created_2_5_0_, mytable0_.created_date as created_3_5_0_, mytable0_.updated_by as updated_4_5_0_, mytable0_.updated_date as updated_5_5_0_, mytable0_.version as version6_5_0_, mytable0_.can_book as can_book7_5_0_, mytable0_.name as name8_5_0_, mytable0_.send_email as send_ema9_5_0_ from mytable mytable0_ where mytable0_.id=$1","detail": "parameters: $1 = 'a-parameter'","application_name": "PostgreSQL JDBC Driver","backend_type": "client backend","query_id": "0"}

About pgaudit, there is also issue #705, marked as a feature request.