gocardless / pgreplay-go

Postgres load testing tool
MIT License
112 stars 15 forks source link

Parsing csv log. #9

Open jalexandre0 opened 3 years ago

jalexandre0 commented 3 years ago

Hi there.

Theres a way to parse and replay from csv logs for people who can't change their log_line_prefix (aka RDS customers).

mvasilenko commented 3 years ago

hit this issue too @jalexandre0 did you managed to replay RDS logs by pgreplay-go?

so far, I was able to convert RDS csvlog to pgreplay-go compatible log by writing a simple parser, which is converting one csvlog into two lines like this:

csvlog:

2021-09-09 17:00:00.006 UTC,"user","database",27752,"172.30.1.2:34106",613a286d.6c68,13992,
"SELECT",2021-09-09 15:29:49 UTC,229/3866470,0,LOG,00000,
"execute <unnamed>: SELECT ""jobs"".* FROM ""jobs"" WHERE ""jobs"".""deleted_at"" IS NULL
AND ""jobs"".""user_id"" = $1","parameters: $1 = '124765'",,,,,,,,"bin/rails"

pgreplay-go log

2021-09-09 17:00:00.006 UTC |user|database|613a286d.6c68|LOG:  execute <unnamed>: SELECT "jobs".* FROM "jobs" WHERE "jobs"."deleted_at" IS NULL AND "jobs"."user_id" = $1
2021-09-09 17:00:00.006 UTC |user|database|613a286d.6c68|DETAIL:  parameters: $1 = '124765'

@lawrencejones does such an approach looks valid to you or am I missing something?

benwh commented 2 years ago

Hey @jalexandre0 - Unfortunately we don't have any native support for CSV logs at the moment, correct.

In theory adding another parser to do this would be pretty straight-forward, given that the logs contain all of the required data, in a roughly similar format. Contributions are welcome!

In the meantime, you could possibly craft some sed/awk commands to meld the log into the correct format. It's also worth noting that pgreplay-go has a (less documented) JSON input format. You can get a feel for what this format looks like by converting an errlog-format log into this: pgreplay-go --debug filter --errlog-input=errlog.log --output=logs.json

ghunti commented 2 years ago

I've created another simple parser that converts an RDS CSV into a pgreplay-go JSON file. You can specify the number of lines and filter them out for a specific use. I've used this to parse CSVs with 10-60GB, and although it was not improved for speed, it could parse the bigger files within 30/45min. https://gist.github.com/ghunti/66f32304b7d7ed9f9b5714ba41a725dc