chime / terraform-aws-alternat

High availability implementation of AWS NAT instances.
MIT License
1.03k stars 63 forks source link

Suggestion: Add Structured logging #76

Closed therealvio closed 9 months ago

therealvio commented 11 months ago

Hi team!

We are looking to enhance the log outputs of alterNAT Lambdas in the form of structured logging. This would tremendously improve the searchability and analysis of logs when errors surface.

For those not privy to structured logging, check out this article.

Would the maintainers of this repo be open to a PR that featured structured logging? We'd be happy to implement it!

Implementation questions:

I have read through the Contributing, and Code of Conduct documentation for the repo, though I am also open to suggestions, or guidelines you may have that hasn't yet been covered in the documentation.

Thanks!

bwhaley commented 11 months ago

Great suggestion, makes sense to me. structlog seems like a good approach.

Would you prefer this as something that users opt in/opt out of? Or would you prefer to implement structured logging as the only logging solution?

Let's use this as the only logging solution. We'll cut a new minor version release and mark it as backward incompatible.

Do you have any preference on what library/libraries that would be used for this? I was looking to implement structlog (https://www.structlog.org/en/stable/ & https://github.com/hynek/structlog) as a potential library. It has a fair amount of stars, and is in active development.

Structlog seems widely used/accepted so let's go with it. I asked around and nobody else had a strong preference.

therealvio commented 11 months ago

I neglected to explain the why, so I will add here:

The intention is to produce clear signals on errors and success messages on connection reporting. If users setup SLOs around the error messages rather than downstream events (like how long a NAT gateway was used), it would make SLO reporting more accurate.

bwhaley commented 11 months ago

Makes sense. I could see a number of use cases, from connection success monitoring to failover event monitoring.

bwhaley commented 9 months ago

Closing this since it hasn't been active in a while. Do feel free to submit a PR if you'd like to see this enhancement!