Add better formatting protections for non-Jason friendly metadata

andykent commented 4 years ago

This PR adds a new JasonSafeFormatter module which is able to recursively walk metadata and make coerce it into Jason.encode! safe.

I can confirm that this fixes #40 as we have experienced this in production and a few other Logger crashes related to dodgy data ending up in metadata.

We have made some opinionated calls on how to format the data when not compatible but hopefully this is an improvement on the current situation.

The opinionated bits are likely...

we coerce Structs to Maps if they implement Jason.Encoder
we attempt to coerce Keyword lists into Maps
we convert Tuples to lists
we inspect functions rather than attempting to call them

Credit where credit is due, I used a few commits from this coingaming fork for some pointers when putting this together.

coveralls commented 4 years ago

Coverage increased (+2.07%) to 76.19% when pulling 1fa3853fbbe047a3c2db7123f2e4a6bc9479cc82 on portal-labs:jason-safe-formatter into a0a8151fbf8e904ace9ca1c690b20f04ead3cade on Nebo15:master.

w1mvy commented 4 years ago

👍 👍 👍
A similar crash #40 has occurred.

AndrewDryga commented 4 years ago

@andykent thank you for this awesome work and sorry for taking so long to review. This is a crucial PR for LoggerJSON. I left a couple of comments on how we can make it better and then LGTM. ❤️

andykent commented 4 years ago

Thanks for taking the time to review @AndrewDryga ❤️

I think I addressed both your points. The only bit I am still unsure about is the handling of Keyword lists.

The PR currently converts Keyword lists to Maps but I'm not 100% convinced that this is the best option...

Pros: Means that Keyword lists show up nicely in Google Logging UI and can be used to search and set metrics on. e.g. data.kw_list.key="value"

Cons: Semantics are different so order is not preserved and duplicate keys are dropped. Might be confusing as datatype is different in logs from code. We have to attempt to process things that look like keyword lists and bail with rescue when they are not.

The options...

Convert Keyword lists to Maps - as currently in this PR.
Just call inspect on them - clearest in logs but loses all searching/metrics semantics.
Convert them to Lists of Lists or Lists of Maps - this is a compromise between options 1 and 2, issue is that searching then becomes positional, e.g. data.kw_list[0].key="value" || data.kw_list[1].key="value" feels kind of horrid.
Make it user configurable somehow.

I don't honestly have a strong opinion here as none are ideal so it's probably just a case of choosing the least bad option. Unless anyone has any other recommendations for how to handle this?

Thanks again for the review, excited to see this merged soon so we can get off of our fork.

AndrewDryga commented 4 years ago

@andykent I think being able to pass KV lists and filter by them in logging aggregator is a good reason to drop duplicate keys. This is all about metadata sent by the application so if a developer needs to have them in another form, it's easy to structure metadata differently in that specific place.

andykent commented 4 years ago

@AndrewDryga Yep, agreed, this is the original reason I went this route. OK, think the PR is good then. Will be testing it out in production this week and can report back.

AndrewDryga commented 4 years ago

@andykent this is awesome that you can test if first, tell me if everything is good and I'll merge this PR and release a new version. Thank you again ❤️

Nebo15 / logger_json

Add better formatting protections for non-Jason friendly metadata #46