bluesky / bluesky-kafka

Kafka integration for bluesky

When using DEBUG log level, the provided Kafka config is logged in its entirety, which can leak secrets #61

Open NoxHarmonium opened 1 week ago

NoxHarmonium commented 1 week ago

Hi,

We are using bluesky-kafka in a service with a secure Kafka instance, so we provide some secrets via the consumer/producer config (e.g. sasl.password).

We just noticed that when we set the service's log level to DEBUG, bluesky-kafka logs the entire config and leaks the password into our logs.

This is preventing us from using debug logs at the moment.

I can think of three possible solutions:

  1. We just remove the relevant log calls and make the consumer of the library responsible for logging the config that they're passing in if they want to.
  2. We could redact sensitive config variables. I could go through the Confluent Kafka documentation (e.g. https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html#) and find any sensitive keys and write a function to redact them before logging.
  3. We just log important/interesting keys from the config.

I would recommend solution 1 because it's simple and I think it's easy enough for consumers of the library to log the config themselves. Solution 2 does have the risk that the config schema changes in the future, with sensitive keys renamed or added that aren't covered by the redactor, and we would need to maintain it.
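For reference, solution 2 could look roughly like the sketch below. The names `SENSITIVE_KEYS` and `redact_config` are hypothetical and not part of bluesky-kafka, and the key list is only illustrative; it has not been checked against the full Confluent documentation, which is exactly the maintenance risk mentioned above.

```python
import logging

logger = logging.getLogger(__name__)

# Illustrative deny-list only (an assumption, not an exhaustive or verified list).
# Any secret-bearing key that is missing here would still be logged in clear text.
SENSITIVE_KEYS = frozenset(
    {"sasl.password", "ssl.key.password", "ssl.keystore.password"}
)


def redact_config(config, sensitive_keys=SENSITIVE_KEYS, mask="****"):
    """Return a copy of config with the values of sensitive keys masked."""
    return {
        key: (mask if key in sensitive_keys else value)
        for key, value in config.items()
    }


consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "sasl.username": "svc-user",
    "sasl.password": "hunter2",
}

# The original dict is left untouched; only the logged copy is masked.
logger.debug("consumer config: %s", redact_config(consumer_config))
```

The caller's dict is never mutated, so the real password still reaches the Kafka client.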

If you're happy with one of those solutions (or have another suggestion) I'd be happy to raise a PR!

Thanks!

Relevant lines: https://github.com/bluesky/bluesky-kafka/blob/c85d98da7ed6b36a2bb482fb913a163ef3000f14/bluesky_kafka/consume.py#L121 https://github.com/bluesky/bluesky-kafka/blob/c85d98da7ed6b36a2bb482fb913a163ef3000f14/bluesky_kafka/produce.py#L122

NoxHarmonium commented 1 week ago

I was thinking something like this: https://github.com/AustralianSynchrotron/bluesky-kafka/commit/bb607e8b0772e8c08f2f2042382391bb6c8c73f6

mrakitin commented 4 days ago

Thanks for the suggestions, @NoxHarmonium! While solution (1) is straightforward, users may want to see the configuration of the servers, etc. in the logs. I share your concerns about solution (2) and redacting the config that appears in the logs, but we could add enough tests to catch any keys that are not in a "white list" of keys we define. Solution (3) seems a bit safer in case we forget to update the list.

Why not proceed with (3)?
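A minimal sketch of what (3) might look like, assuming a hypothetical `LOGGABLE_KEYS` tuple and `loggable_config` helper (neither exists in bluesky-kafka, and the choice of keys is only an example):

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical allow-list: only these keys are ever written to the log.
# A new or renamed key is simply omitted until someone adds it here.
LOGGABLE_KEYS = ("bootstrap.servers", "group.id", "auto.offset.reset")


def loggable_config(config, loggable_keys=LOGGABLE_KEYS):
    """Return only the allow-listed entries of config, in allow-list order."""
    return {key: config[key] for key in loggable_keys if key in config}


consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "bluesky-consumers",
    "sasl.password": "hunter2",
}

# Keys outside the allow-list, including sasl.password, never reach the log.
logger.debug("consumer config: %s", loggable_config(consumer_config))
```

The failure mode here is a missing log entry rather than a leaked secret, which is why forgetting to update the list is less harmful than with redaction.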

NoxHarmonium commented 4 days ago

Sure, sounds good. I should be able to get a PR up for that by the end of the week for review.