bluesky / bluesky-kafka

Kafka integration for bluesky

When using DEBUG log level, the provided Kafka config is logged in its entirety which can leak secrets #61

Open NoxHarmonium opened 1 day ago

NoxHarmonium commented 1 day ago

Hi,

We are using bluesky-kafka in a service with a secure Kafka instance, so we provide some secrets via the consumer/producer config (e.g. sasl.password).

We just noticed that when we set the service's log level to DEBUG, bluesky-kafka logs the entire config, which leaks the password into our logs.

This is preventing us from using debug logs at the moment.

I can think of three possible solutions:

  1. We just remove the relevant log calls and make the consumer of the library responsible for logging the config that they're passing in if they want to.
  2. We could redact sensitive config variables. I could go through the confluent kafka documentation (e.g. https://docs.confluent.io/platform/current/installation/configuration/consumer-configs.html#) and find any sensitive keys and write a function to redact them before logging.
  3. We just log only the important/interesting keys from the config.

I would recommend solution 1 because it's simple and I think it's easy enough for consumers of the library to log the config themselves. Solution 2 carries the risk that the config schema changes in the future: if sensitive keys are renamed or added and the redactor doesn't cover them, they would leak again, so we would need to keep maintaining it.
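For reference, solution 2 could look roughly like the minimal sketch below. It is not code from bluesky-kafka: the name `redact_config`, the key pattern, and the placeholder string are all hypothetical, and the pattern is not an exhaustive list of sensitive Kafka settings (for example, it would not catch a key that holds PEM private key material).

```python
import logging
import re

logger = logging.getLogger(__name__)

# Hypothetical pattern for key names that are likely to hold secrets.
# This is an assumption for illustration, not a complete list.
_SENSITIVE_KEY = re.compile(r"password|secret|token|sasl\.jaas\.config", re.IGNORECASE)


def redact_config(config, placeholder="****"):
    """Return a copy of config with values of sensitive-looking keys masked."""
    return {
        key: placeholder if _SENSITIVE_KEY.search(key) else value
        for key, value in config.items()
    }


consumer_config = {
    "bootstrap.servers": "localhost:9092",
    "sasl.username": "svc-user",
    "sasl.password": "hunter2",
    "ssl.key.password": "also-secret",
}

# Only the redacted copy is passed to the logger; the original is untouched.
safe = redact_config(consumer_config)
logger.debug("consumer config: %s", safe)
```

Matching on key names keeps the function small, but it shows the maintenance risk mentioned above: any new sensitive key whose name does not match the pattern would still be logged in full.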

If you're happy with one of those solutions (or have another suggestion) I'd be happy to raise a PR!

Thanks!

Relevant lines: https://github.com/bluesky/bluesky-kafka/blob/c85d98da7ed6b36a2bb482fb913a163ef3000f14/bluesky_kafka/consume.py#L121 https://github.com/bluesky/bluesky-kafka/blob/c85d98da7ed6b36a2bb482fb913a163ef3000f14/bluesky_kafka/produce.py#L122

NoxHarmonium commented 1 day ago

I was thinking something like this: https://github.com/AustralianSynchrotron/bluesky-kafka/commit/bb607e8b0772e8c08f2f2042382391bb6c8c73f6