Sensitive Data in Exception Messages (Anonymization, GDPR, PII, …)

Kotlin / kotlinx.serialization

Kotlin multiplatform / multi-format serialization

Apache License 2.0

5.32k stars 619 forks source link

Sensitive Data in Exception Messages (Anonymization, GDPR, PII, …) #2590

Open Fleshgrinder opened 6 months ago

Fleshgrinder commented 6 months ago

There are multiple places within the library where the data is included in exception messages. Not only might this data be dangerous in certain situations, it might also contain sensitive information that should never land in logging systems. Many companies need to invest a lot of time into cleaning these things, but it's obviously always a lossy process.

It would be great to establish a policy in which the actual data isn't being included anymore in exception messages.

I do understand that this limits the information available for debugging, hence, an option could be offered to enable this functionality again for those users who need it. But, the default should be to assume that the data is sensitive and unsafe by default.

sandwwraith commented 6 months ago

Do you have examples of this being done in other serialization libraries? As far as I know, filtering logs was always the responsibility of a corresponding logging library, as usually, it knows better what data gets where. It is hard for serialization library to decide which information is sensitive and which is not. Should Json input be printed? Index and 'missing {'? Name of class and mandatory property that is not filled? I doubt it is possible to decide without understanding the initial task. If we assume that all data is sensitive, then the straightforward solution would be to eradicate exception messages at all, leaving only generic class names. This is also easier to do in the exception handler.

Fleshgrinder commented 6 months ago

https://github.com/FasterXML/jackson-core/issues/991

com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: REDACTED (StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION disabled); line: 1, column: 1])

Jackson does and that's exactly how it should be in Kotlinx as well.

It would already help if Kotlinx would have documented the message patterns so that we can actually write (expensive) regular expression matchers in the logger configurations to filter them out. At the moment I have to go through the source code and find all possible locations that might print JSON. A very tedious process …

sandwwraith commented 6 months ago

Thanks for the info. I see that Moshi and Gson do not have this feature, though. We can consider an approach similar to Jackson's: to hide Json input by default. Note that other information, like class names, will still be available.

Fleshgrinder commented 6 months ago

It's only about data. Everything else can and should stay as it is. We're talking here about sensitive user data and not exposing it to engineers. Source code symbols like class names are anyway known to engineers, and they require them for efficient debugging.

okarmazin commented 5 months ago

I think the argument for redundancy also has merit. Sometimes, verbatim exception messages can even leak to untrusted clients due to some misconfiguration.

If we could configure the exceptions to redact values of certain sensitive fields, the impact of such exception leaks is significantly lessened.

The primary responsibility lies on the logging system / response handler. But Why not have a redundant measure in place?