Open Fleshgrinder opened 6 months ago
Do you have examples of this being done in other serialization libraries? As far as I know, filtering logs was always the responsibility of a corresponding logging library, as usually, it knows better what data gets where. It is hard for serialization library to decide which information is sensitive and which is not. Should Json input be printed? Index and 'missing {'? Name of class and mandatory property that is not filled? I doubt it is possible to decide without understanding the initial task. If we assume that all data is sensitive, then the straightforward solution would be to eradicate exception messages at all, leaving only generic class names. This is also easier to do in the exception handler.
https://github.com/FasterXML/jackson-core/issues/991
com.fasterxml.jackson.core.io.JsonEOFException: Unexpected end-of-input: expected close marker for Object (start marker at [Source: REDACTED (
StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION
disabled); line: 1, column: 1])
Jackson does and that's exactly how it should be in Kotlinx as well.
It would already help if Kotlinx would have documented the message patterns so that we can actually write (expensive) regular expression matchers in the logger configurations to filter them out. At the moment I have to go through the source code and find all possible locations that might print JSON. A very tedious process …
Thanks for the info. I see that Moshi and Gson do not have this feature, though. We can consider an approach similar to Jackson's: to hide Json input by default. Note that other information, like class names, will still be available.
It's only about data. Everything else can and should stay as it is. We're talking here about sensitive user data and not exposing it to engineers. Source code symbols like class names are anyway known to engineers, and they require them for efficient debugging.
I think the argument for redundancy also has merit. Sometimes, verbatim exception messages can even leak to untrusted clients due to some misconfiguration.
If we could configure the exceptions to redact values of certain sensitive fields, the impact of such exception leaks is significantly lessened.
The primary responsibility lies on the logging system / response handler. But Why not have a redundant measure in place?
There are multiple places within the library where the data is included in exception messages. Not only might this data be dangerous in certain situations, it might also contain sensitive information that should never land in logging systems. Many companies need to invest a lot of time into cleaning these things, but it's obviously always a lossy process.
It would be great to establish a policy in which the actual data isn't being included anymore in exception messages.
I do understand that this limits the information available for debugging, hence, an option could be offered to enable this functionality again for those users who need it. But, the default should be to assume that the data is sensitive and unsafe by default.