Closed Mike-Robert-CHC closed 2 years ago
Hm, what's the difference w.r.t. security here. The caller has access to both, presumably. Putting it in the exception seems a little more helpful and immediate. (Though sticking a whole record in an exception message could be messy.)
More people have access to our logs than the raw data that is processed by the application. Saving log4j messages is configurable by a log4j.config, but exceptions are more difficult -- partly because this presents as a SparkException (with details in the inner exception) and partly because the exception is thrown lazily by the first action (and that action is disjointed from the Spark XML configuration).
I am happy for other suggestions. The smallest change I could think of was moving abbreviatedRecord to a log. For the tightest security, I considered removing abbreviatedRecord, but considered others may use it for debugging.
I get it. This is probably OK. I'll comment on the PR
When Spark XML detects malformed input data, sensitive information contained in the record can be leaked into logs by the exception message in StaxXmlParser.scala lines 106-107:
May we change this to the following?