JSON substrings are not sanitized

mcfilib commented 6 years ago

Report

Summary

I'm testing out redacting certain fields from Sentry. It's not clear to me if this is an issue with the client lib, an issue with the platform or an issue with my understanding of how this should work.

Any help would be most appreciated!

Reproduction Steps

Set up a Rails app with Sentry.
Add a field to sanitize_fields e.g. "email".
Add a call to manually raise an exception in one of your controller actions to see how sanitised fields show up in Sentry's web UI e.g. raise "{\"id\":1,\"email\":\"john@doe.ac.uk\"}"

Observed Behaviour

The email field is sanitised at the top of the UI and underneath the Exception UI.
The email field is not sanitised in the Message section of the UI.

Expectated Behaviour

The Message section should also be redacted

Screenshot(s)

UI Top

UI Message

UI Exception

mcfilib commented 6 years ago

In the end I went with the most simplistic solution i.e. redacting the message from the logs completely via a custom processor.

class RedactMessage < Raven::Processor
  def process(data)
    exc, _ = data[:message].split(":")
    data[:message] = exc
    data[:logentry][:message] = exc
    data
  end
end

Though it feels like a bit of a shame that, since SanitizeData is already checking whether the field can be encoded as JSON and redacting specific fields, this isn't work it is already doing.

What do you think? Would you accept a pull request adding this behaviour to SanitizeData?

mcfilib commented 6 years ago

Reply from support:

Hi Filib,

Apologies for the wait here. I spoke with our team and it sounds like yes the expectation is that the Exception message field would be scrubbed, and this is a bug. We have some larger architectural changes coming up to better handle data scrubbing across the board, but for now it sounds like a manual fix in the sdk is the way to go. If you're still up for contributing, the team said they'd accept a PR for this sort of change. Hopefully you can get it reviewed by someone on the ruby sdk, but if you have any trouble getting a response, feel free to ping us again.

Best, Megan

jackkinsella commented 6 years ago

I've also encountered this bug. I had an ActionView::Template::Error that contained an ExecJS error, and, to my surprise, all the sanitized params were sent to sentry's servers. Luckily it was with a non-production build.

This issue ought to be elevated to critical bug as a warning to others - it's a big security hole (not to mention, a potential thorn in the sides for people dealing with GDPR compliance).

jackkinsella commented 6 years ago

@mcfilib

The solution you outlined sounds like a useful stop-gap measure until this bug is resolved by the sentry team. I needed to tweak your code a bit in order to remove all copies of the exception message:

class RemoveErrorMessage < Raven::Processor
  def process(data)
    exc, _ = data[:message].split(":")
    data[:message] = exc
    data[:logentry][:message] = exc
    data.dig(:exception, :values).each do |exception_value|
      exception_value[:value] = "REMOVED"
    end
    data
    rescue StandardError # prevent infinite error recursion if hash key structure changes
      data
  end
end

Also, for anyone else reading, the way you integrate this code into your system is with the following:

Raven.configure do |config|
  config.processors += [RemoveErrorMessage]
  ...
end

nateberkopec commented 6 years ago

The problem is that JSON cannot be sanitized if it's a substring. This surfaced in Exception messages because they usually look like "SomeClassName: other-stuff-which-can-be-anything".

Currently we just bail on any string which cannot be, in its entirety, JSON object or array.. Obviously, checking all possible substrings of any given string to see if they are valid JSON is not a good solution to this problem.

Suggestions? I'm not sure how to fix this one.

dcramer commented 6 years ago

looping in @mitsuhiko -- we did some work internally and did/are building a plan around what SDKs should do for opt-in/opt-out behavior

That said, I dont think we should be trying to parse messages and looking for PII. People should simply stop recording PII in them as it's a similar effect as putting into logs.

Lastly, the server and an upcoming project will enforce PII without us implementing it in every SDK in every dimension ever, which makes these goals much more realistic.

nfm commented 5 years ago

I've run into a similar, although not exactly the same, issue.

A client submitted malformed JSON to our account creation endpoint, including a password param. Rails barfed on this because it was malformed, and raised ActionDispatch::Http::Parameters::ParseError (which was raised as a result of JSON::ParserError from JSON.parse(invalid_json)). The JSON::ParserError exception message includes the string that fails to parse. Something like this: 751: unexpected token at #{invalid_json_string}.

It looks like Processor::SanitizeData as of https://github.com/getsentry/raven-ruby/commit/28ccdeadac80378b394c372c37857462136b3573 will strip out values whose keys are blacklisted, and string values that match a regexp of blacklisted fields as long as the value can be parsed as JSON. This obviously doesn't catch the edge-case I've hit where the JSON was malformed for some reason.

@dcramer is this kind of thing on your radar in terms of the project you've referenced above about improving masking at the server? I think it would be feasible to mask out "password: ...", "password=...", etc. substrings for a finite set of expected matches. For this kind of edge-case, I'd be ok to let the data get through to the server and be handled there (rather than rolling my own processor client-side). I agree that it probably doesn't make sense to ship an implementation for this kind of thing in every SDK.

nateberkopec commented 5 years ago

I think it would be feasible to mask out "password: ...", "password=...", etc. substrings for a finite set of expected matches.

that's definitely more feasible, even in the client.

st0012 commented 3 years ago

because the new SDK (sentry-ruby) has removed data sanitization functionality with a new data scrubbing strategy (like other languages' SDKs), we will close this issue for now.

getsentry / sentry-ruby