getsentry / sentry-ruby

Sentry SDK for Ruby
https://sentry.io/for/ruby
MIT License
932 stars 492 forks source link

Add better documentation for sanitizing data #1140

Closed FriedSock closed 3 years ago

FriedSock commented 3 years ago

Describe the bug

sanitize_fields has been removed, but there is no description of how to use the data scrubbing features https://docs.sentry.io/platforms/ruby/data-management/sensitive-data/#scrubbing-data and https://docs.sentry.io/platforms/ruby/configuration/filtering/

Sentry.init do |config|
  config.before_send = lambda do |event, hint|
    event[:key] = value
    event
  end
end

It would be good if this was a concrete example showing how to replicate the sanitize_fields functionality, it is unclear what value is here and how to access request parameters from the event.

seand7565 commented 3 years ago

👍 on this. I'm attempting to migrate from sentry-raven and this is the most confusing part. I'm not sure how to use Rails.application.config.filter_parameters to filter out sensitive data with sentry_ruby. It used to be we could just do this: config.sanitize_fields = Rails.application.config.filter_parameters.map(&:to_s) but I'm not sure how that translates to sentry-ruby now. 🤔

thib44 commented 3 years ago

I had the same issue, for me the best solution is to do something like :

config.before_send = lambda do |event, _hint|
  Rails.application.config.filter_parameters.each do |filter_parameter|
    event[filter_parameter] = nil
  end
  event
end
st0012 commented 3 years ago

sorry for the late reply, here's an example:

  config.before_send = lambda do |event, _hint|
    # note: if you have config.async configured, the event here will be a Hash instead of an Event object
    request_data = event.request.data
    mask = "*****".freeze

    Rails.application.config.filter_parameters.each do |filter_parameter|
      if sensitive_data = request_data[filter_parameter]
        request_data[filter_parameter] = mask
      end
    end

    event.request.data = request_data
    event
  end

if you want to replicate how sanitize_fields work, you can mimic the behavior of the processor in the old SDK. you can also borrow ideas from other processors

we also recommend you to use server-side data scrubbing whenever possible.

mcclymont commented 3 years ago

Why was this feature dropped? This makes it very difficult to upgrade.

The functionality in sanitizedata.rb is not exactly trivial and something to be copy-pasted into before_send. I'd rather rely on the gem and their own tests rather than having to re-implement this feature along with tests

HazAT commented 3 years ago

@mcclymont This decision was made already some time ago when we moved SDKs to be unified. We understand that there might be some edge cases where you really want to sanitize in the SDK but for the majority of our users, it's enough to have server-side data scrubbing (see https://docs.sentry.io/product/data-management-settings/). Maintaining all this code in all different SDKs wasn't feasible and also it caused some bugs. We will try to provide better docs to make the migration easier but this feature, will not come back to SDKs the way it was.

mcclymont commented 3 years ago

Client-side sanitisation is much preferred over server-side scrubbing for obvious security reasons. If we don't send sensitive data to Sentry, we don't need to trust Sentry to scrub it properly.

The documentation even says this

Using before-send in the SDKs to scrub any data before it is sent is the recommended scrubbing approach, so sensitive data never leaves the local environment.

So I don't believe sanitization in the SDK is an "edge case"

mcclymont commented 3 years ago

sorry for the late reply, here's an example:

@st0012 Unfortunately this code sample does not work nearly as well as the old functionality in sentry-raven

  1. The sanitize_fields previously worked with regexes matching key names
  2. sanitize_fields would clean all nested fields, even in nested JSON, query params and other places by traversing Hashes recursively.
st0012 commented 3 years ago

@mcclymont if you're using Rails, I think something like this should match the old behavior the best.

  filter = ActiveSupport::ParameterFilter.new(Rails.application.config.filter_parameters)
  config.before_send = lambda do |event, hint|
    event.request.data = filter.filter(event.request.data)
    event
  end

(I'm still improving this example though)

mcclymont commented 3 years ago

Thanks @st0012, that's a good idea.

Unfortunately, it still doesn't measure up to the functionality of the sentry-raven processor:

  1. It doesn't filter out sensitive query parameters
  2. It doesn't filter out parameters in request.body if they are encoded with JSON since the value of event.request.data is a string, not a hash
st0012 commented 3 years ago

@mcclymont as we've stated before, we don't intend to provide either a feature nor a code snippet that matches the exact same behavior of sentry-raven. the one-fits-all approach sentry-raven took just didn't work and will not work (you can see some of the issues it caused here). so we won't try to do that with our examples either.

we'll improve our examples overtime to reduce the migration cost, but you should be the one to adjust them to fit your own use cases 🙂

bbugh commented 3 years ago

@st0012 that's not particularly helpful, and I don't think what people are asking for. It's not really a "migration guide" if you don't actually provide a way to migrate. It's not like this is casual data we're sending, ensuring that PII is covered is a legal and privacy requirement. I don't have time to study all of the stuff Sentry was doing under the hood to sanitize in order to replicate the behavior and then possibly still get it wrong and get into legal trouble. This is really disappointing position to take.

st0012 commented 3 years ago

@bbugh the new SDK also provides an option to not sending any PII at all

config.send_default_pii = false # this is actually the default

just use it and that'll save you from the trouble 🙂

if you want part of the PII but don't want the rest of the parts, you'll need to scrub them yourself. because we don't know what you want and what you don't want. guessing a general pattern of sensitive data means it'll always accidentally scrub something else and it's a never-ending work, which eventually will become slow and buggy.

mrexox commented 3 years ago

Hello, everyone!

I've made a try to add missing sanitization feature (in memory of sentry-raven) through gem sentry-sanitizer: https://github.com/mrexox/sentry-sanitizer

You can try it and check, if it fits your requirements. For my project it did fit. I hope, this would help.

Note: works only with sentry-ruby ~> 4.2.0

st0012 commented 3 years ago

Note: works only with sentry-raven ~> 4.2.0

@mrexox I guess you mean sentry-ruby here? 😉

also thanks for building the new SDK's first community plugin 🎉

mrexox commented 3 years ago

@st0012 yep, thank you, fixed it!

You're welcome! Hope, it is useful :)

francisco-rojas commented 3 years ago

@mcclymont if you're using Rails, I think something like this should match the old behavior the best.

  filter = ActiveSupport::ParameterFilter.new(Rails.application.config.filter_parameters)
  config.before_send = lambda do |event, hint|
    event.request.data = filter.filter(event.request.data)
    event
  end

(I'm still improving this example though)

I ended up doing this with async config:

  filter = ActiveSupport::ParameterFilter.new(Rails.application.config.filter_parameters)
  config.before_send = lambda do |event, _hint|
    event['request']['headers'] = filter.filter(event.dig('request', 'headers')) if event.dig('request', 'headers')
    event['request']['data'] = filter.filter(event.dig('request', 'data')) if event.dig('request', 'data')

    event
  end

I never saw a "data" key under event['request'] but I kept it just in case.

I also added config.filter_parameters += ['Authorization'] to application.rb in rails to filter out the Authorization token. Even though it is not in the list of Rails.application.config.filter_parameters it seems that sentry just filters it out in their UI, but I didnt want it to travel in the payload so I filter it before sending it to sentry.

Vita1ik commented 3 years ago

Me as well! : I never saw a "data" key under event['request'].

It have to work with Rails < 6.0

config.before_send = lambda do |event, _hint|
  mask = '*****'.freeze

  if event.dig('request', 'data')
      Rails.application.config.filter_parameters.each do |filter_parameter|
         event['request']['data'][filter_parameter.to_s] &&= mask
      end
  end

  event
end
anderscarling commented 3 years ago

A part of this issue I think is not really discussed enough (but mentioned by @mcclymont earlier in the tread) is that query params is not filtered by config.send_default_pii and not (as far as I've seen) mentioned in the documentation.

Obviously they are somewhat likely to contain pii or other sensitive info, especially stuff like password reset tokens (which if unique to a single user could be pii under gdpr) often go there. Obviously the same could be said for the rest of the request url as well, but I think query params is the most likely offended here.

This does imo make the naming of send_default_pii a bit unfortunate as it sort of promises a bit more than what it can possibly deliver, especially without doing more complicated filtering.

st0012 commented 3 years ago

@anderscarling thanks for bringing up the reset token example. based on such cases, I think it's a bug that send_default_pii doesn't remove query params. I will exclude them in a recent release.

Related PR: https://github.com/getsentry/sentry-ruby/pull/1302

st0012 commented 3 years ago

Update: I think this should work better (and is simpler) than focus on filtering request data

  filter = ActiveSupport::ParameterFilter.new(Rails.application.config.filter_parameters)

  config.before_send = lambda do |event, hint|
    filter.filter(event.to_hash)
  end
franzliedke commented 3 years ago

@st0012 before_send should return the filtered event, correct?

EDIT: I am stupid, this does. :facepalm:

st0012 commented 3 years ago

I've updated the documentation several times for better sanitization guidance. And I think using Rails' parameter filter generally works well, so I'm closing this for now. But feel free to drop new comments or new issues if you think we can improve it further, thanks 🙂