getsentry / sentry-ruby

Sentry SDK for Ruby
https://sentry.io/for/ruby
MIT License
932 stars 492 forks source link

send_default_pii should also remove query params #1301

Closed st0012 closed 3 years ago

sfcgeorge commented 2 years ago

Moved from raven to sentry-ruby and spent ages wondering why query params aren't being sent any more. I did wonder about send_default_pii but the docs just say:

When its value is false (default), sensitive information like

  • user ip
  • user cookie
  • request body

will not be sent to Sentry.

No mention of query. I'm used to POST body being considered PII as the docs say but I'm not used to GET params being considered PII so could you please make that clear in the docs.

I'm also not sure how to turn it back on now without turning everything on. I don't want cookies or POST bodies because those are very sensitive, but I do want GET params. The send_default_pii is a bit brute force, it would be more helpful if there were separate configs like send_default_pii_query.

st0012 commented 2 years ago

@sfcgeorge if you google "pii query string", you can find several cases that pii is leaked from the query string (example). so I think it's reasonable to consider it as pii sensitive. and because the SDK doesn't know if our users would put pii in their query string (they probably can't even be 100% sure either), we an only assume all query strings considering pii and not sending them.

@rhcarvalho do you think it's reasonable to replace the send_default_pii option with more fine-grained options?

I'll update the doc to mention it though, thanks for catching it.

rhcarvalho commented 2 years ago

On first look, I'd avoid the fine-grained options for now if there's no strong justification for them -- configuration comes at a cost.

For a custom data-scrubbing strategy we already offer Before Send or Event Processors (a generic way to update events before they go out of the SDK) in the client side, and controls on the server side (or on a local Relay proxy).

All details in https://docs.sentry.io/platforms/ruby/data-management/sensitive-data/

Perhaps that's new to the folks in this discussion? Is that helpful information?

sfcgeorge commented 2 years ago

I agree that PII can be leaked in this way and completely agree that there need to be paranoid defaults.

But I don't think it should be the only option. This is the Ruby/Rails SDK and Rails devs have a good understanding of REST, CRUD and HTTP verbs and most wouldn't put anything sensitive or non-stateless in GETs. Our company code reviews every PR, we use automated security linters, have independent security audits, and a lawyer written privacy policy.

Reset tokens are a valid exception, but they're short lived and expire once used so not a huge threat. Nonetheless since sentry-ruby already has various integrations and defaults it could strip out Devise and Clearance stuff automatically to be safe. I realise it's like whack-a-mole and you can't cover every library, but as it would be an opt-in choice so I think that's ok.

The name send_default_pii is quite unclear to me too. What does default include, or not include? What does PII stand for? Does personally identifiable information mean just personal stuff like phone numbers that would violate GDPR but wouldn't be an issue in all countries, or for B2B companies, or companies with a comprehensive privacy policy? Or does PII mean security information like passwords, one time tokens, credit cards? Really it's both but the name "personal information" really understates it and doesn't convey the gravity of what would be leaked if you disabled it (or rather turn it on).

The bigger issue I have with this setting is you effectively only offer 2 modes:

Sentry really is close to useless without parameters. How am I supposed to debug anything without IDs? I've got an undefined method on nil error so I'm left guessing whether the database record really is missing, or is our code just broken? I can't look up the record because I don't have the ID. And I can't even replicate because, well, I've got no IDs.

I have seen the scrubbing and filtering docs. The first example in both is filtering out a ZeroDivisionError which is something nobody would ever ask how to do. The params |event, hint| aren't explained so I'll have to go digging with puts and binding.pry to figure out what they are. The other scrubbing examples all say "The platform or SDK you've selected either does not support this functionality, or the code sample is missing from our documentation" and they're about adding extra information not scrubbing it anyway. The filtering docs also don't have any scrubbing examples. So no actual docs on scrubbing despite the page title. Ok fine I'll look in sentry-ruby/examples... nope no examples at all using config.before_send. But surely the sentry-rails/example has it? No! In fact it actually sets send_default_pii to true! Is that what you're recommending? Crazy. The migration guide just says "for scrubbing sensitive data, please use Sentry's Advanced Data Scrubbing feature" which is server side scrubbing but we really don't want to be sending the data in the first place (that guide also needs updating to mention GET params now being included under send_default_pii). The gem doesn't even have rubydoc API documentation. So I'm left trawling through the gem's code to figure out what to do.

Unless I'm missing something (I hope I am) the options you're giving are:

Short term please at least provide a code example of how to write a sensibly strict before_send that scrubs most things. I can find some guidance in the code itself but it's far too brute force. I assume it'll need to:

And then there's the risk that if the SDK's API changes, or the way Rails handles things changes, the custom before_send may silently stop scrubbing sensitive information and we'd never know about it. So I guess we have to write a comprehensive test suite too. Whereas if you maintained a small number of sensible presets then Sentry and the community could keep them up to date and every customer would benefit.

We use Sentry as a hosted solution because historically we've been blown away by your top notch UX, design, and customer support. I hope you can find a better long term solution that gets Sentry back to that standard.

st0012 commented 2 years ago

Hey @sfcgeorge, we really appreciate your feedback on the topic. Here's the short version of my response:

  1. Sorry about the missing documents.
    • I'll provide more info about the arguments in the callbacks such as event and hint in before_send.
    • I'll also add rubydoc to top-level APIs and key components like Sentry::Event.
  2. For the query string case, I think it may be turned on/off separately. But I haven't come up with a way to do it cleanly and without introducing a new top-level configuration option.
  3. We don't plan to provide any scrubbing related logic from the SDK, whether it's a generic preset or a library-specific one.

And here's a more detailed response:

Reset tokens are a valid exception, but they're short lived and expire once used so not a huge threat. Nonetheless since sentry-ruby already has various integrations and defaults it could strip out Devise and Clearance stuff automatically to be safe. I realise it's like whack-a-mole and you can't cover every library, but as it would be an opt-in choice so I think that's ok.

Scrubbing data for libraries like Devise in the SDK isn't a maintainable way to deal with these issues. But if someone is willing to make gems like sentry-devise, we'll be happy to help :-)

There's a community-built data scrubbing extenion: https://github.com/mrexox/sentry-sanitizer that may be an example for this.

The name send_default_pii is quite unclear to me too. What does default include, or not include? What does PII stand for?

I don't get the point. What we should call it otherwise? Of course it'll have a generic name with more information in the document. (But thanks again for catching the missing info).

The first example in both is filtering out a ZeroDivisionError which is something nobody would ever ask how to do.

You can substitude it with any exception class you want to filter out. It's never to be a thing that you can paste into your code and just work. How are we suppose to know about the use case you have?

The params |event, hint| aren't explained so I'll have to go digging with puts and binding.pry to figure out what they are.

I'll add more information about the event and hint parameters.

that guide also needs updating to mention GET params now being included under send_default_pii

I'll update it, thanks.

The gem doesn't even have rubydoc API documentation. So I'm left trawling through the gem's code to figure out what to do.

Spend hours writing a custom before_send without any documentation on how to actually do that, and no defaults or examples to start from, as a non security expert who's likely to make mistakes.

The improvement I'll do here is adding rubydoc for the top-level APIs and the interfaces of essential classes like Sentry::Event. I think that'll help you write the filtering logic.

  • Remove headers except for those in an allowlist (is there a safe default list provided anywhere in the SDK or Rails itself?)
  • Remove params in a blocklist (again is there one built in?), and perhaps remove any that match patterns like email addresses just in case.

There's a rack_env_whitelist config for that, which's not on our official document yet. I'll update the document for it as well.

  • Does the local variables feature need scrubbing separately? How?

Given it's still a new feature and only a few languages have it, we don't have an official approach on scrubbing locals atm. So if there's any concern, I'd recommend turning it off for now. But we'll be happy to discuss an approach (e.g. a before_locals_capture callback) for it. Feel free to open another issue for the discussion.

  • Do breadcrumbs need scrubbing separately? I'm not clear on what they might even contain or how to test that.

It's impossible for us to know either as the breadcrumb loggers only listen to a certain protocol and record whatever is passed. For example, active_support_logger captures breadcrumbs from user application's ActiveSupport instrumentation, which any library/appliction code can use to pass data.

But there's a before_breadcrumb callback that you can use to update/filter breadcrumbs.

Whereas if you maintained a small number of sensible presets then Sentry and the community could keep them up to date and every customer would benefit.

Data scrubbing, unlike other SDK features, highly depends on individual application's needs. What we can do is to provide high-level interfaces like before_send and before_breadcumb with documentations (which I'll start improving from today). Anything further than that is likely to risk breaking some other clients' app and/or introduce performance overhead (to check various edge cases), which both happened with our old SDK's data scrubbing processors.

I'm sorry for the confusion and time spent on debugging because of the lack of documentation. And thank you for spending time to provide so much feedback. I hope we can eventually come up solutions to:

st0012 commented 2 years ago

@sfcgeorge here's a start for our document improvements: https://github.com/getsentry/sentry-ruby/pull/1635