Closed st0012 closed 3 years ago
@sfcgeorge if you google "pii query string", you can find several cases that pii is leaked from the query string (example). so I think it's reasonable to consider it as pii sensitive. and because the SDK doesn't know if our users would put pii in their query string (they probably can't even be 100% sure either), we an only assume all query strings considering pii and not sending them.
@rhcarvalho do you think it's reasonable to replace the send_default_pii
option with more fine-grained options?
I'll update the doc to mention it though, thanks for catching it.
On first look, I'd avoid the fine-grained options for now if there's no strong justification for them -- configuration comes at a cost.
For a custom data-scrubbing strategy we already offer Before Send or Event Processors (a generic way to update events before they go out of the SDK) in the client side, and controls on the server side (or on a local Relay proxy).
All details in https://docs.sentry.io/platforms/ruby/data-management/sensitive-data/
Perhaps that's new to the folks in this discussion? Is that helpful information?
I agree that PII can be leaked in this way and completely agree that there need to be paranoid defaults.
But I don't think it should be the only option. This is the Ruby/Rails SDK and Rails devs have a good understanding of REST, CRUD and HTTP verbs and most wouldn't put anything sensitive or non-stateless in GETs. Our company code reviews every PR, we use automated security linters, have independent security audits, and a lawyer written privacy policy.
Reset tokens are a valid exception, but they're short lived and expire once used so not a huge threat. Nonetheless since sentry-ruby already has various integrations and defaults it could strip out Devise and Clearance stuff automatically to be safe. I realise it's like whack-a-mole and you can't cover every library, but as it would be an opt-in choice so I think that's ok.
The name send_default_pii
is quite unclear to me too. What does default include, or not include? What does PII stand for? Does personally identifiable information mean just personal stuff like phone numbers that would violate GDPR but wouldn't be an issue in all countries, or for B2B companies, or companies with a comprehensive privacy policy? Or does PII mean security information like passwords, one time tokens, credit cards? Really it's both but the name "personal information" really understates it and doesn't convey the gravity of what would be leaked if you disabled it (or rather turn it on).
The bigger issue I have with this setting is you effectively only offer 2 modes:
false
(default): send nothing and Sentry is practically useless; a glorified stacktrace.true
: leak all possible sensitive data to Sentry.Sentry really is close to useless without parameters. How am I supposed to debug anything without IDs? I've got an undefined method on nil error so I'm left guessing whether the database record really is missing, or is our code just broken? I can't look up the record because I don't have the ID. And I can't even replicate because, well, I've got no IDs.
I have seen the scrubbing and filtering docs. The first example in both is filtering out a ZeroDivisionError which is something nobody would ever ask how to do. The params |event, hint|
aren't explained so I'll have to go digging with puts
and binding.pry
to figure out what they are. The other scrubbing examples all say "The platform or SDK you've selected either does not support this functionality, or the code sample is missing from our documentation" and they're about adding extra information not scrubbing it anyway. The filtering docs also don't have any scrubbing examples. So no actual docs on scrubbing despite the page title. Ok fine I'll look in sentry-ruby/examples
... nope no examples at all using config.before_send
. But surely the sentry-rails/example
has it? No! In fact it actually sets send_default_pii
to true! Is that what you're recommending? Crazy. The migration guide just says "for scrubbing sensitive data, please use Sentry's Advanced Data Scrubbing feature" which is server side scrubbing but we really don't want to be sending the data in the first place (that guide also needs updating to mention GET params now being included under send_default_pii
). The gem doesn't even have rubydoc API documentation. So I'm left trawling through the gem's code to figure out what to do.
Unless I'm missing something (I hope I am) the options you're giving are:
send_default_pii
which is what your example code does and I bet what most of your users end up doing out of confusion and desperation.before_send
without any documentation on how to actually do that, and no defaults or examples to start from, as a non security expert who's likely to make mistakes. Short term please at least provide a code example of how to write a sensibly strict before_send
that scrubs most things. I can find some guidance in the code itself but it's far too brute force. I assume it'll need to:
And then there's the risk that if the SDK's API changes, or the way Rails handles things changes, the custom before_send
may silently stop scrubbing sensitive information and we'd never know about it. So I guess we have to write a comprehensive test suite too. Whereas if you maintained a small number of sensible presets then Sentry and the community could keep them up to date and every customer would benefit.
We use Sentry as a hosted solution because historically we've been blown away by your top notch UX, design, and customer support. I hope you can find a better long term solution that gets Sentry back to that standard.
Hey @sfcgeorge, we really appreciate your feedback on the topic. Here's the short version of my response:
event
and hint
in before_send
. Sentry::Event
.And here's a more detailed response:
Reset tokens are a valid exception, but they're short lived and expire once used so not a huge threat. Nonetheless since sentry-ruby already has various integrations and defaults it could strip out Devise and Clearance stuff automatically to be safe. I realise it's like whack-a-mole and you can't cover every library, but as it would be an opt-in choice so I think that's ok.
Scrubbing data for libraries like Devise in the SDK isn't a maintainable way to deal with these issues. But if someone is willing to make gems like sentry-devise
, we'll be happy to help :-)
There's a community-built data scrubbing extenion: https://github.com/mrexox/sentry-sanitizer that may be an example for this.
The name
send_default_pii
is quite unclear to me too. What does default include, or not include? What does PII stand for?
I don't get the point. What we should call it otherwise? Of course it'll have a generic name with more information in the document. (But thanks again for catching the missing info).
The first example in both is filtering out a ZeroDivisionError which is something nobody would ever ask how to do.
You can substitude it with any exception class you want to filter out. It's never to be a thing that you can paste into your code and just work. How are we suppose to know about the use case you have?
The params
|event, hint|
aren't explained so I'll have to go digging withputs
andbinding.pry
to figure out what they are.
I'll add more information about the event and hint parameters.
that guide also needs updating to mention GET params now being included under
send_default_pii
I'll update it, thanks.
The gem doesn't even have rubydoc API documentation. So I'm left trawling through the gem's code to figure out what to do.
Spend hours writing a custom
before_send
without any documentation on how to actually do that, and no defaults or examples to start from, as a non security expert who's likely to make mistakes.
The improvement I'll do here is adding rubydoc for the top-level APIs and the interfaces of essential classes like Sentry::Event
. I think that'll help you write the filtering logic.
- Remove headers except for those in an allowlist (is there a safe default list provided anywhere in the SDK or Rails itself?)
- Remove params in a blocklist (again is there one built in?), and perhaps remove any that match patterns like email addresses just in case.
There's a rack_env_whitelist
config for that, which's not on our official document yet. I'll update the document for it as well.
- Does the local variables feature need scrubbing separately? How?
Given it's still a new feature and only a few languages have it, we don't have an official approach on scrubbing locals atm. So if there's any concern, I'd recommend turning it off for now. But we'll be happy to discuss an approach (e.g. a before_locals_capture
callback) for it. Feel free to open another issue for the discussion.
- Do breadcrumbs need scrubbing separately? I'm not clear on what they might even contain or how to test that.
It's impossible for us to know either as the breadcrumb loggers only listen to a certain protocol and record whatever is passed. For example, active_support_logger
captures breadcrumbs from user application's ActiveSupport instrumentation, which any library/appliction code can use to pass data.
But there's a before_breadcrumb
callback that you can use to update/filter breadcrumbs.
Whereas if you maintained a small number of sensible presets then Sentry and the community could keep them up to date and every customer would benefit.
Data scrubbing, unlike other SDK features, highly depends on individual application's needs. What we can do is to provide high-level interfaces like before_send
and before_breadcumb
with documentations (which I'll start improving from today). Anything further than that is likely to risk breaking some other clients' app and/or introduce performance overhead (to check various edge cases), which both happened with our old SDK's data scrubbing processors.
I'm sorry for the confusion and time spent on debugging because of the lack of documentation. And thank you for spending time to provide so much feedback. I hope we can eventually come up solutions to:
@sfcgeorge here's a start for our document improvements: https://github.com/getsentry/sentry-ruby/pull/1635
Moved from raven to sentry-ruby and spent ages wondering why query params aren't being sent any more. I did wonder about send_default_pii but the docs just say:
No mention of query. I'm used to POST body being considered PII as the docs say but I'm not used to GET params being considered PII so could you please make that clear in the docs.
I'm also not sure how to turn it back on now without turning everything on. I don't want cookies or POST bodies because those are very sensitive, but I do want GET params. The
send_default_pii
is a bit brute force, it would be more helpful if there were separate configs likesend_default_pii_query
.