alphagov / govuk_frontend_toolkit

❗️GOV.UK Frontend Toolkit is deprecated, and will only receive major bug fixes and security patches.
MIT License

Provide `PIISafe` wrapper object #448

h-lame closed 6 years ago

h-lame commented 6 years ago

For: https://trello.com/c/vISRi8lY/58-investigate-preventing-pii-being-sent-to-ga

This wrapper object can be used to wrap arguments we want to send to analytics that we know are not PII because we generated them, but that we also know may look like PII. Wrapping this kind of value in a PIISafe object tells the analytics PII stripping code that the value is already safe: instead of attempting to detect and remove PII from it, the code should extract the raw value and pass that on.
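As a rough illustration of the idea described above, here is a minimal sketch of a wrapper and a stripping routine that respects it. Only the name `PIISafe` comes from this PR; `stripPII`, `stripPIIFromArguments` and the email-only pattern are assumptions made for the example and are not the toolkit's actual implementation.

```javascript
// Hypothetical sketch: only the PIISafe name comes from this PR.

// Wrapper marking a value as already known to be free of PII.
function PIISafe(value) {
  this.value = value;
}

// Simplified stand-in for the real detection rules: email addresses only.
var EMAIL_PATTERN = /\S+@\S+/g;

// Strip PII from a single value, unless the caller has vouched for it.
function stripPII(value) {
  if (value instanceof PIISafe) {
    // Already safe: unwrap and pass the raw value on untouched.
    return value.value;
  }
  if (typeof value === 'string') {
    return value.replace(EMAIL_PATTERN, '[email]');
  }
  return value;
}

// Each argument is handled independently, so one wrapped value
// does not switch off detection for its neighbours.
function stripPIIFromArguments(args) {
  return args.map(stripPII);
}
```

Because the decision is made per value, a caller can wrap one argument and leave the others subject to normal detection.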

We realised we needed this because when testing smart-answers on GOV.UK we noticed that the value of a custom dimension containing the content_ids of the taxons for the smart-answer had been stripped. The value of the custom dimension was eed5b92e-8279-4ca9-a141-5c35ed22fcf1, but it was transformed into eed5b92e-8279-4ca9-a141-5[postcode]22fcf1 because the substring c35ed in the final portion looks like a postcode, C3 5ED. To avoid this we'd like to be able to tell analytics that some values are definitely safe to send as they don't contain PII.
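The false positive can be reproduced with a deliberately simplified postcode pattern. The regular expression below is an assumption written for this example, not the pattern the toolkit uses, but it shows how a run of hex characters inside a UUID can be mistaken for a postcode.

```javascript
// Simplified UK postcode pattern (an assumption for illustration only):
// one or two letters, a digit, an optional letter or digit,
// optional whitespace, then a digit and two letters.
var POSTCODE_PATTERN = /[A-Z]{1,2}[0-9][0-9A-Z]?\s*[0-9][A-Z]{2}/gi;

function stripPostcodes(value) {
  return value.replace(POSTCODE_PATTERN, '[postcode]');
}

var contentId = 'eed5b92e-8279-4ca9-a141-5c35ed22fcf1';

// The substring "c35ed" matches as if it were the postcode "C3 5ED".
console.log(stripPostcodes(contentId));
// → eed5b92e-8279-4ca9-a141-5[postcode]22fcf1
```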

My first thought was to provide a list of tracker methods and values that should be ignored. Something like:

GOVUK.Analytics.safe_for_PII = [['setdimension', 1]]

This would tell GOVUK.Analytics that for any arguments that start with 'setdimension' and 1 we should assume the other values are safe for PII. However, it felt like the PII code would need to know too much about the values it is given for this to work. We'd also need to consider how to say that the first value argument for a function was safe while still detecting PII in the other arguments.
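To make that limitation concrete, here is a sketch of how such a list might have been checked. This approach was not adopted, and every name below (`safeForPII`, `startsWithPrefix`, `isSafeForPII`) is hypothetical.

```javascript
// Hypothetical: prefixes of tracker arguments whose remaining values are trusted.
var safeForPII = [['setdimension', 1]];

// True when args begins with every element of prefix, in order.
function startsWithPrefix(args, prefix) {
  return prefix.length <= args.length && prefix.every(function (item, i) {
    return args[i] === item;
  });
}

// True when the whole call should skip PII detection.
function isSafeForPII(args) {
  return safeForPII.some(function (prefix) {
    return startsWithPrefix(args, prefix);
  });
}
```

The check is all or nothing: once a prefix matches, every remaining argument is trusted, so there is no way to mark only the first value as safe.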

Our second thought, and the version in this commit, was to allow individual values in the arguments to be wrapped in a "safe" object that the analytics code knows to ignore. This is inspired by the Rails html_safe functionality used in ERB views. Rails automatically pushes all output in an ERB template through an HTML escaping routine. If you know that the output is already safe, because it's a string of HTML you've generated in a helper for example, then you can flag it as html_safe to tell Rails not to escape it.

h-lame commented 6 years ago

This will be followed up by a PR on static that uses the GOVUK.Analytics.PIISafe wrapper object on all custom dimensions that come from data on the page that we control.