alphagov / govuk_frontend_toolkit

❗️GOV.UK Frontend Toolkit is deprecated, and will only receive major bug fixes and security patches.
MIT License
403 stars 107 forks source link

Strip the user's email address out of the location used by GA #466

Closed brucebolt closed 6 years ago

brucebolt commented 6 years ago

We have seen some paths in Google Analytics that include the user's email address, normally as part of the query string. These URLs always have zero pageviews, but a number of hits (GA measures hit as any activity, e.g. event, timing measurement, etc).

Although there is existing code in this repo to strip personal information (PII) from the path, there are some cases where GA generates a new request (e.g. page timings) that we have not covered. In this case, it reports the actual URL rather than our custom path. This aligns with the observation of these hits having no pageviews, since pageviews are already stripped of this information.

This PR addresses that problem by stripping email addresses out of all URLs and setting this as a global location for GA to use on all requests from that page.

https://trello.com/c/jKitCyZI/311-remove-emails-from-ga-events-on-email-alert-frontend-pages

NickColley commented 6 years ago

Since this keeps coming up is there an opportunity to avoid this in the server side applications?

For example if we know certain routes allow user input:

/regular/route/:user-input

output this route to the frontend as a sort of 'mask' which then can take the current URL and remove this section from the URL.

/regular/route/email@address.com -> /regular/route/[redacted]

This means we're not relying on matching arbitrary patterns, which as we've seen seems prone to letting things slip through..

bevanloon commented 6 years ago

@nickcolley - thanks that's a great suggestion. We've had a chat about it with some of the email-alert alumni and others on the platform-health team. We do want to do the work to obfuscate things server side, but we want to take the time to do that work correctly, both from a product point of view and from a user-journey point of view and with proper oversight about what is and isn't PII.

I wonder if we could push this through as a temporary band-aid since it does stop the immediate and time-sensitive issue we have.

NickColley commented 6 years ago

@bevanloon sure, I won't block this pull request just wanted to highlight that previous to this we've already had two 'temporary band-aids', so we should probably try a more robust approach in the future.