freedomofpress / securedrop.org

Code for the SecureDrop project website
https://securedrop.org
GNU Affero General Public License v3.0

Check landing pages for assets on third party domain #506

Closed: eloquence closed this issue 5 years ago

eloquence commented 6 years ago

Check for scripts, styles, media, iframes, and other third-party assets that should not be embedded in a landing page (i.e., assets loaded from a different domain). Note that this initially covers only assets embedded in the HTML returned by the server, not ones loaded dynamically via JavaScript.

If such assets are found, they should trigger a severe warning (#496), but note the caveat above regarding QA + outreach to news orgs.

We should only do this as part of the initial integration (#488) if we can decide on an implementation approach that's not overly burdensome and fragile.
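As a rough illustration of the server-side check described above, the following Python sketch collects asset URLs from the returned HTML using the standard library's `html.parser`. The tag/attribute table (`ASSET_ATTRIBUTES`) and the names `AssetCollector` and `collect_assets` are hypothetical, not the project's implementation. It deliberately treats only `rel="stylesheet"` links as assets and ignores `srcset`, inline CSS `url(...)` references, and anything loaded dynamically by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical table of tags and the attributes that pull in an embedded asset.
ASSET_ATTRIBUTES = {
    "script": ("src",),
    "link": ("href",),
    "img": ("src",),
    "iframe": ("src",),
    "audio": ("src",),
    "video": ("src", "poster"),
    "source": ("src",),
    "embed": ("src",),
    "object": ("data",),
}


class AssetCollector(HTMLParser):
    """Collect absolute URLs of assets referenced by server-rendered HTML."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            # This sketch only treats rel=stylesheet links as assets;
            # canonical, alternate, icon, preload, etc. are skipped.
            if "stylesheet" not in rel:
                return
        for name in ASSET_ATTRIBUTES.get(tag, ()):
            value = attrs.get(name)
            if value:
                # Resolve relative and protocol-relative URLs against the page URL.
                self.assets.append(urljoin(self.page_url, value))


def collect_assets(page_url, html):
    """Return asset URLs in document order (duplicates are kept)."""
    parser = AssetCollector(page_url)
    parser.feed(html)
    parser.close()
    return parser.assets
```

Resolving every reference to an absolute URL first means a later step only has to compare hosts, whatever form the original `src`/`href` took.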

harrislapiroff commented 5 years ago

In the meeting we mentioned the strict approach of looking for loaded URLs and dinging the landing page for any cross-domain request. It also occurs to me that we could potentially lean on Privacy Badger's list if we wanted to limit our checks to known trackers.
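The strict approach can be sketched as a plain hostname comparison: any asset whose resolved host differs from the landing page's host is flagged, with no tracker list involved. This is a minimal sketch assuming that "domain" means the exact hostname (so a subdomain such as `static.news.example.org` counts as third party, and ports are ignored); `is_third_party` and `third_party_assets` are hypothetical names.

```python
from urllib.parse import urljoin, urlsplit


def is_third_party(page_url, asset_url):
    """Return True if asset_url resolves to a different host than page_url."""
    # urljoin resolves relative paths and protocol-relative "//host/..." URLs.
    resolved = urljoin(page_url, asset_url)
    # .hostname is lowercased and excludes the port.
    page_host = urlsplit(page_url).hostname
    asset_host = urlsplit(resolved).hostname
    # Host-less URLs such as data: URIs are inlined, so they trigger no request.
    if asset_host is None:
        return False
    return asset_host != page_host


def third_party_assets(page_url, asset_urls):
    """Filter asset_urls down to those served from another host."""
    return [url for url in asset_urls if is_third_party(page_url, url)]
```

Under this rule a self-hosted copy of a library passes while the same file served from a CDN is flagged, which is the intended strictness.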

harrislapiroff commented 5 years ago

... though looking through the cookie block list, I'm not sure I'm correctly understanding what it is a list of. There are a number of domains in there I wouldn't expect.

eloquence commented 5 years ago

As far as I can tell, it doesn't really matter from our point of view whether the third-party asset is intended for tracking or not. The problem is that it increases the attack surface through which third parties can learn about source behavior. True, analytics scripts are designed to be intrusive and collect as much data as possible, but the mere presence of logs on third-party servers is problematic.

harrislapiroff commented 5 years ago

Makes sense to me

eloquence commented 5 years ago

OK, so the implementation options I see for this are:

My suggestion would be to add an experimental naive check for now and see how well it works for real-world use cases. The combination of the existing analytics checks and additional HTML parsing may get us sufficiently close: even when resources are loaded via JS, this is likely often done via a <script> tag that we can detect in the page source. If this proves too fragile, I would recommend deferring this check until after the initial launch.

Thoughts?
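A naive version of the check suggested in this comment might look for `<script src>` tags pointing at other hosts and, as a further heuristic, scan inline script bodies for absolute or protocol-relative URLs. The regex and the names `ScriptScanner` and `naive_script_check` are assumptions for illustration only. URLs assembled by string concatenation at runtime will be missed, and the regex can produce false positives (for example, URLs inside JSON-LD metadata scripts).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Heuristic: absolute http(s) or protocol-relative URLs whose host contains a dot.
URL_IN_JS = re.compile(r"""(?:https?:)?//[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+[^\s'"<>]*""")


class ScriptScanner(HTMLParser):
    """Record third-party hosts referenced by <script> tags (naive heuristic)."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlsplit(page_url).hostname
        self.in_script = False
        self.hosts = set()

    def _note(self, url):
        # Resolve against the page, then keep the host only if it differs.
        host = urlsplit(urljoin(self.page_url, url)).hostname
        if host and host != self.page_host:
            self.hosts.add(host)

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self._note(src)
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script contents are passed through raw, so URLs in string literals show up here.
        if self.in_script:
            for match in URL_IN_JS.findall(data):
                self._note(match)


def naive_script_check(page_url, html):
    """Return a sorted list of third-party hosts found via script tags."""
    scanner = ScriptScanner(page_url)
    scanner.feed(html)
    scanner.close()
    return sorted(scanner.hosts)
```

A non-empty result would be the signal to raise the severe warning discussed in #496; an empty result says nothing about resources injected later by JavaScript.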

eloquence commented 5 years ago

Just to capture the discussion from this morning:

eloquence commented 5 years ago

As discussed this morning, here are a few example landing pages with third party assets to test with: [examples redacted]

chigby commented 5 years ago

Example output for the first site: [example redacted]

The information is structured as a plain-text tree of top-level pages/files and the assets within those files. The "normal" lines are the URLs of a page, a script, or a CSS file, and the indented lines with * bullets are assets contained in or requested by that page or file.

See #557 for further explanation of what is checked for overall. [example redacted]
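To make the tree format described above concrete, here is a hypothetical renderer that turns a mapping of page/file URLs to their assets into that kind of plain-text tree. The function name `render_tree`, the input shape, and the four-space indentation are assumptions, since the actual output is redacted here.

```python
def render_tree(pages):
    """Render {url: [asset urls]} as a two-level plain-text tree.

    Top-level lines are pages, scripts, or CSS files; indented "*" lines
    are the assets each one contains or requests. Dict insertion order
    is preserved in the output.
    """
    lines = []
    for url, assets in pages.items():
        lines.append(url)
        for asset in assets:
            lines.append(f"    * {asset}")
    return "\n".join(lines)
```

Listing a script or stylesheet again as its own top-level entry is what lets the output show assets that are only pulled in indirectly.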