buettner / private-prefetch-proxy

Proposal to use a CONNECT proxy to obfuscate the user IP address for privacy-enhanced prefetching.

Consider changing behaviour of requests to /.well-known/traffic-advice to stop 404 log entries #17

Open adamseabrook opened 3 years ago

adamseabrook commented 3 years ago

Semi-related to https://github.com/buettner/private-prefetch-proxy/issues/16

We monitor for requests from Google IP ranges (specifically 66.*) that get anything other than a 200 response. This helps us identify issues where a page in our CMS may have been unpublished, or someone changed a slug without adding a 301, and Googlebot is now running into 404s or other errors.

In the past 7 days we got 3,290 404s for the https://www.betterteam.com/.well-known/traffic-advice file, which triggered a number of alerts. I know we can ignore this directory entirely, but it would be ideal if another method could be found that does not cause 404s.
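For anyone with similar monitoring, a minimal sketch of the "ignore this path" option follows. It assumes access logs in the common/combined log format; the `66.` prefix comes from the comment above, while the names `CRAWLER_PREFIX`, `IGNORED_PATHS`, and `should_alert` are hypothetical and not part of any real alerting tool.

```python
import re

# Assumptions for illustration: the IP prefix mirrors the "66.*" range mentioned
# above, and the ignored path is the traffic-advice location discussed here.
CRAWLER_PREFIX = "66."
IGNORED_PATHS = ("/.well-known/traffic-advice",)

# Matches the start of a common/combined log line:
#   ip ident user [time] "METHOD path PROTOCOL" status ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3})'
)


def should_alert(line: str) -> bool:
    """Return True for a non-200 response to a crawler IP on a monitored path."""
    match = LOG_LINE.match(line)
    if match is None:
        return False  # unparseable lines are skipped rather than alerted on
    if not match["ip"].startswith(CRAWLER_PREFIX):
        return False
    if match["status"] == "200":
        return False
    path = match["path"].split("?", 1)[0]  # drop any query string
    return path not in IGNORED_PATHS
```

This only suppresses the alert; the requests themselves still reach the server.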

404s also bypass our cache which then hits our origin server with extra requests.

We added the missing file to https://www.betterteam.com/.well-known/traffic-advice so it now gets a 200.
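As a rough illustration of what "adding the file" can look like when a site controls its own application server, here is a minimal WSGI sketch. The media type `application/trafficadvice+json` and the `user_agent` / `disallow` fields are taken from the traffic-advice proposal as I understand it, not from this thread, so check them against the current spec; `build_traffic_advice` and `app` are hypothetical names.

```python
import json

# Assumption: media type and field names follow the traffic-advice proposal.
TRAFFIC_ADVICE_MEDIA_TYPE = "application/trafficadvice+json"
TRAFFIC_ADVICE_PATH = "/.well-known/traffic-advice"


def build_traffic_advice(disallow: bool = True) -> bytes:
    """Serialize a single advice entry addressed to the prefetch proxy."""
    advice = [{"user_agent": "prefetch-proxy", "disallow": disallow}]
    return json.dumps(advice).encode("utf-8")


def app(environ, start_response):
    """Tiny WSGI app: 200 for the traffic-advice path, 404 for everything else."""
    if environ.get("PATH_INFO") == TRAFFIC_ADVICE_PATH:
        body = build_traffic_advice()
        start_response("200 OK", [
            ("Content-Type", TRAFFIC_ADVICE_MEDIA_TYPE),
            ("Content-Length", str(len(body))),
        ])
        return [body]
    body = b"not found"
    start_response("404 Not Found", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, it could be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.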

buettner commented 3 years ago

Sorry this is causing you trouble.

FWIW, the volume of requests should go down soon as we're implementing another caching layer.

Two questions:

  1. Would it be easier for you to add a field to your DNS entry instead of using traffic-advice to control prefetching? We considered this, but based on other feedback that modifying DNS records is often hard for developers but adding a file is easy, we settled on the traffic-advice approach.
  2. Were there challenges to constructing and adding the traffic-advice file? https://github.com/buettner/private-prefetch-proxy/issues/16 suggested there can be challenges, and we'd like to know if that is the common case.

adamseabrook commented 3 years ago

I think either a DNS entry or adding something to the <head> section or server headers would make the most sense. Headers or the head section would also mean all the SEO plugins like Yoast could add an option to turn this on or off, like they do with other robots-related settings: https://wordpress.org/support/plugin/wordpress-seo/ (WordPress) https://plugins.craftcms.com/sprout-seo (Craft) https://github.com/nystudio107/craft-seomatic (Craft)

Adding a file with a custom MIME type will, I think, be beyond your average user (I am not sure they will even care about this, though). The users that do care about this are probably not going to be too happy about any process that requires actual development, as one of the comments in #16 mentioned. A DNS entry would be the easiest but won't give you page-level control.

For us it was easy, as we just added it as an advanced response in Fastly (see below). I took a quick look in Cloudflare and could not see any way to create a response there with a custom MIME type.

image

buettner commented 3 years ago

The limitation with <head> is that the proxy can't see the page content, only the browser can. The proxy can fetch traffic-advice and cache it to stop prefetch traffic from reaching the site.
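To make the "fetch traffic-advice and cache it" idea concrete, here is a minimal sketch of a per-origin cache with a time-to-live. It is an illustration only, not how the actual proxy is implemented: the class name `AdviceCache`, the one-hour default TTL, the injected `fetch` callable, and the choice to treat missing or unparseable advice as "allowed" are all assumptions, and the `user_agent` / `disallow` fields follow the traffic-advice proposal as I understand it.

```python
import json
import time
from typing import Callable, Dict, Tuple


class AdviceCache:
    """Caches one allow/deny decision per origin for ttl_seconds."""

    def __init__(self, fetch: Callable[[str], str],
                 ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch          # hypothetical: returns the advice body for an origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, bool]] = {}

    def prefetch_allowed(self, origin: str) -> bool:
        now = self._clock()
        cached = self._entries.get(origin)
        if cached is not None and cached[0] > now:
            return cached[1]         # still fresh: the origin is not contacted
        allowed = self._parse(self._fetch(origin))
        self._entries[origin] = (now + self._ttl, allowed)
        return allowed

    @staticmethod
    def _parse(body: str) -> bool:
        try:
            entries = json.loads(body)
        except ValueError:
            return True              # assumption: unparseable advice means "no advice"
        if not isinstance(entries, list):
            return True
        for entry in entries:
            if isinstance(entry, dict) and entry.get("user_agent") == "prefetch-proxy":
                return not entry.get("disallow", False)
        return True
```

With a cache like this, repeated prefetch decisions for the same origin within the TTL cost the site a single request to the advice URL.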

I'm happy to see that Fastly supports this well at least.

The tension here is that we should follow best practices. The /.well-known URL RFC says that a "good practice" is "Using an application-specific media type in the Content-Type header field, and requiring clients to fail if it is not used", and the W3C Web Platform Design Principles state, "Always define a corresponding MIME type and extend existing APIs to support this type for any new data format."

robrwo commented 2 years ago

Adding a DNS entry is not available to most developers. It also requires an additional skillset that not every developer has.

As for the application-specific media type, I would point out that there are several well-known URLs with the .json extension, and others with the .txt extension.

Pino4 commented 2 years ago

At one of the largest hosting providers in the world (SiteGround), it is not possible on all hosting plans to set the MIME type in the .well-known folder. SiteGround's response: "the .well-known folder has a separate configuration in nginx so you would not be able to change the MIME type of any files within. Our system has a unified setup on all servers and we cannot exclude the .well-known folder from Nginx or add any custom rules for it. This folder is used for various internal checks and SSL verification files.

I am afraid that this Private Prefetch Proxy option is not compatible with our servers at the moment."

jeremyroman commented 2 years ago

Thanks for sharing that.

As noted, the .well-known directory is sometimes specially managed because it's used for other potentially sensitive origin-wide features, like TLS certificate issuance (ACME HTTP-01), which similarly need assurance that the content is presented by the site owner.

Your hosting provider could support this in the future by either serving the traffic advice themselves (and giving customers some UI affordance for controlling it, possibly even dynamically) or by rewriting it internally (not by serving a redirect) to some URL that customers do have configuration control over, such as:

location = /.well-known/traffic-advice {
  rewrite ^/\.well-known/traffic-advice$ /.some-other-path/traffic-advice;
}

Of course, neither of these helps you immediately and it's useful to know that this is a barrier to some.