Closed: buettner closed this issue 5 months ago
> Network administrators can control how the feature works on their network, as described here.
What is the purpose-specific domain?
Also, will the proxies be configured as domains or as IP addresses? Domain blocking may be a preferable way for users to retain local control.
Sorry about that. I added the domain to the main post (dns-tunnel-check.googlezip.net).
The proxy is configured as a domain -- tunnel.googlezip.net.
We will soon start an opt-in Early Access Program with the goal of helping publishers evaluate the technology and provide feedback to inform our plans.
To help interested publishers assess this feature and provide feedback, we’ll soon begin an opt-in Early Access Program (EAP) for Chrome’s Private Prefetch Proxy on Android. During the EAP, this feature will be trialed only on Google Search to prefetch links to websites participating in the EAP.
Users can always opt out of Google Search result prefetching via Chrome’s Preload setting. The feature will also be disabled for pages loaded in Incognito mode. From our most recent experiment, we found that the byte overhead from unused prefetches was far less than 1% of overall user traffic.
Interested publishers will need to indicate their desire to participate in the EAP by creating a traffic advice file that includes a dedicated EAP field (google_prefetch_proxy_eap).
Example:

```json
[
  {
    "user_agent": "prefetch-proxy",
    "google_prefetch_proxy_eap": { "fraction": 1.0 }
  }
]
```
`fraction` should be a value between 0.0 and 1.0 (i.e. 0% to 100%). This field controls the fraction of requested prefetches that the Private Prefetch Proxy will send to the destination site (the remainder will be dropped by the proxy). EAP participants may want to start with a smaller fraction (e.g. 0.1) to monitor their key metrics, and gradually ramp up to 1.0.
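To make the constraints above concrete, here is a small sketch (in Python; the `validate_traffic_advice` helper name is ours, not part of the spec) that checks an EAP traffic advice file for the points described: valid JSON, a top-level array, and a `fraction` between 0.0 and 1.0:

```python
import json

def validate_traffic_advice(text: str) -> list[str]:
    """Return a list of problems found in an EAP traffic advice file.

    Illustrative only; checks JSON validity, the top-level array shape,
    and that each entry's google_prefetch_proxy_eap.fraction is in [0, 1].
    """
    try:
        advice = json.loads(text)
    except json.JSONDecodeError as e:
        return [f"not valid JSON: {e}"]
    if not isinstance(advice, list):
        return ["top-level value must be a JSON array"]
    problems = []
    for i, entry in enumerate(advice):
        fraction = entry.get("google_prefetch_proxy_eap", {}).get("fraction")
        if fraction is None:
            problems.append(f"entry {i}: missing google_prefetch_proxy_eap.fraction")
        elif not (0.0 <= fraction <= 1.0):
            problems.append(f"entry {i}: fraction {fraction} outside [0.0, 1.0]")
    return problems
```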
We recommend that interested publishers join this announcement mailing list to receive key updates about the EAP (e.g. starting, gradual rollout, observations on our end). For support inquiries, we’ve created this support mailing list.
From our most recent experiment, we observed that the vast majority of websites saw less than a 2% increase in main HTML fetches, and a 20+% faster LCP when a prefetched resource was used. (Note that this feature is Android only, so the overall increase in traffic is much lower.)
For the EAP, we will only have egress IPs in a few countries, and users are mapped to the IP range that is closest to their ingress IP.
We recommend checking for the `Purpose: prefetch` request header and doing a reverse DNS lookup on the requesting IP addresses. The proxy IP addresses will resolve to XYZ.fetch.tunnel.googlezip.net, where XYZ depends on the specific IP address. Publishers can reject the prefetch request if it is trying to access content subject to geo-blocking. Network administrators can always control how the feature works on their network, as described here.
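A minimal sketch of that reverse lookup, assuming Python on the publisher's server (the `is_private_prefetch_proxy` name is ours, not part of any spec):

```python
import socket

PROXY_SUFFIX = ".fetch.tunnel.googlezip.net"

def is_private_prefetch_proxy(ip: str) -> bool:
    """Best-effort check: does this client IP reverse-resolve into the
    proxy's fetch.tunnel.googlezip.net namespace?"""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        # No PTR record, lookup failure, etc. -- treat as "not the proxy".
        return False
    return hostname.endswith(PROXY_SUFFIX)
```

Production code would typically also forward-resolve the returned hostname and confirm it maps back to the same IP, since a bare PTR record can be set by whoever controls the reverse zone.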
This is great news. I've started looking into how we can enroll in this early access program for our web frontend. However, before I start, a few questions below:
> During the EAP, this feature will be trialed only on Google Search to prefetch links to websites participating in the EAP.
Would the proxy share user's cookies with our web frontend? If not, then we run into the risk of showing incorrect content to the user and annoying them. How do we avoid that on our web frontend?
> From our most recent experiment, we observed that the vast majority of websites saw less than a 2% increase in main HTML fetches, and a 20+% faster LCP when a prefetched resource was used.
Would it be possible for you to provide data on the average performance improvement as measured using CWV metrics? My understanding is that the 2% number corresponds to all HTML fetches to a specific site, whereas the 20% number refers only to successful prefetches. The two numbers are evaluated over very different datasets, which makes it harder to make any tradeoff decisions. We're worried about the prefetch costs, so any data on the average CWV improvement would help us make a better case to the top brass.
PS: I'm using a personal github account because I do not yet want my comments to be associated with my employer.
Happy to hear you're interested!
> Would the proxy share user's cookies with our web frontend? If not, then we run into the risk of showing incorrect content to the user and annoying them. How do we avoid that on our web frontend?
User cookies will never be sent on prefetch requests for privacy reasons. For correctness reasons, Chrome can't naively use a prefetched resource if it should have had a cookie on the request (as you mentioned). This means that prefetching is only effective when the user does not have a cookie for the origin, which is common for cross-origin navigations. Moreover, a navigation without a cookie is often the user's first visit to the site, which tends to be slower than average as the user has no cached resources. I.e., speeding up first-visits is often more important than speeding up subsequent visits.
However, we do have a proposal that allows sites to tell Chrome that the HTML is not dynamically generated based on the cookie and is safe to use even if prefetched without a cookie. Once the user navigates to the page, cookies will be sent on subsequent requests.
If this proposal might work for you, we'd be very interested in hearing your feedback!
> Would it be possible for you to provide data on how much was the average performance improvement as measured using CWV metrics?
On average, LCP improved by ~3%. Though we hope to improve this with better triggering.
> On average, LCP improved by ~3%. Though we hope to improve this with better triggering.

> From our most recent experiment, we observed that the vast majority of websites saw less than a 2% increase in main HTML fetches, and a 20+% faster LCP when a prefetched resource was used.
Thanks for the quick reply and thanks for explaining. It's important for us to get the details right so we can make the right tradeoffs among the engineering, traffic costs and CWV gains.
If the increase in main HTML fetches is 2%, then even with the assumption of 100% precision, the prefetch should speed up at most 2% of the webpages. Even if we optimistically assume speedup of 100% (instead of the actual 20% speedup) for those 2% page loads, that translates to an average of 2% LCP improvement. In practice with lower precision and 20% speedup (instead of 100%), the LCP improvement should be much lower. What am I missing?
> Chrome can't naively use a prefetched resource if it should have had a cookie on the request (as you mentioned). This means that prefetching is only effective when the user does not have a cookie for the origin, which is common for cross-origin navigations.

> we do have a proposal that allows sites to tell Chrome that the HTML is not dynamically generated
Does this mean that Chrome will be prefetching in many cases but not actually using the prefetched resource? Does that further lower the precision, to the point of being useful only once per Chrome install unless we rewrite our frontend?
> What am I missing?
Sorry, that was my fault. I still didn't give you numbers across the same populations.
3% is the improvement in LCP for all navigations coming from Google Search.
The challenge is that aggregate performance impact will vary dramatically across sites, depending on how much of their traffic comes from Search. Some sites with rich content primarily have same-origin navigations, whereas others are primarily landing pages that get much of their traffic from Search. Also, some sites value the navigations from Search (users discovering their site) more highly than subsequent user navigations.
The question we primarily wanted to answer was how much additional traffic will this impose on users, ISPs, and origins. The answer is that it's not much in aggregate, and it's very rare for any site to see a large increase in requests.
This is one of the purposes of the EAP -- it gives sites a way to slowly ramp up prefetch traffic while evaluating their own overhead/performance metrics.
> Does this mean that Chrome will be prefetching in many cases but not actually using up the prefetched resource?
Yes. Though we have plans to reduce this additional overhead. If you want the performance improvement for users who have visited your site before (assuming you set a cookie when they do), that will require changes on your frontend. Potentially, this could be as simple as adding the 'Supports-Loading-Mode: uncredentialed-prefetch' header (note: this is not yet implemented, but we can prioritize it if there is broad interest). But it depends on your site and how you use cookies.
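If that header does ship, opting a page in could be as small as one extra response header. A hypothetical WSGI sketch (the header value is quoted from the comment above, which notes Chrome had not yet implemented it at the time of writing):

```python
def app(environ, start_response):
    # Declare that this page's HTML is not personalized via cookies, so a
    # prefetched, cookie-less copy is safe to use (per the proposal above).
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Supports-Loading-Mode", "uncredentialed-prefetch"),
    ])
    return [b"<!doctype html><title>Static landing page</title>"]
```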
Thanks @buettner. Looking at our server logs, we get ~8% of Chrome traffic from Google, and the rest from users clicking on links, etc. I wrote a quick simulation to measure the CWV impact if we speed up 8% of page loads by 3%. My simulation shows about a ~0.2% reduction in LCP. Does that sound reasonable to you?
Do you have improvement numbers for other CWV metrics? e.g., First Input Delay or CLS?
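The ~0.2% estimate above is consistent with a simple weighted average, assuming the 3% LCP gain applies only to the ~8% of page loads arriving from Search (both inputs taken from the discussion above):

```python
# Assumed inputs, taken from the discussion above.
search_share = 0.08      # share of Chrome page loads arriving from Google Search
search_lcp_gain = 0.03   # average LCP improvement on Search-originated navigations

# Weighted average over all page loads (non-Search loads see no change).
overall_lcp_gain = search_share * search_lcp_gain
print(f"{overall_lcp_gain:.2%}")  # prints "0.24%"
```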
That sounds reasonable. Though, your results may vary, as the impact is not consistent across sites.
We did not see significant changes in CLS or FID.
I'm happy to hear you're interested in the EAP! We'll keep you posted on timing via the mailing list.
I'm giving feedback because I ran into a problem after I deployed the traffic advice file and opted in to the EAP to allow prefetching from Google Search.
Some of the pages are geo-restricted by IP, so the pages will return status code 403 if Purpose: prefetch is in the request header, as documented below.
> Publishers should look for the Purpose: prefetch request header and respond with an HTTP 403 (Forbidden) (see Geolocation for an example use case).

https://github.com/buettner/private-prefetch-proxy#publisher-opt-out
If Chrome's prefetch setting is enabled, then when you type a URL directly into the address bar (not just when navigating from Google Search), it will be prefetched and Purpose: prefetch will be included in the request header. https://support.google.com/chrome/answer/1385029?hl=en&co=GENIE.Platform%3DAndroid&oco=2
We have confirmed that this causes a 403 page to be displayed when the URL is entered into the address bar. I guess I need to do a DNS lookup and look at the address as described in this issue.
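For reference, the documented opt-out can be sketched as a Python WSGI handler (hypothetical app; as this comment shows, keying on the `Purpose: prefetch` header alone also matches non-proxy prefetches such as address-bar predictions, so a reverse DNS check on the client IP is needed for precision):

```python
def geo_restricted_app(environ, start_response):
    # Reject prefetches for geo-restricted content, per the documented
    # opt-out. Caveat from the thread: Chrome also sends "Purpose: prefetch"
    # on some non-proxy prefetches (e.g. address-bar predictions), so this
    # header alone can 403 real users; combine it with a reverse DNS check
    # of the client IP for precision.
    if environ.get("HTTP_PURPOSE") == "prefetch":
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"Prefetch not allowed for geo-restricted content"]
    start_response("200 OK", [("Content-Type", "text/html")])
    return [b"<!doctype html><title>ok</title>"]
```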
Filed https://bugs.chromium.org/p/chromium/issues/detail?id=1284708 to investigate this on the Chromium side.
Thanks for the feedback!
It seems like showing the error page is a bug, and we're following up on the bug jeremyroman filed.
Before launch, country-level IP geolocation will work as expected. But during the EAP, and in the future if you need finer granularity, the DNS lookup of the address is currently the only way to determine if the prefetch is from the proxy. However, we are hoping to make this detection easier in the future. We will update here when we have more details.
The Early Access Program (EAP) is now live. If you are a publisher and have opted-in to the program by adding a traffic-advice file (described here), you should start seeing traffic from Chrome versions M97 and higher.
If you haven’t opted-in yet but are interested in the feature, please consider joining the program (and join the update mailing list)!
If you have any questions, don't hesitate to reach out on the support mailing list.
We look forward to your feedback!
Note that missing from the description above is that the `.well-known/traffic-advice` response must have an `application/trafficadvice+json` MIME type (set via the `Content-Type` header), as mentioned in the traffic-advice proposal.
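To illustrate, a minimal Python `http.server` sketch (handler name ours) that serves the well-known path with the required Content-Type:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

TRAFFIC_ADVICE = json.dumps([
    {"user_agent": "prefetch-proxy", "google_prefetch_proxy_eap": {"fraction": 1.0}}
]).encode()

class AdviceHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/.well-known/traffic-advice":
            self.send_response(200)
            # Required MIME type; a plain application/json response is ignored.
            self.send_header("Content-Type", "application/trafficadvice+json")
            self.send_header("Content-Length", str(len(TRAFFIC_ADVICE)))
            self.end_headers()
            self.wfile.write(TRAFFIC_ADVICE)
        else:
            self.send_error(404)

# To run: HTTPServer(("", 8080), AdviceHandler).serve_forever()
```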
For convenience, https://traffic-advice-checkup.netlify.app/ can be used as a quick diagnostic to catch some simple errors and summarize the expected behavior of the private prefetch proxy. (It's a separate app, but written with reference to the actual prefetch proxy source.)
Over the coming weeks, we will begin rolling out the “private prefetch proxy” feature for Chrome 103 on Android. This feature results in a median 30% LCP improvement when a prefetch is used. While the majority of sites will see less than a 2% increase in HTML fetches (and a much lower increase in overall bytes, as images and other resources are not prefetched), if you wish to limit the amount of prefetch traffic sent to your site, you may need to update your traffic-advice file. In particular, if you’ve specified the "google_prefetch_proxy_eap" parameter it will need to be replaced with the “fraction” parameter.
E.g.,
```json
[
  {
    "user_agent": "prefetch-proxy",
    "fraction": 0.5
  }
]
```
Very nice tech. My question: as I updated my traffic advice, I wanted to check the server logs to see whether Google Search had requested the traffic advice file in the past, but it hadn't. Why is that? The site's traffic is PL/EU.
The traffic advice file is fetched only when Chrome attempts to prefetch a page from the site.
If you see requests to your site via the proxy, then you should also see a fetch for the traffic advice file.
How do you stop fetch.tunnel.googlezip.net? I set fraction to 0 in .well-known/traffic-advice, but it still continues to access my site. I ended up returning 401 errors on all accesses from the IPs listed here: https://www.gstatic.com/chrome/prefetchproxy/prefetch_proxy_geofeed

It has been two months and it continues to access my site, consuming bandwidth and filling up my error logs. For example, yesterday I had 108,492 normal accesses and 247,885 blocked accesses from those IPs.
Sorry you're having this problem.
Looking at your traffic advice file, it looks like you're still using the Early Access Program token.
Can you update the traffic advice file to align with the one in this comment?
Also, you can check the expected behavior and make sure there are no errors in the config using this test app.
Thanks, Buettner. I changed it to the new format. I also ran the test app. Before it told me that the EAP (current) was 0%, but FUTURE was 100%. Now FUTURE is also 0%, so hopefully this will fix the problem.
I allowed the private prefetch proxy about a week ago; however, I see only a few requests a day from it in Chrome, although my site has quite huge traffic and 90% of it originates from Chrome. I have many top-1 rankings in Google, so I thought the prefetch proxy would be used more often. Site traffic is from Poland.
> Thanks, Buettner. I changed it to the new format. I also ran the test app. Before it told me that the EAP (current) was 0%, but FUTURE was 100%. Now FUTURE is also 0%, so hopefully this will fix the problem.
I checked this morning, and there were still lots of accesses yesterday. What IP does it use to read the traffic advice? If it is one of those that I am blocking, then it won't be able to read it.
Ah, yes. The traffic advice fetches come from the same set of IPs.
Oh, no. So is there any way I can stop it without incurring thousands of prefetches a day? Why did it start prefetching my site? Shouldn't the default be no prefetching, unless I specify in the traffic advice that I want it?
So I am returning 403 errors (not 401, as I previously stated) to accesses from those IPs. Shouldn't the prefetch robot stop trying to prefetch?
> So I am returning 403 errors (not 401, as I previously stated) to accesses from those IPs. Shouldn't the prefetch robot stop trying to prefetch?
So, are there any plans to fix this?
Sorry for the delay. I was on vacation and Monday was a holiday.
It looks like our service was getting a DNS error when fetching your traffic advice. Not sure why that would be the case, but it seems likely it's related to the filtering you set up for specific IP blocks.
In any case, I disabled the feature for your domain.
> In any case, I disabled the feature for your domain.
Thank you very much, Buettner! I saw a big reduction yesterday on the number of accesses from those IPs.
I need these IPs to minimize hits too. I tried to put up the traffic advice file, but the validation tells me it needs the MIME content type application/traffic-advice+json (or something similar). How can I serve it correctly with this format? Too many prefetch hits from these IPs... a few is OK, but sometimes (depending on the hour of the day) I get more than 15 per minute. That's crazy.
EDIT: I get it; let's see if the number of visits goes down. I have just put fraction 0.1.
@buettner in your comment from Feb 18, 2022 you mentioned the update mailing list. When following that URL I get a "content not available" error page on Google Groups. Is this still a thing to follow, or has it been phased out?
It has been phased out as the feature has graduated from the Early Access Program.
Is this prefetching used only for the HTML that is parsed and shown to the client, or does it also work as a prefetcher for the website's static CDN assets, external scripts, fonts, etc.?
Correct, only the mainframe HTML is prefetched.
# Deploying Chrome’s Private Prefetch Proxy
We’re beginning to experiment with a Private Prefetch Proxy for Chrome on Android. More information can be found here.
Initially, the proxy will only be available for prefetches initiated by Google Search using Speculation Rules. The reason for this initial restricted scope is that the proxy is run by Google, and to remain compatible with the user’s privacy expectations when visiting a website, the proxy can only receive information about URLs on Google properties (which Google inherently knows about). As a reminder, no user identifier is sent on requests to the proxy, and any information learned by the proxy is used solely to facilitate anonymous prefetching and is not linked to other information from your Google account.
User Opt-out
Users can opt out of Google Search result prefetching via Chrome’s Preload setting. The feature will also be disabled for pages loaded in Incognito mode.
Publisher Opt-out
Some publishers may not want their links prefetched. We give them two ways to opt out:
Network Administrators
Network administrators can control how the feature works on their network, as described here. The purpose-specific domain name used to trigger navigation-time DNS resolution is 'dns-tunnel-check.googlezip.net'.
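The navigation-time check can be sketched as a plain DNS resolution of that domain; a network whose DNS blocks the domain effectively disables the feature. A rough Python illustration (the helper name is ours, not Chrome's actual implementation):

```python
import socket

def proxy_enabled_on_network(probe: str = "dns-tunnel-check.googlezip.net") -> bool:
    """Return True if the purpose-specific probe domain resolves on this
    network. Chrome performs a resolution like this at navigation time;
    administrators can block the domain in their DNS to disable the proxy."""
    try:
        socket.getaddrinfo(probe, 443)
        return True
    except socket.gaierror:
        return False
```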
Expanding Beyond Google Search
We think that the opportunity to speed up cross origin navigations would be appealing to many websites and that the resulting low friction discovery experiences would benefit users and the web. Because of these beliefs, we aspire to make this feature available to all websites.
However, in this case, the proxy would learn the host names of links on non-Google websites, which requires user notice and control. We are considering adding a one-time user opt-in by which users can inform Chrome that they would like to prefetch from non-Google sites via the proxy.
Before we move forward with this proposal, we’d like to discuss the following aspects with the community: