Closed rviscomi closed 4 years ago
Here's a look at the top 10 HTTP status codes for desktop/mobile tests' initial request:
SELECT
client,
req AS total,
status.value AS status,
status.count,
status.count / req AS pct
FROM (
SELECT
_TABLE_SUFFIX AS client,
APPROX_TOP_COUNT(status, 10) AS status,
COUNT(0) AS req
FROM
`httparchive.summary_requests.2020_02_01_*`
WHERE
firstReq
GROUP BY
client),
UNNEST(status) AS status
ORDER BY
pct DESC
client | total | status | count | pct |
---|---|---|---|---|
desktop | 3815604 | 200 OK | 3373097 | 88.40% |
mobile | 5091698 | 200 OK | 4293166 | 84.32% |
mobile | 5091698 | 302 Found | 490910 | 9.64% |
desktop | 3815604 | 302 Found | 218252 | 5.72% |
mobile | 5091698 | 301 Moved Permanently | 284433 | 5.59% |
desktop | 3815604 | 301 Moved Permanently | 206511 | 5.41% |
mobile | 5091698 | 307 Temporary Redirect | 10744 | 0.21% |
desktop | 3815604 | 307 Temporary Redirect | 7421 | 0.19% |
desktop | 3815604 | 204 No Content | 4983 | 0.13% |
desktop | 3815604 | 303 See Other | 4910 | 0.13% |
mobile | 5091698 | 303 See Other | 6027 | 0.12% |
mobile | 5091698 | 204 No Content | 5844 | 0.11% |
desktop | 3815604 | 308 Permanent Redirect | 215 | 0.01% |
mobile | 5091698 | 308 Permanent Redirect | 237 | 0.00% |
mobile | 5091698 | 0 | 236 | 0.00% |
desktop | 3815604 | 0 | 155 | 0.00% |
desktop | 3815604 | 206 Partial Content | 36 | 0.00% |
mobile | 5091698 | 203 Non-Authoritative Information | 46 | 0.00% |
mobile | 5091698 | 206 Partial Content | 42 | 0.00% |
desktop | 3815604 | 203 Non-Authoritative Information | 9 | 0.00% |
I've manually added the name of the status code for clarification.
A few high-level observations:
Here's an analysis of the `Location` header in relation to the URL of the first request. It counts the number of pages that redirect to a `Location` on the same domain.
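To reason about the same-domain rule outside BigQuery, here is a rough Python sketch. It is not the query's logic verbatim: `naive_reg_domain` and `same_domain_redirect` are hypothetical helpers, and the "last two labels" heuristic is only an assumed stand-in for `NET.REG_DOMAIN`, which relies on the public suffix list.

```python
from urllib.parse import urljoin, urlsplit


def naive_reg_domain(host):
    # Assumption: the last two labels approximate the registrable domain.
    # This is wrong for suffixes such as "co.uk"; NET.REG_DOMAIN handles those.
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def same_domain_redirect(url, location):
    """Return True/False, or None when the comparison cannot be made."""
    if not location:
        return None  # mirrors the NULL rows that the HAVING clause drops
    # Resolve relative and protocol-relative Location values against the URL.
    target = urljoin(url, location)
    src_host = urlsplit(url).hostname
    dst_host = urlsplit(target).hostname
    if not src_host or not dst_host:
        return None
    return naive_reg_domain(src_host) == naive_reg_domain(dst_host)


print(same_domain_redirect("http://example.com/", "/login"))
print(same_domain_redirect("http://example.com/", "https://medium.com/x"))
```

One difference worth noting: a protocol-relative `Location` such as `//other.org/x` starts with `/`, so a `STARTS_WITH(resp_location, '/')` check counts it as same-domain, whereas this sketch resolves it to `other.org` first.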
SELECT
_TABLE_SUFFIX AS client,
status,
STARTS_WITH(resp_location, '/') OR NET.REG_DOMAIN(resp_location) = NET.REG_DOMAIN(url) AS same_domain_redirect,
COUNT(0) AS count
FROM
`httparchive.summary_requests.2020_02_01_*`
WHERE
firstReq AND
status IN (301, 302, 307, 308)
GROUP BY
client,
status,
same_domain_redirect
HAVING
same_domain_redirect IS NOT NULL
ORDER BY
count DESC
client | status | same_domain_redirect | count | total | pct |
---|---|---|---|---|---|
mobile | 302 | TRUE | 453,342 | 490,910 | 92.35% |
mobile | 301 | TRUE | 248,106 | 284,433 | 87.23% |
desktop | 302 | TRUE | 191,159 | 218,252 | 87.59% |
desktop | 301 | TRUE | 181,387 | 206,511 | 87.83% |
mobile | 301 | FALSE | 31,973 | 284,433 | 11.24% |
desktop | 301 | FALSE | 22,045 | 206,511 | 10.67% |
mobile | 302 | FALSE | 15,985 | 490,910 | 3.26% |
desktop | 302 | FALSE | 13,092 | 218,252 | 6.00% |
mobile | 307 | TRUE | 10,214 | 10,744 | 95.07% |
desktop | 307 | TRUE | 6,987 | 7,421 | 94.15% |
mobile | 307 | FALSE | 507 | 10,744 | 4.72% |
desktop | 307 | FALSE | 418 | 7,421 | 5.63% |
mobile | 308 | TRUE | 213 | 237 | 89.87% |
desktop | 308 | TRUE | 200 | 215 | 93.02% |
mobile | 308 | FALSE | 22 | 237 | 9.28% |
desktop | 308 | FALSE | 13 | 215 | 6.05% |
I've manually copied the "total" column from the previous results and calculated the "pct" column from it. So 92% of mobile 302 responses redirect to the same domain.
Of the 284,433 mobile 301 (permanent) redirects, 87% point to the same domain. Desktop is similar.
In total, 90% of all first-request redirects point to the same domain. There may be a redirect chain that ends up on another domain, which isn't accounted for here. About 35.6K desktop pages and 48.5K mobile pages redirect to a different domain.
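The aggregate figures can be rechecked from the table with a few lines of Python. This is only a sketch over the numbers copied from the results above. The variable names are made up for illustration, and "total" is taken from the first query's per-status counts, so it also covers rows where `same_domain_redirect` was NULL.

```python
# (client, status) -> (same_domain count, other_domain count, total from the first query)
rows = {
    ("mobile", 302): (453342, 15985, 490910),
    ("mobile", 301): (248106, 31973, 284433),
    ("desktop", 302): (191159, 13092, 218252),
    ("desktop", 301): (181387, 22045, 206511),
    ("mobile", 307): (10214, 507, 10744),
    ("desktop", 307): (6987, 418, 7421),
    ("mobile", 308): (213, 22, 237),
    ("desktop", 308): (200, 13, 215),
}

# Share of all first-request redirects that stay on the same domain.
same = sum(s for s, _, _ in rows.values())
total = sum(t for _, _, t in rows.values())
same_share = same / total

# Number of pages per client that redirect to a different domain.
other_by_client = {}
for (client, _), (_, other, _) in rows.items():
    other_by_client[client] = other_by_client.get(client, 0) + other

print(f"same-domain share: {same_share:.1%}")
print(other_by_client)
```

The share comes out just under 90%, and the cross-domain counts are 35,568 for desktop and 48,487 for mobile.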
Finally, here's a look at the domains that get redirected to:
SELECT
client,
status,
redirect_domain,
COUNT(0) AS count
FROM (
SELECT
_TABLE_SUFFIX AS client,
status,
NET.REG_DOMAIN(resp_location) AS redirect_domain,
STARTS_WITH(resp_location, '/') OR NET.REG_DOMAIN(resp_location) = NET.REG_DOMAIN(url) AS same_domain_redirect
FROM
`httparchive.summary_requests.2020_02_01_*`
WHERE
firstReq AND
status IN (301, 302, 307, 308))
WHERE
NOT same_domain_redirect
GROUP BY
client,
status,
redirect_domain
ORDER BY
count DESC
LIMIT
50
client | status | redirect_domain | count |
---|---|---|---|
desktop | 302 | medium.com | 1,221 |
mobile | 302 | medium.com | 1,163 |
mobile | 302 | indapass.hu | 764 |
mobile | 301 | jimdofree.com | 755 |
mobile | 301 | listcrawler.eu | 688 |
mobile | 301 | linkfire.com | 613 |
desktop | 302 | indapass.hu | 591 |
desktop | 301 | linkfire.com | 520 |
mobile | 302 | google.com | 467 |
desktop | 302 | google.com | 386 |
mobile | 302 | elsevierhealth.com | 302 |
desktop | 302 | elsevierhealth.com | 301 |
mobile | 302 | clickfunnels.com | 300 |
desktop | 301 | jimdofree.com | 277 |
desktop | 302 | clickfunnels.com | 268 |
desktop | 302 | stremanp.com | 209 |
mobile | 302 | stremanp.com | 208 |
mobile | 302 | roberat.com | 157 |
mobile | 301 | google.com | 149 |
mobile | 307 | vchecks.me | 142 |
desktop | 302 | gitbook.com | 131 |
mobile | 301 | tripadvisor.com | 100 |
desktop | 302 | blogger.com | 99 |
mobile | 301 | pornvida.com | 95 |
mobile | 302 | w88in.com | 93 |
mobile | 302 | note.com | 90 |
desktop | 302 | note.com | 88 |
desktop | 302 | onelogin.com | 87 |
desktop | 301 | qodeinteractive.com | 82 |
desktop | 301 | google.com | 81 |
desktop | 301 | listcrawler.eu | 74 |
mobile | 302 | engagingnetworks.net | 73 |
mobile | 302 | timeweb.ru | 71 |
desktop | 302 | engagingnetworks.net | 70 |
desktop | 301 | unblockit.red | 69 |
desktop | 302 | imodules.com | 61 |
desktop | 301 | tripadvisor.com | 60 |
mobile | 302 | vueher.com | 59 |
mobile | 302 | booking.com | 58 |
mobile | 302 | imodules.com | 58 |
desktop | 302 | st-hatena.com | 58 |
mobile | 302 | gitbook.com | 55 |
mobile | 301 | unblockit.red | 55 |
desktop | 302 | vueher.com | 53 |
mobile | 302 | facebook.com | 50 |
mobile | 301 | bongacams5.com | 48 |
desktop | 307 | vchecks.me | 47 |
desktop | 302 | booking.com | 47 |
mobile | 301 | surveygizmo.com | 47 |
desktop | 301 | surveygizmo.com | 46 |
Medium gets over 1K 302 (temporary) redirects. The top 301 (permanent) redirect locations are:
There are also 100 mobile 301 redirects to tripadvisor.com (60 on desktop). I looked into these and the `firstReq` flag seems to be misattributed to the wrong request (not the first one). The others seem ok.
So to sum up, about 15% of tests' initial requests get a status other than 200 OK. 301 Moved Permanently accounts for about 5% of initial requests. Only about 10% of that 5% redirects to another domain. The domains that get redirected to most often seem to be aggregators that host lots of content.
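As a rough cross-check of these rounded figures, the shares can be recomputed from the first results table. This is a sketch using only the counts reported above; the dictionary names are illustrative.

```python
# Counts copied from the first results table.
totals = {"desktop": 3815604, "mobile": 5091698}
ok_200 = {"desktop": 3373097, "mobile": 4293166}
moved_301 = {"desktop": 206511, "mobile": 284433}

# Share of initial requests that did not return 200 OK, per client and combined.
non_200 = {c: 1 - ok_200[c] / totals[c] for c in totals}
combined_non_200 = 1 - sum(ok_200.values()) / sum(totals.values())

# Share of initial requests answered with 301, across both clients.
combined_301 = sum(moved_301.values()) / sum(totals.values())

for client, share in non_200.items():
    print(f"{client}: {share:.1%} non-200")
print(f"combined: {combined_non_200:.1%} non-200, {combined_301:.1%} 301")
```

By this calculation the non-200 share is about 11.6% on desktop and 15.7% on mobile (about 13.9% combined), and 301 responses make up about 5.5% of initial requests.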
Context: 1 and 2
Our URLs come from the CrUX dataset based on real Chrome usage. It's possible that when we test these URLs, we're getting redirected because of a lack of authentication, geo-blocking, or other discrepancies from real user expectations. It's also possible that these origins are always supposed to redirect and maybe CrUX is mistakenly assigning the UX data from the canonical origin to the one that redirects.
Analyze the dataset for examples of base page URLs that redirect. It'd be good to understand how many initial URLs in our corpus are redirecting, whether there are any patterns (e.g., lack of authentication), and what can be done to deduplicate results.