foxdavidj commented 4 years ago

Part IV Chapter 22: HTTP/2

Content team

Authors	Reviewers	Analysts	Draft	Queries	Results
@dotjs	@MikeBishop @LPardue @rmarx @ibnesayeed @pmeenan @Nooshu @gregorywolf @bazzadp	@gregorywolf	Doc	*.sql	Sheet

Content team lead: @dotjs

Welcome chapter contributors! You'll be using this issue throughout the chapter lifecycle to coordinate on the content planning, analysis, and writing stages.

The content team is made up of the following contributors:

New contributors: If you're interested in joining the content team for this chapter, just leave a comment below and the content team lead will loop you in.

Note: To ensure that you get notifications when tagged, you must be "watching" this repository.

Milestones

0. Form the content team

[x] Jul 6th: Project owners have selected an author to be the content team lead
[x] Jul 13th: The content team has at least one author, reviewer, and analyst (minimally viable team formed)

1. Plan content

[x] Jul 20th: The content team has completed the chapter outline in the draft doc
[x] Jul 27th: Analysts have triaged the feasibility of all proposed metrics

2. Gather data

[x] Jul 27th: Analysts have added all necessary custom metrics and drafted a PR to track query progress
Aug 1 - 31: August crawl
[x] Sep 7th: Analysts have queried all metrics and saved the output to the results sheet

3. Validate results

[x] Sep 14th: The content team has reviewed the results sheet

4. Draft content

[x] Nov 12th: Authors have completed the first draft in the doc
[x] Nov 26th: The content team has prototyped all data visualizations

5. Publication

[x] Nov 26th: The content team has reviewed the final draft, converted to markdown, and filed a PR to add it to the 2020 content directory
Dec 9th: Target launch date

tunetheweb commented 4 years ago

Question about what this chapter should be called given that HTTP/3 is now here (even if not quite officially signed off yet). Stick with HTTP/2? Change to HTTP? Should we rename the 2019 chapter (with redirect obviously) or leave as is? Probably best to wait until we've got an author/authors to let them help decide.

rviscomi commented 4 years ago

That's a great question, I'm not sure what the best name is. Agreed, let's see where the content planning takes us and keep the option open to rename the chapter if needed.

Safe to add you as a reviewer for this chapter, Barry?

elithrar commented 4 years ago

My thoughts: frame it as HTTP, and break it down to an acceptable level - accepting that it gets very complex quickly if you look at HTTP/1.1 vs HTTP/2 (streams, prioritization, etc) vs HTTP/3 (QUIC transport, etc).

A lot of the HTTP semantics users & web developers interact with are consistent across versions, and we should start there, with sub-sections for what HTTP/2 and HTTP/3 bring (and why they exist).

tunetheweb commented 4 years ago

Agree with that.

Still considering what to do on last year's chapter. Do we rename it? Gut feel is no, as it was very HTTP/2 focused (with a quick dip into HTTP/3 at the end), even if that does lead to a slight inconsistency in the naming cross year.

I spent a good part of last year's HTTP/2 chapter giving the basics as still think this is a fairly new (even if it was approaching it's 5 years anniversary back then), and little understood technology. Think it would be good to have a similar intro to HTTP/3 this year, and perhaps less on HTTP/2 (we can refer back to the previous year's chapter for that).

However, the main point of the Almanac IMHO is not to act as a reference of the technology (though some background is good, and necessary), but to look at it's usage through the HTTP Archive and help explain that to readers. So need to be conscious not to spend too much time on background/theory. I may have overdone it last year but, as I say, I think it was needed more so than for other chapters given how new the technology is and how niche the expertise is. And given HTTP/3 is even newer, maybe that need is still there this year?

Saying all that, I'm struggling to think what new stats to query for this chapter. But we'll worry about that once we've got authors and reviewers!

And on that subject I'm definitely up for reviewing this year. Can author too if we get really stuck but would prefer to hear from someone new if anyone volunteers! Either way, I'm defintely interested in following how this chapter progresses and to help in anyway I can for it.

rmarx commented 4 years ago

Hello everyone. I'd like to again be a reviewer for this chapter this year. I could also contribute text on HTTP/3 and QUIC concepts if we go that route.

My 2 cents would be that the almanac should indeed focus more on the practical use of the tech seen over the past year, as measured by the HTTP archive runs. From that perspective, there won't be much to discuss on HTTP/3 yet, as few servers and browsers offer it and it's not ready for prime time (though, by the end of the year, it might be a bit more wide-spread).

For this year, you could look at how many sites offer H3 by looking at the alt-svc headers though. You could also look at TLS 1.3 adoption for H2, as this is kind of related to QUIC (or at least could give an indication of how up-to-date backends are). You could also research coalescing (or at least certificate contents) a bit more, as this will stay highly relevant for QUIC (and 0-RTT!) as well (maybe Matt Hobbs could help with that? Given his in-depth waterfall discussion blog posts on this). Finally, an idea of the measured RTTs to the backends would be useful, as that's where QUIC/H3 will provide most benefits.

To actually test H3 down the line, the HTTP archive runs would have to be adapted to also try a (secondary) load over QUIC after the normal H2 (H1?) connection, which might be something to think about @rviscomi (and probably also needs support from @pmeenan, who's been talking about this on twitter a bit as well).

tunetheweb commented 4 years ago

Hello everyone. I'd like to again be a reviewer for this chapter this year. I could also contribute text on HTTP/3 and QUIC concepts if we go that route.

Great stuff!

My 2 cents would be that the almanac should indeed focus more on the practical use of the tech seen over the past year, as measured by the HTTP archive runs. From that perspective, there won't be much to discuss on HTTP/3 yet, as few servers and browsers offer it and it's not ready for prime time (though, by the end of the year, it might be a bit more wide-spread).

Think you'd be surprised with CDNs starting to offer it. It does seem to be growing. Especially if you include gQUIC.

For this year, you could look at how many sites offer H3 by looking at the alt-svc headers though. You could also look at TLS 1.3 adoption for H2, as this is kind of related to QUIC (or at least could give an indication of how up-to-date backends are). You could also research coalescing (or at least certificate contents) a bit more, as this will stay highly relevant for QUIC (and 0-RTT!) as well

Yeah those are the sorts of things I tried to look at last year too. The new author would be well advised to look at the metrics we settled on last year and the discussions around that (#22 )

(maybe Matt Hobbs could help with that? Given his in-depth waterfall discussion blog posts on this).

Ping @nooshu

Finally, an idea of the measured RTTs to the backends would be useful, as that's where QUIC/H3 will provide most benefits.

To actually test H3 down the line, the HTTP archive runs would have to be adapted to also try a (secondary) load over QUIC after the normal H2 (H1?) connection, which might be something to think about @rviscomi (and probably also needs support from @pmeenan, who's been talking about this on twitter a bit as well).

Reminds me of this discussion on trying to measure impact of HTTP/2

ibnesayeed commented 4 years ago

I should be able to review this chapter.

pmeenan commented 4 years ago

I'm happy to help review this chapter.

Nooshu commented 4 years ago

(maybe Matt Hobbs could help with that? Given his in-depth waterfall discussion blog posts on this).

Thanks @rmarx, I'd be happy to help.

tunetheweb commented 4 years ago

Some more thoughts on this chapter:

Last year we concentrated on HTTP/2, with a bit of a mention of HTTP/3. Probably should talk a lot about HTTP/3 this year even if usage might be low.

However last year I almost completely ignored the whole topic of the underlying HTTP semantics. Should we add some more of that this year?

For example, how many HTTP Headers are sent? And what size are they? What's the size of headers compared to bodies on requests and responses? Some headers (e.g. CSP) can be quite large and we're adding new headers like feature-policy and with structured headers this could grow over time. This is of course another benefit of HTTP/2 and HTTP/3 as it has header compression.

What else could we consider along those lines for this year?

Do be aware that some of the other HTTP semantics are covered in other chapters:

Caching and Compression chapters cover the respective headers for these - in fact there's a question as to whether they need full chapters again this year or if they should be collapsed into this chapter?
Number, size and type of requests is captured by Page-Weight chapter (as well as individually in Media, Fonts, CSS and JavaScript chapters).
Security covers HTTPS and Cookies, and this year we may add a dedicated Cookies chapter (or at least talk about them in the new Privacy chapter).

rmarx commented 4 years ago

I do agree it would be interesting to have a discussion on HTTP semantics and things like structured headers (and things that have been going wrong with their practical deployments, cc @yoavweiss).

However, like you say, several newer headers and their impact are discussed elsewhere and cutting-edge stuff like feature policy probably won't show up much this year. We then also should definitely re-name the chapter away from HTTP/2 imo.

As you know, I'm also highly skeptical about the practical impact of HPACK/QPACK for the normal web page loading use case. One area where you'd see improvements would be is with large cookies, but I'm not sure if the current test setup is ideal for measuring those (given that European sites shouldn't be setting cookies on first visit (theoretically) and some high-impact cookies probably only come into play after login/shopping cart stuff). However, this could also be an excellent opportunity to prove me wrong on both counts :) It would probably also unearth some cool/disturbing outliers. Do the WPT results include sizes for compressed headers? If not, we might setup something to run the plaintexts through HPACK and QPACK libraries to compare etc.

tunetheweb commented 4 years ago

However, like you say, several newer headers and their impact are discussed elsewhere and cutting-edge stuff like feature policy probably won't show up much this year. We then also should definitely re-name the chapter away from HTTP/2 imo.

Feature Policy was discussed in security chapter though annoyingly it didn't discussed actually adoption (very small - looks to be about 1,000 sites at most looking at the raw data) and just which options were used when it was deployed. It's probably grown but not by that much. Referrer Policy looks to be used a lot more. Point is use of headers is growing and there is lots of innovation in this space.

As you know, I'm also highly skeptical about the practical impact of HPACK/QPACK for the normal web page loading use case. One area where you'd see improvements would be is with large cookies, but I'm not sure if the current test setup is ideal for measuring those (given that European sites shouldn't be setting cookies on first visit (theoretically) and some high-impact cookies probably only come into play after login/shopping cart stuff). However, this could also be an excellent opportunity to prove me wrong on both counts :) It would probably also unearth some cool/disturbing outliers.

I dunno. Some CSP headers are pretty big! But they're on the response where the files are usually much bigger so maybe you're right.

Do the WPT results include sizes for compressed headers? If not, we might setup something to run the plaintexts through HPACK and QPACK libraries to compare etc.

Discussed last year and not easily available.

tunetheweb commented 4 years ago

Anyone on this thread interested in taking on the Author role? Or suggestions who could?

@elithrar not sure what role you were thinking of and if would be interested in Authoring?

@bagder @dotjs , as last year's other reviewers any interest here? Or suggestions of Authors?

And @Lpardue any further suggestions on this after our chat the other week given your role on QUIC-WG?

dotjs commented 4 years ago

I'm happy to work on this chapter again. Sounds like a few people are interested in providing some content. I'm happy to pull it all together and convincing @LPardue to join in.

paulcalvano commented 4 years ago

@MikeBishop - any interest in co-authoring this chapter?

tunetheweb commented 4 years ago

@siyengar same question to you! 😀

foxdavidj commented 4 years ago

@dotjs just want to confirm that you've reviewed the authoring commitment and the process works for you. Would love to have you as the lead author :)

MikeBishop commented 4 years ago

Yes, I'd be happy to help, as author or reviewer.

LPardue commented 4 years ago

I am willing and able to participate on authoring.

dotjs commented 4 years ago

@dotjs just want to confirm that you've reviewed the authoring commitment and the process works for you. Would love to have you as the lead author :)

reviewed and looks fine to me

gregorywolf commented 4 years ago

Hi. I would sign up for either chapter reviewer or analyst

foxdavidj commented 4 years ago

@dotjs thank you for agreeing to be the lead author for the HTTP2 chapter! As the lead, you'll be responsible for driving the content planning and writing phases in collaboration with your content team, which will consist of yourself as lead, any coauthors you choose as needed, peer reviewers, and data analysts.

The immediate next steps for this chapter are:

Establish the rest of your content team. Several other people were interested or nominated (see below), so that's a great place to start. The larger the scope of the chapter, the more people you'll want to have on board.
Start sketching out ideas in your draft doc.
Catch up on last year's chapter and the project methodology to get a sense for what's possible.

There's a ton of info in the top comment, so check that out and feel free to ping myself or @rviscomi with any questions!

@MikeBishop @LPardue @rmarx @ibnesayeed @pmeenan @Nooshu I've put you down as reviewers for now, and will leave it to @dotjs to reassign at their discretion

@gregorywolf Put you down as both a reviewer and analyst :)

bagder commented 4 years ago

With this massive line-up already signed up I can stand down this year.

foxdavidj commented 4 years ago

Hey @dotjs, hope you had a great weekend.

As you know, we're tying to have the outline and metrics settled on by the end of the week so we have time to configure the Web Crawler to track everything you need. Anything you need from me to keep things moving forward?

Also, can you remind your team to properly add and credit themselves in your chapter's Google Doc?

tunetheweb commented 4 years ago

Added myself as a reviewer. Know we have a lot of them, but feel I deserve my place having written last year's chapter 😀 @dotjs you gonna move some of the reviewers to co-authors? Or taking on the full task yourself?

@gregorywolf happy to help out with Analysis here if you need any help. And the awesome @pmeenan being on team HTTP/2 will undoubtedly help if we have any questions as to what the HTTP Archive crawl currently does (or can!) get!

dotjs commented 4 years ago

Thanks all - Curernt thoughts are to use co-authors. If everyone who has expressed an interst can request edit access to the doc. We can start to plan the content there. Let's focus any potential intersting metrics/measurements that were not part of last years run.

@rmarx @LPardue Keen on your thoughts on what intersting propoerties we can measure for QUIC/H3 etc. @pmeenan I'm personally interested in quantifying the impact of multiple domains/protocols on resource loading. This could include the impact of connection coalescence. Any thoughts on how we can quantify the 'thunderdome' ? How often is H2 prioritisation even relevant ?

pmeenan commented 4 years ago

@dotjs I'm not sure if it will be possible with bigquery but it might be possible with a script that crawls through the raw HAR files on GCS since the data includes chunk timings (and sizes), priority and connection info.

In theory you could check to see how often a higher priority response download is interrupted by chunks for a lower priority response (ignoring some small amount for headers). You could detect broken HTTP/2 prioritization when it happens on the same connection or cross-connection contention when it happens on a separate connection.

We'd have to noodle a bit to think of how that should be represented as a summary metric.

foxdavidj commented 4 years ago

@dotjs How is the outline coming along? Want to get that finished up by the end of the week so we have time to get the Web Crawler setup :)

dotjs commented 4 years ago

Have a first pass at an outline. I'm still not sure about going with HTTP with sections for H2 and H3 as suggested by Matt. I've added some thoughts about other things to discuss with regards HTTP e.g. semantics, DoH and websockets. Any reviewers/authors please add to the doc as I would like as many ideas on what other people would like to see in this chapter as possible. paging @gregorywolf , @Nooshu , @MikeBishop, @ibnesayeed , @bazzadp, @elithrar, @pmeenan and @LPardue

tunetheweb commented 4 years ago

That's pretty comprehensive @dotjs ! Will rack my brains and see if I can think of anything else but can't at the mo...

gregorywolf commented 4 years ago

Hello. I am coming up to speed with the project and specifically my task for the HTTP chapter as an analyst. My goal for this weekend is to finish reviewing all key material. I also want to look at all of the 2019 HTTP SQL queries and start retro fitting them to use on the sample data that @paulcalvano created. I am new to this process so PLEASE direct me as necessary. I look forward to working with the team.

tunetheweb commented 4 years ago

@gregorywolf I am a comment here that might be useful: https://github.com/HTTPArchive/almanac.httparchive.org/issues/914#issuecomment-659205330

And since I’m on this chapter, I’ll update it specifically for this chapter 😀:

Start with the Analysts Guide and set up BigQuery (Good guide on that by our very own @paulcalvano who's leading the Analyst team here on the Web Almanac). Also be aware this can be expensive but there's a generous free tier and Paul will provide credits beyond that for Almanac work. There are also sample tables which are much cheaper to query and it should be difficult to go beyond the free budget with those. Then join the #web-almanac slack and Paul will invite you to the Analysts channel on that.

For this chapter, you can read last year's chapter, look at last year's SQL for this chapter (and the actual results it produced) - both of these are linked at the bottom of the chapter btw. Familiarise yourself with all this, then work with @dotjs and the reviewers to figure out what metrics you want to use this this year and then convert them into queries. Would suggest reusing a lot of last year's queries but also adding some to give a fresh take. Liaise with the other Analysts and @paulcalvano if you have any questions on the data set and what's available. I can also help with this as on this chapter and similarly we’re lucky to have @pmeenan the God of WebPageTest (which is what our crawler uses) on this chapter if any queries on what’s possible or not.

We're planning to run the crawl for the 2020 dataset throughout August so critical point is to quickly figure out and implement any custom metrics required for that crawl before it starts. Would hope there shouldn't be too many (if any) as there is quite a lot of detail in the current dataset and we didn't need any for the HTTP/2 chapter last year. Luckily this chapter deals Mostly with the headers and meta data rather than stuff in the expensive bodies. Thought that may change this year depending on what we want to query.

Hope that helps and gives you something to get started on!

gregorywolf commented 4 years ago

Hi. Quick update. I have updated all of the HTTP 2019 SQL queries. I have not submitted a PR yet. Once the sample_data tables are completed/finalized, I will start testing to make sure the output looks as expected. At that time I will submit a PR. I would be interested to know if anyone has any ideas on what data would be interesting that is above and beyond what was extracted last year.

rviscomi commented 4 years ago

@gregorywolf check out the analyst workflow doc if you haven't already. It may be helpful to create the PR now as a draft, and use it to keep track of metrics already implemented vs those not yet implemented using a markdown checklist. (steps 4 and 5)

Do any of the 2020 queries require custom metrics? (querying the DOM at runtime)

dotjs commented 4 years ago

There are some interesting ideas that may or may require some digging into the HARs. @rviscomi Is there any precedent for this ? For example I'm interested in measuring multiplexing concurrency, concurrent connections etc. @gregorywolf Happy to chat through the metrics whenever you are ready.

rviscomi commented 4 years ago

There are some interesting ideas that may or may require some digging into the HARs. @rviscomi Is there any precedent for this ?

Could you clarify? Not sure if you're asking if any chapter has looked at the HAR data before or only if this is new for the H2 chapter.

foxdavidj commented 4 years ago

@gregorywolf Took a look over the chapter and it looks like we've got most if not all of the data you need. Can you double check though? Only got a little more time left to make changes to the Crawler to collect extra data

gregorywolf commented 4 years ago

@rviscomi Hi. I just submitted a draft PR for the sql 2019 queries formatted to use the sample_data tables.

@dotjs I think talking live would be great. Let's communicate via Slack DM to coordinate.

foxdavidj commented 4 years ago

@dotjs @gregorywolf for the two milestones overdue on July 27 could you check the boxes if:

the outline has been reviewed and all feasible metrics have been identified
any necessary custom metrics have been created and you've created a draft PR to track which feasible metrics have had their queries implemented (we've updated the milestone description to clarify this)

Keeping the milestone checklist up to date helps us to see at a glance how all of the chapters are progressing. Thanks for helping us to stay on schedule!

foxdavidj commented 4 years ago

I've updated the chapter metadata at the top of this issue to link to the public spreadsheet that will be used for this chapter's query results. The sheet serves 3 purposes:

Enable authors/reviewers to analyze the results for each metric without running the queries themselves
Generate data visualizations to be embedded in the chapter
Serve as a public audit trail of this chapter's data collection/analysis, linked from the chapter footer

gregorywolf commented 4 years ago

Hi. I am very close to finalizing all of the queries for the chapter. @rviscomi has been kind enough to run all of my queries so I do not run into a BQ quota issue. I will provide another update once all of the newest results have been generated and transferred to the results Google Sheet

gregorywolf commented 4 years ago

All of the query results are posted in the results sheet. I have created pivot tables for all of the tabs. Please take a look at the data and provide feedback. I made a decision to NOT filter out any key fields that contain blanks. I will leave the filtering to the author

MikeBishop commented 4 years ago

What's the best venue to provide feedback? Here on the issue, comments in the results sheet, etc.?

tunetheweb commented 4 years ago

Personally I’d prefer it here. Or at the very least an “FYI I’ve made a comment on tabs 1, 2 and 5” type comment on this issue.

MikeBishop commented 4 years ago

First off, thanks for the work you've already put in. This is an immense amount of data to digest, and you've clearly put in a lot of work slicing it into interpretable chunks.

For all of these, the pivot tables you mentioned would be useful to slice things, but I'm not able to actually filter anything in the sheet itself; I'm wondering if that's because I don't have edit access to the sheet? But I can copy the sheet and add filter views, it looks like.

Here's my first pass through the different pages:

Adoption of H2 tab: How do we interpret the blank outcome? I don't want to just discard nearly 4% of requests, but it's not clear that it directly maps to any of the other versions, since they are represented.
Grouped by server:
- Same about the blanks, but it's more sensible here as some servers don't include that header.
- I wonder about spinning these two tabs together, to see whether there are trends of servers more or less likely to serve HTTP/2. I imagine that, exempting those which simply don't implement HTTP/2, it would turn into a statement about default-on vs. default-off.
Alt-Svc headers: I think it would be more useful to break these down into what percentage offer certain things in Alt-Svc, rather than just the discrete header values. (Though I'm very surprised there are enough instances to gather any appreciable percentages on a specific value; when I did a similar query a few years ago, I found that "clear" was the only thing that had enough consistency for that.) For example:
- Percentage that are "clear", the only defined keyword for this header, which we can already see from this table
- Percentage that offer h2
- Percentage that offer various QUIC/H3 versions
- Percentage that refer to same/different host or port
- Distribution of ma values
- How many alternatives per protocol? How many different protocols?
For the Upgrade header, I'd like the ability to filter those by HTTP/HTTPS. Upgrading to h2c is only supposed to be offered on clear-text connections, but a recent article pointed out that some servers that support it will still do the Upgrade within an HTTP/1.1 TLS connection (presumably because something else is terminating TLS and the server sees it as a clear-text connection).
I'm more than a little surprised by the number of HTTP/2 connections returning the Upgrade header. That's... supposed to be illegal. Not feedback on the presentation of the data, just... interesting. Thanks for including that.
Percentage loaded over HTTP: Should I read this as percentage of resources on a page loaded over cleartext, given the protocol used for the base page?
TLS version by HTTP version: What does blank mean here? I assume that we're not considering cleartext HTTP/2, so it's not "no TLS" for that. The sampled QUIC versions are presumably using Google Crypto, so the advertisement of any TLS version is interesting, even though small.

rviscomi commented 4 years ago

Chiming in to give a couple of unsolicited Sheets tips: don't hesitate to request edit access if it'd help you explore the data, and change the default notification settings from "Only Yours" to "All" to be emailed on all comments even if you're not explicitly mentioned.

tunetheweb commented 4 years ago

@MikeBishop , I can answer some of these based on experience last year as author and person who came up with a lot of these stat requests, and investigations I did on some of the same questions on last years stats.

For all of these, the pivot tables you mentioned would be useful to slice things, but I'm not able to actually filter anything in the sheet itself; I'm wondering if that's because I don't have edit access to the sheet? But I can copy the sheet and add filter views, it looks like.

Could be. Could you request edit permission to see?

Here's my first pass through the different pages:

Adoption of H2 tab: How do we interpret the blank outcome? I don't want to just discard nearly 4% of requests, but it's not clear that it directly maps to any of the other versions, since they are represented.

We had the same last year and investing showed these to be mostly HTTP/1.1:

Annoyingly, there is a larger percentage where the protocol was not correctly tracked by the HTTP Archive crawl, particularly on desktop. Digging into this has shown various reasons, some of which can be explained and some of which can't. Based on spot checks, they mostly appear to be HTTP/1.1 requests and, assuming they are, desktop and mobile usage is similar.

It's a similar result this year - desktop is ~4% short of mobile and we have ~4% uncategorised.

Even better news, is I spent some time on this after (cause it bugged me to!) and figured out why this is the case and fixed it - unfortunately too late for this year's Almanac month (August) but we can look at October data to confirm this just before we go live. From the work on that fix we know the "protocol" is not always set for HTTP/1.1 and the parsing to try to pull it out from the request and response was broken. I'm pretty confident the vast majority is HTTP/1.1 and think we should assume this, explain it like I did last year, and quickly double check it after the October run to confirm.

Grouped by server:

Same about the blanks, but it's more sensible here as some servers don't include that header.

I wonder about spinning these two tabs together, to see whether there are trends of servers more or less likely to serve HTTP/2. I imagine that, exempting those which simply don't implement HTTP/2, it would turn into a statement about default-on vs. default-off.

Some interesting stats and discussion on that last year. @gregorywolf I added client to some of the pivot tables as percentages were wrong without them as adding up (unless Apache really is 95% of server usage 😁)

Alt-Svc headers: I think it would be more useful to break these down into what percentage offer certain things in Alt-Svc, rather than just the discrete header values. (Though I'm very surprised there are enough instances to gather any appreciable percentages on a specific value; when I did a similar query a few years ago, I found that "clear" was the only thing that had enough consistency for that.) For example:

Percentage that are "clear", the only defined keyword for this header, which we can already see from this table

Percentage that offer h2

Percentage that offer various QUIC/H3 versions

Percentage that refer to same/different host or port

Distribution of ma values

How many alternatives per protocol? How many different protocols?

For the Upgrade header, I'd like the ability to filter those by HTTP/HTTPS. Upgrading to h2c is only supposed to be offered on clear-text connections, but a recent article pointed out that some servers that support it will still do the Upgrade within an HTTP/1.1 TLS connection (presumably because something else is terminating TLS and the server sees it as a clear-text connection).

That's why I'm a fan of giving the raw data and letting authors/reviewers slice and dice as they see it in the spreadsheet! Though can revert to SQL if easier once we know what we want. After digging into the data we should decide what stats are interesting and so what to include in the chapter and in what format.

I'm more than a little surprised by the number of HTTP/2 connections returning the Upgrade header. That's... supposed to be illegal. Not feedback on the presentation of the data, just... interesting. Thanks for including that.

Again good discussion on this last year - which is where a lot of these queries came from. Will be interesting to see if it's better or worse than last year.

Percentage loaded over HTTP: Should I read this as percentage of resources on a page loaded over cleartext, given the protocol used for the base page?

Sorry don't understand your question or what you are talking bout cleartext. Is this "percentage_of_resources_loaded_over_HTTP_by_version_per_site" tab? That's any HTTP version regardless of HTTPS status.

TLS version by HTTP version: What does blank mean here? I assume that we're not considering cleartext HTTP/2, so it's not "no TLS" for that. The sampled QUIC versions are presumably using Google Crypto, so the advertisement of any TLS version is interesting, even though small.

Yes we should dig into this more. Suspect it's QUIC and TLS version is not being recorded correctly, but that's a guess. This is a new stat for this year btw so nothing to compare on this last year. There's a lot but Google does account for a lot of traffic when looking at request level (between Google Analytics, Ads and Marketing tags, YouTube, Google Fonts..etc.) so it's possible. Definitely one to dig into @gregorywolf .

MikeBishop commented 4 years ago

Percentage loaded over HTTP: Should I read this as percentage of resources on a page loaded over cleartext, given the protocol used for the base page?

Sorry don't understand your question or what you are talking bout cleartext. Is this "percentage_of_resources_loaded_over_HTTP_by_version_per_site" tab? That's any HTTP version regardless of HTTPS status.

"Percentage of resources loaded over HTTP" as opposed to what? That is, where the number is less than 100% loaded over HTTP, what were the other resources loaded over? I could read this as HTTP vs. HTTPS, same versus different version used for subresources, network vs. cache, references to data: URLs that don't hit the network, etc.

Or it's something totally different and I'm having a total mental disconnect figuring out what this query is measuring.

tunetheweb commented 4 years ago

Ah gotcha now. Yeah I don't understand this stat either. Would expect each line to add up to 100%, so we have for example 30% HTTP/1.1 and 70% HTTP/2. @gregorywolf ?

gregorywolf commented 4 years ago

All. I have been away for a bunch of days and am just getting back on line. I will take a look at the above comments and comment in the next few days.

HTTPArchive / almanac.httparchive.org

HTTP/2 2020 #921

Part IV Chapter 22: HTTP/2

Content team

Milestones

0. Form the content team

1. Plan content

2. Gather data

3. Validate results

4. Draft content

5. Publication