HTTPArchive / legacy.httparchive.org

<<THIS REPOSITORY IS DEPRECATED>> The HTTP Archive provides information about website performance such as # of HTTP requests, use of gzip, and amount of JavaScript. This information is recorded over time revealing trends in how the Internet is performing. Built using Open Source software, the code and data are available to everyone allowing researchers large and small to work from a common base.
https://legacy.httparchive.org

Request header size distribution #22

Closed: DavidLerner closed this issue 10 years ago

DavidLerner commented 10 years ago

Please track the size of GET requests.

When designing algorithms for optimizing MAC layer utilization, it is helpful to understand the size of HTTP GET requests (average is useful; CDF is much more useful).

stevesouders commented 10 years ago

The size of all requests, as well as the HTTP method (e.g., "GET"), is available in the data dumps.

This seems like a useful chart but perhaps not broadly applicable. An alternative to adding a permanent chart to the HTTP Archive is to do this in BigQuery. The query would be:

SELECT
  NTH(10, quantiles(respSize,100)) tenth,
  NTH(20, quantiles(respSize,100)) twentieth,
  NTH(30, quantiles(respSize,100)) thirtieth,
  NTH(40, quantiles(respSize,100)) fortieth,
  NTH(50, quantiles(respSize,100)) fiftieth,
  NTH(60, quantiles(respSize,100)) sixtieth,
  NTH(70, quantiles(respSize,100)) seventieth,
  NTH(80, quantiles(respSize,100)) eightieth,
  NTH(90, quantiles(respSize,100)) ninetieth
FROM [httparchive:runs.latest_requests];

The results from the latest crawl are: image
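
For readers who work from the data dumps rather than BigQuery, the same decile summary can be sketched offline. This is only an illustration under stated assumptions: the record keys `method` and `size` are hypothetical and are not the HTTP Archive schema, and the helper name `size_deciles` is made up for this example. It also filters by HTTP method, so that POSTs can be left out of a GET-only distribution.

```python
from statistics import quantiles


def size_deciles(records, method="GET"):
    """Return {10: p10, 20: p20, ..., 90: p90} for one HTTP method.

    `records` is an iterable of dicts with hypothetical keys
    "method" and "size" (bytes); rows with a missing size are skipped.
    """
    sizes = [
        r["size"]
        for r in records
        if r["method"] == method and r["size"] is not None
    ]
    if len(sizes) < 2:
        raise ValueError("need at least two sizes to compute quantiles")
    # n=10 yields the nine cut points between deciles; "inclusive"
    # treats the sample as covering the full range of the data.
    cuts = quantiles(sizes, n=10, method="inclusive")
    return dict(zip(range(10, 100, 10), cuts))


if __name__ == "__main__":
    # Synthetic sample, not crawl data.
    sample = [{"method": "GET", "size": s} for s in range(0, 101)]
    sample.append({"method": "POST", "size": 1_000_000})  # excluded by the filter
    for pct, value in size_deciles(sample).items():
        print(f"p{pct}: {value:.0f} bytes")
```

The nine cut points are a coarse view of the CDF; raising `n` gives a finer one.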

DavidLerner commented 10 years ago

Steve, This is enormously helpful. Thank you very much for your prompt response. The sizes are a lot bigger than I had expected them to be. We had been working under the impression that the median was closer to 1300 bytes. I wonder what is driving the large size? Cookies alone wouldn’t make them this big.

I’m assuming that your query excludes posts?

David

From: Steve Souders [mailto:notifications@github.com] Sent: Tuesday, June 03, 2014 11:11 PM To: HTTPArchive/httparchive Cc: Lerner, David Subject: Re: [httparchive] Request header size distribution (#22)


pmeenan commented 10 years ago

respSize is the response size, not the request headers size, I assume. reqHeadersSize looks like the one you're interested in, but as best I can tell it's not getting populated (all values appear to be NULL). It also won't include the length of the GET line (with the URL); the "bytesOut" metric from WPT for a given request would be optimal.

Short-term, fixing reqHeadersSize should get you most of the way there.
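
To make the distinction between the header block and the GET line concrete, here is a minimal sketch of how the two parts of a body-less HTTP/1.x GET request add up on the wire. It is an illustration only, not how WPT computes "bytesOut" or how reqHeadersSize is populated. The function name `get_request_bytes` is hypothetical, and the inputs are assumed to be ASCII (as a percent-encoded URL would be).

```python
def get_request_bytes(path, headers, version="HTTP/1.1"):
    """Return (request_line_bytes, header_block_bytes) for a GET request.

    Assumes HTTP/1.x text framing and ASCII-only input; non-ASCII
    characters raise UnicodeEncodeError rather than being guessed at.
    """
    # Request line: method, target, and version, ended by CRLF.
    request_line = f"GET {path} {version}\r\n"
    # Each header is "Name: value" followed by CRLF.
    header_lines = "".join(
        f"{name}: {value}\r\n" for name, value in headers.items()
    )
    request_line_bytes = len(request_line.encode("ascii"))
    # The extra 2 bytes are the blank line (CRLF) that ends the headers.
    header_block_bytes = len(header_lines.encode("ascii")) + 2
    return request_line_bytes, header_block_bytes


if __name__ == "__main__":
    line_bytes, header_bytes = get_request_bytes(
        "/index.html",
        {"Host": "example.com", "Accept": "*/*"},
    )
    print("request line:", line_bytes, "bytes")
    print("headers:", header_bytes, "bytes")
    print("total:", line_bytes + header_bytes, "bytes")
```

A metric that counts only the header block misses the first number, and that number grows with URL length.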

DavidLerner commented 10 years ago

Yes, that will help. Ideally, the metric would include URL size as well, as URLs are getting longer and longer these days.

From: Patrick Meenan [mailto:notifications@github.com] Sent: Wednesday, June 04, 2014 8:44 AM To: HTTPArchive/httparchive Cc: Lerner, David Subject: Re: [httparchive] Request header size distribution (#22)


mnot commented 10 years ago

@pmeenan did the patch over on webpagetest; can this get updated for the next run? Really want these numbers in bigqueri.es to help make some decisions in HTTP/2...

pmeenan commented 10 years ago

The HAR export on the HTTPArchive WPT instance was updated. If the HTTPArchive code is already pulling the field then it should "just work" (fingers crossed).

stevesouders commented 10 years ago

I verified that this is being correctly recorded in the Jul 15 2014 crawl.

mnot commented 10 years ago

Excellent! Thanks much. Now just waiting for them to show up on bigqueri.es :)