DavidLerner closed this issue 10 years ago
The size of every request, as well as its HTTP method (e.g., "GET"), is available in the data dumps.
This seems like a useful chart but perhaps not broadly applicable. An alternative to adding a permanent chart to the HTTP Archive is to do this in BigQuery. The query would be:
SELECT
NTH(10, quantiles(respSize,100)) tenth,
NTH(20, quantiles(respSize,100)) twentieth,
NTH(30, quantiles(respSize,100)) thirtieth,
NTH(40, quantiles(respSize,100)) fortieth,
NTH(50, quantiles(respSize,100)) fiftieth,
NTH(60, quantiles(respSize,100)) sixtieth,
NTH(70, quantiles(respSize,100)) seventieth,
NTH(80, quantiles(respSize,100)) eightieth,
NTH(90, quantiles(respSize,100)) ninetieth
FROM [httparchive:runs.latest_requests];
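The results from the latest crawl are: [image] https://cloud.githubusercontent.com/assets/2819380/3169200/be61c6da-eb95-11e3-8705-5dd555fa0b91.png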
Steve, This is enormously helpful. Thank you very much for your prompt response. The sizes are a lot bigger than I had expected them to be. We had been working under the impression that the median was closer to 1300 bytes. I wonder what is driving the large size? Cookies alone wouldn’t make them this big.
I’m assuming that your query excludes POSTs?
David
In reply to: Steve Souders, Tuesday, June 03, 2014 11:11 PM, Re: [httparchive] Request header size distribution (#22)
respSize is the response size, not the request headers, I assume. reqHeadersSize looks like the one you're interested in, but as best I can tell it's not being populated (all values appear to be NULL). It won't include the length of the GET line (with the URL), though; the "bytesOut" metric from WPT for a given request would be optimal.
Short-term, fixing reqHeadersSize should get you most of the way there.
Yes, that will help. Ideally, the metric would include URL size as well, as URLs are getting longer and longer these days.
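To illustrate the gap discussed above (header size alone versus a figure that also counts the GET line and its URL), here is a minimal Python sketch. The function names are hypothetical, and it assumes HTTP/1.1 framing and that the header-size value (something like reqHeadersSize) does not already include the request line. It is an approximation of what a "bytesOut"-style number would capture, not how WPT computes it.

```python
from urllib.parse import urlsplit


def request_line_size(method: str, url: str, version: str = "HTTP/1.1") -> int:
    """Bytes in an HTTP/1.x request line such as 'GET /path?q=1 HTTP/1.1\\r\\n'."""
    parts = urlsplit(url)
    # The request target is the path plus the optional query string.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    line = f"{method} {target} {version}\r\n"
    return len(line.encode("utf-8"))


def approx_request_size(method: str, url: str, headers_size: int) -> int:
    """Request line plus header bytes (assumes headers_size excludes the request line)."""
    return request_line_size(method, url) + headers_size
```

For example, approx_request_size("GET", "http://example.com/a?b=1", 300) adds the 21 bytes of the request line to the 300 bytes of headers, so longer URLs raise the total directly.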
In reply to: Patrick Meenan, Wednesday, June 04, 2014 8:44 AM, Re: [httparchive] Request header size distribution (#22)
@pmeenan did the patch over on webpagetest; can this get updated for the next run? Really want these numbers in bigqueri.es to help make some decisions in HTTP/2...
The HAR export on the HTTPArchive WPT instance was updated. If the HTTPArchive code is already pulling the field then it should "just work" (fingers crossed).
I verified that this is being correctly recorded in the Jul 15 2014 crawl.
Excellent! Thanks much. Now just waiting for them to show up on bigqueri.es :)
Please track the size of GET requests.
When designing algorithms for optimizing MAC layer utilization, it is helpful to understand the size of HTTP GET requests (average is useful; CDF is much more useful).
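As a rough illustration of why a CDF says more than an average, here is a small Python sketch that computes nearest-rank percentiles and an empirical CDF from a list of request sizes. The function names and the sample values are made up for illustration; real sizes would come from the crawl data.

```python
import math


def percentile(sizes, p):
    """Nearest-rank percentile: smallest value with at least p% of samples <= it."""
    if not sizes:
        raise ValueError("sizes must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(sizes)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def empirical_cdf(sizes):
    """Return (size, fraction of samples <= size) for each distinct size, ascending."""
    ordered = sorted(sizes)
    n = len(ordered)
    cdf = []
    for i, size in enumerate(ordered, start=1):
        if cdf and cdf[-1][0] == size:
            cdf[-1] = (size, i / n)  # same size seen again: raise its fraction
        else:
            cdf.append((size, i / n))
    return cdf


# Hypothetical request sizes in bytes, for illustration only.
sample = [200, 400, 400, 800, 1300]
median = percentile(sample, 50)
points = empirical_cdf(sample)
```

With this sample the mean is 620 bytes, while the CDF shows that 60% of requests are 400 bytes or smaller, which is the kind of detail an average hides.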