HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community
https://almanac.httparchive.org
Apache License 2.0

Media 2024 #3596

Open nrllh opened 4 months ago

nrllh commented 4 months ago

Media 2024

Media illustration

If you're interested in contributing to the Media chapter of the 2024 Web Almanac, please reply to this issue and indicate which role or roles best fit your interest and availability: author, reviewer, analyst, and/or editor. You might be interested in exploring the changes to this year's version here.

Content team

| Lead | Authors | Reviewers | Analysts | Editors | Coordinator |
| --- | --- | --- | --- | --- | --- |
| @stefanjudis | @stefanjudis | @svgeesus, @nhoizey, @eeeps | @foolip, @nucliweb, @eeeps | @MichaelLewittes | @turban1988 |
Expand for more information about each role 👀

- The **[content team lead](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Content-Team-Leads'-Guide)** is the chapter owner and responsible for setting the scope of the chapter and managing contributors' day-to-day progress.
- **[Authors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Authors'-Guide)** are subject matter experts and lead the content direction for each chapter. Chapters typically have one or two authors. Authors are responsible for planning the outline of the chapter, analyzing stats and trends, and writing the annual report.
- **[Reviewers](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Reviewers'-Guide)** are also subject matter experts and assist authors with technical reviews during the planning, analyzing, and writing phases.
- **[Analysts](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Analysts'-Guide)** are responsible for researching the stats and trends used throughout the Almanac. Analysts work closely with authors and reviewers during the planning phase to give direction on the types of stats that are possible from the dataset, and during the analyzing/writing phases to ensure that the stats are used correctly.
- **[Editors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Editors'-Guide)** are technical writers who have a penchant for both technical and non-technical content correctness. Editors have a mastery of the English language and work closely with authors to help wordsmith content and ensure that everything fits together as a cohesive unit.
- The **[section coordinator](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Section-Leads'-Guide)** is the overall owner for all chapters within a section like "User Experience" or "Page Content" and helps to keep each chapter on schedule.

_Note: The time commitment for each role varies by the chapter's scope and complexity as well as the number of contributors._

For an overview of how the roles work together at each phase of the project, see the [Chapter Lifecycle](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Chapter-Lifecycle) doc.

Milestone checklist

0. Form the content team

1. Plan content

2. Gather data

3. Validate results

4. Draft content

5. Publication

6. Virtual conference

Chapter resources

Refer to these 2024 Media resources throughout the content creation process:

- 📄 Google Docs for outlining and drafting content
- 🔍 SQL files for committing the queries used during analysis
- 📊 Google Sheets for saving the results of queries
- 📝 Markdown file for publishing content and managing public metadata
- 💻 Colab notebook for collaborative coding in Python, if needed
- 💬 #web-almanac-media on Slack for team coordination

MichaelLewittes commented 4 months ago

I'd be happy to be the editor of the media section again. Know the drill -- and ready to do it once more.

foolip commented 4 months ago

I'm interested in contributing as an analyst.

I've been poking around at the question "what sizes and qualities do images on the web tend to be?" I've been running some of the queries in https://almanac.httparchive.org/en/2022/media again and have identified what I think are a few interesting angles.

First, the distribution of image sizes is much more uneven and interesting than https://almanac.httparchive.org/en/2022/media#image-dimensions suggests. Here's a histogram from a quick experiment I did in February:

image

Instead of megapixels I'm using sqrt(megapixels), so the equivalent width of a square image. I found this much easier to reason about, since much of the interesting action is in the 0-0.25 megapixel range. 300x300 images are the most common.

Second, BPP (bits/pixel) strongly depends on image size, with smaller images having higher BPP. The reasons I can see are (1) container overhead (2) more incentive to compress large images and (3) less detail in large images, as many small images are downscaled versions of the large ones.
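The two measures discussed above can be sketched as follows. This is only an illustration of the arithmetic, not the query used for the histogram; the function names and the sample file sizes are made up for the example. Note that sqrt(megapixels) is the same quantity expressed in thousands of pixels (0.09 megapixels gives 0.3, i.e. 300 px).

```python
import math


def equivalent_square_width(width_px, height_px):
    """Side length, in pixels, of a square with the same pixel count."""
    return math.sqrt(width_px * height_px)


def bits_per_pixel(byte_size, width_px, height_px):
    """Encoded file size in bits divided by the number of pixels."""
    return byte_size * 8 / (width_px * height_px)


# Hypothetical images: (encoded bytes, width, height). Not real dataset rows.
samples = [
    (20_000, 300, 300),
    (120_000, 1200, 800),
]

for byte_size, w, h in samples:
    side = equivalent_square_width(w, h)
    bpp = bits_per_pixel(byte_size, w, h)
    print(f"{w}x{h}: equivalent width {side:.0f} px, {bpp:.2f} bits/pixel")
```

With these made-up numbers the smaller image has the higher BPP, which is the pattern described above, but the sketch says nothing about which of the three causes is responsible.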

I think it would be interesting to try to understand quality both through BPP while taking these effects into account, but also by estimating the encoder settings used. I suspect the latter varies less with size, and at least from JPEG an estimation is possible due to how the format works. A first attempt yielded this:

image

I also shared this in https://github.com/HTTPArchive/almanac.httparchive.org/issues/3572#issuecomment-1943549499 and there are some words of caution about using ImageMagick's detected quality, but I think something useful could be done here.

foolip commented 4 months ago

A colleague made this useful observation:

noting that 300x300 is the default (medium) image size in WordPress and 82 is the default quality. This lines up exactly with the "most common" size and quality. Since WordPress sites are a large part of the dataset (~30%), they may be influencing the results. It might be interesting to see what images on non-WordPress sites look like.

svgeesus commented 4 months ago

I noticed this in the 2022 Media report

One caveat: AVIF and PNG allow tagging images with wide-gamut color spaces using format-specific shorthands, without using ICC profiles. We started down the path of trying to detect wide-gamut AVIFs and PNGs that don't use ICC profiles, but accounting for the various ways they are encoded, and the ways our tooling reported on them, proved a bit too complex to tackle this year. Maybe next year!

Coding-Independent Code Points (CICP) is a method that is simple to understand and use. It originated in the broadcast and video world and is also applicable to still images and short animations.

Given that:

Then the "various ways they are encoded" becomes a much more tractable "look for CICP in images" and I suggest this metric for the 2024 Media survey.
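To give a rough idea of what "look for CICP in images" could mean for PNG, here is a minimal sketch that walks the chunk list and reads a `cICP` chunk if one appears before the image data. It assumes the four-byte `cICP` layout (colour primaries, transfer function, matrix coefficients, full-range flag), does not validate CRCs, and is not how the Almanac tooling works; `find_cicp` and `make_chunk` are hypothetical helper names.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def find_cicp(png_bytes):
    """Return (primaries, transfer, matrix, full_range) or None if untagged."""
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    offset = len(PNG_SIGNATURE)
    while offset + 8 <= len(png_bytes):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", png_bytes[offset:offset + 8])
        data = png_bytes[offset + 8:offset + 8 + length]
        if chunk_type == b"cICP" and len(data) == 4:
            return tuple(data)
        if chunk_type in (b"IDAT", b"IEND"):
            return None  # assumption: cICP has to come before the image data
        offset += 12 + length
    return None


def make_chunk(chunk_type, data):
    """Build one PNG chunk (used only to create demo input below)."""
    crc = zlib.crc32(chunk_type + data)
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


# Demo input: a chunk skeleton, not a decodable image.
ihdr = make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0))
tail = make_chunk(b"IDAT", b"") + make_chunk(b"IEND", b"")
tagged = PNG_SIGNATURE + ihdr + make_chunk(b"cICP", bytes([9, 16, 0, 1])) + tail
untagged = PNG_SIGNATURE + ihdr + tail

print(find_cicp(tagged))
print(find_cicp(untagged))
```

The values 9, 16, 0, 1 in the demo are the code points commonly used for BT.2020 primaries with the PQ transfer function; deciding which combinations count as "wide gamut" would still be a judgment call for the chapter.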

Originally raised in

svgeesus commented 4 months ago

I volunteer as tribute as a reviewer

nucliweb commented 3 months ago

Hi, I would love to contribute as an analyst.

nrllh commented 3 months ago

Hey @eeeps @akshay-ranganath @nhoizey @yoavweiss @MichaelLewittes - awesome contributors from previous years 🙂 Are you interested in joining us again this year?

MichaelLewittes commented 3 months ago

Would be honored to join again as the editor.

nhoizey commented 3 months ago

Hi @nrllh, I can indeed join this year once again, as a reviewer.

eeeps commented 3 months ago

@nrllh I can join as an Analyst and Reviewer, but do not have the bandwidth to Lead or Author again this year.

nrllh commented 3 months ago

thank you, @MichaelLewittes, @nhoizey, @eeeps!

turban1988 commented 2 months ago

Hi @rey-dal, Thank you very much for volunteering to lead the writing of this chapter! Could you please organize a kick-off meeting for this chapter (example: https://github.com/HTTPArchive/almanac.httparchive.org/issues/3603#issuecomment-2064351177) to organize the writing of the chapter?

Furthermore, it would be helpful if you and all other contributors (@svgeesus, @nhoizey, @eeeps, @foolip, @nucliweb, @MichaelLewittes) could join the HTTP Archive Slack workspace (https://join.slack.com/t/httparchive/shared_invite/zt-2hfkn28ts-~uXN4UGS0mXsKpzzhtZcow).

Thanks!

scottjehl commented 1 month ago

I'd love to see included in this year's report if there's been any uptick in responsive video usage now that support has returned across browsers. That is, how many sites are using video source elements with media attributes and what sizes are they commonly serving? Happy to help if there's any way I can!
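As a sketch of the detection being asked about, the snippet below collects the `media` attribute of every `source` element nested inside a `video` element, ignoring `picture` sources. It uses Python's standard `html.parser` on a made-up HTML string; the class name and sample markup are hypothetical, and the real analysis would presumably run as custom metrics and SQL over the HTTP Archive dataset rather than like this.

```python
from html.parser import HTMLParser


class ResponsiveVideoFinder(HTMLParser):
    """Collects media queries from <source media="..."> inside <video>."""

    def __init__(self):
        super().__init__()
        self._video_depth = 0
        self.media_queries = []

    def handle_starttag(self, tag, attrs):
        if tag == "video":
            self._video_depth += 1
        elif tag == "source" and self._video_depth > 0:
            media = dict(attrs).get("media")
            if media:
                self.media_queries.append(media)

    def handle_endtag(self, tag):
        if tag == "video" and self._video_depth > 0:
            self._video_depth -= 1


# Made-up page fragment: one responsive video and one responsive picture.
page = (
    '<video controls>'
    '<source src="small.mp4" media="(max-width: 600px)">'
    '<source src="large.mp4">'
    '</video>'
    '<picture>'
    '<source srcset="wide.avif" media="(min-width: 800px)">'
    '<img src="fallback.jpg" alt="">'
    '</picture>'
)

finder = ResponsiveVideoFinder()
finder.feed(page)
print(finder.media_queries)
```

Counting pages where this list is non-empty would give the usage figure; the sizes being served would need the video resources themselves, which this sketch does not touch.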

nrllh commented 3 weeks ago

Unfortunately, we currently have no authors for this chapter. Would any of you (@svgeesus, @nhoizey, @eeeps, @foolip, @nucliweb, @MichaelLewittes) be interested in contributing to this chapter as an author?

svgeesus commented 3 weeks ago

Unfortunately I am overcommitted right now, so can't take this on.

stefanjudis commented 1 week ago

@svgeesus, @nhoizey, @eeeps, @foolip, @nucliweb, @MichaelLewittes, @turban1988

Hi friends! 👋 I am very excited about this and looking forward to collaborating with all of you! As far as I know, we're a bit behind the original deadline, so let's kick things off quickly (if possible).

To start off, I would like to schedule a 30-60 minute meeting to start the planning and brainstorming process. So please provide your availability here for the next two weeks: https://doodle.com/meeting/participate/id/erMwP4kd

I checked everyone's current time zones and chose options in the European evening that should also work for the US.

Also, here is an agenda for what we might want to discuss on the kickoff call: https://docs.google.com/document/d/11lk8wSjs9PQXlWhv1FYeDynhrBxQNOUza85fjC5oB1k/edit (please request access). Feel free to add points, because I haven't led any Web Almanac-related activities so far. 😅

The goal of the meeting will be to quickly get to know each other, set new deadlines and define our preferred workflow.

Speaking of the chapter content: I'll summarize all statistics and data points from the previous year in the gdoc above. Ideally, you could give some thought to which metrics you'd like to drop or adjust, and also what new things we should add. :)

Also a gentle reminder to join the #web-almanac-media channel on Slack (https://join.slack.com/t/httparchive/shared_invite/zt-2lx22qow3-pkcEJltSqtyP9_86V4uTZQ)

nhoizey commented 1 week ago

Thanks @stefanjudis for taking the lead! 🙏

stefanjudis commented 5 days ago

@nhoizey @eeeps @nucliweb @MichaelLewittes

Thank you for filling out the doodle. We had a very clear winner: Thursday next week at 11 AM GMT-7 (SF) / 2 PM GMT-4 (NYC) / 8 PM GMT+2 (Berlin). 🎉

@svgeesus @foolip @turban1988

If you want to join, please let me know and I'll invite you too. :)


The folks joining the Kick-off call already have access to a living document that we'll use going forward. If someone wants to have access, too, just request it via Google. :)

Looking forward to catching up with you all!

svgeesus commented 4 days ago

I just filled out the doodle