HTTPArchive / almanac.httparchive.org

HTTP Archive's annual "State of the Web" report made by the web community
https://almanac.httparchive.org
Apache License 2.0

CMS 2024 #3608

Closed nrllh closed 5 days ago

nrllh commented 8 months ago

CMS 2024


If you're interested in contributing to the CMS chapter of the 2024 Web Almanac, please reply to this issue and indicate which role or roles best fit your interest and availability: author, reviewer, analyst, and/or editor. You might be interested in exploring the changes to this year's version here.

Content team

| Lead | Authors | Reviewers | Analysts | Editors | Coordinator |
| --- | --- | --- | --- | --- | --- |
| @sirjonathan | @sirjonathan, @LoraRaykova, @niko-kaleev | @raewrites, @karmatosed | @sirjonathan, @nrllh | - | @turban1988 |
Expand for more information about each role 👀

- The **[content team lead](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Content-Team-Leads'-Guide)** is the chapter owner and responsible for setting the scope of the chapter and managing contributors' day-to-day progress.
- **[Authors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Authors'-Guide)** are subject matter experts and lead the content direction for each chapter. Chapters typically have one or two authors. Authors are responsible for planning the outline of the chapter, analyzing stats and trends, and writing the annual report.
- **[Reviewers](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Reviewers'-Guide)** are also subject matter experts and assist authors with technical reviews during the planning, analyzing, and writing phases.
- **[Analysts](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Analysts'-Guide)** are responsible for researching the stats and trends used throughout the Almanac. Analysts work closely with authors and reviewers during the planning phase to give direction on the types of stats that are possible from the dataset, and during the analyzing/writing phases to ensure that the stats are used correctly.
- **[Editors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Editors'-Guide)** are technical writers who have a penchant for both technical and non-technical content correctness. Editors have a mastery of the English language and work closely with authors to help wordsmith content and ensure that everything fits together as a cohesive unit.
- The **[section coordinator](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Section-Leads'-Guide)** is the overall owner for all chapters within a section like "User Experience" or "Page Content" and helps to keep each chapter on schedule.

_Note: The time commitment for each role varies by the chapter's scope and complexity as well as the number of contributors._

For an overview of how the roles work together at each phase of the project, see the [Chapter Lifecycle](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Chapter-Lifecycle) doc.

Milestone checklist

0. Form the content team

1. Plan content

2. Gather data

3. Validate results

4. Draft content

5. Publication

6. Virtual conference

Chapter resources

Refer to these 2024 CMS resources throughout the content creation process:

- 📄 Google Docs for outlining and drafting content
- 🔍 SQL files for committing the queries used during analysis
- 📊 Google Sheets for saving the results of queries
- 📝 Markdown file for publishing content and managing public metadata
- 💻 Colab notebook for collaborative coding in Python, if needed
- 💬 #web-almanac-cms on Slack for team coordination

sirjonathan commented 8 months ago

I'm happy that the project is back again! I'd love to return and contribute as either author or co-author of this year's chapter.

nrllh commented 7 months ago

Hey @alexdenning @dknauss @alonkochba @honzasladek @csliva - awesome contributors from previous years 🙂 Are you interested in joining us again this year?

raewrites commented 7 months ago

I'm interested in reviewing. I'm an experienced writer and editor and have previously worked with @sirjonathan. I'll be away most of September but will be back on the 27th, so it would be good to co-review with someone else.

karmatosed commented 7 months ago

I am also interested in reviewing. I have experience in this area and can serve as a subject matter reviewer. I have also worked with @sirjonathan before, which should help collaboration.

turban1988 commented 6 months ago

Hi @sirjonathan, thank you very much for volunteering to lead the writing of this chapter! Could you please organize a kick-off meeting (example: https://github.com/HTTPArchive/almanac.httparchive.org/issues/3603#issuecomment-2064351177) to plan the writing of the chapter?

Furthermore, it would be helpful if you and all other contributors (@LoraRaykova, @niko-kaleev, @raewrites, @karmatosed) could join the HTTP Archive Slack (https://join.slack.com/t/httparchive/shared_invite/zt-2hfkn28ts-~uXN4UGS0mXsKpzzhtZcow).

Thanks!

sirjonathan commented 6 months ago

@turban1988 I've reached out to the team and am planning to hold the kickoff meeting next week.

sirjonathan commented 6 months ago

@LoraRaykova, @niko-kaleev, and I met today. We discussed the previous years' efforts and ideas for improving and expanding this year's chapter, including references to the Speculation Rules API and tracking themes within the WordPress section.

Our plan is to start by pulling over the 2022 outline and expanding it with our ideas for this year. @niko-kaleev will take the first pass at that and we'll work on it together async.

We're meeting again on the 28th to finalize the outline, after which I'll follow up on any analyst-related tasks.

/cc @turban1988

niko-kaleev commented 6 months ago

As discussed with @sirjonathan and @LoraRaykova at the kick-off meeting, the chapter outline is ready:

https://docs.google.com/document/d/13CxAp7HCcxHHCSuEnXS2rolKskLSlUvLQuqUD6QADYc/edit

@sirjonathan will review it next week, and we'll finalize it on the 28th.

/cc @turban1988

sirjonathan commented 5 months ago

@niko-kaleev, @LoraRaykova, and I met on Tuesday for a sync. The outline is in good shape and they're going to start on the parts of the chapter they can while we wait for data.

@nrllh has generously agreed to tackle the analyst work and replicate the analysis from the 2022 edition.

sirjonathan commented 4 months ago

@niko-kaleev and I met today for a quick check-in. We discussed next steps and scheduled a follow-up for August 20, once we have results validated.

dknauss commented 2 months ago

@sirjonathan Do you still need a hand? I have time to help if you're still working through the reviewing and editing.

niko-kaleev commented 2 months ago

You can find the first draft at the bottom of the document: https://docs.google.com/document/d/13CxAp7HCcxHHCSuEnXS2rolKskLSlUvLQuqUD6QADYc/edit

What still needs to be done:

  1. @sirjonathan to write the Page Builders section (under WordPress 2024), as discussed during our kick-off meeting
  2. @sirjonathan to draw and write the conclusion

Keep in mind that a lot of the data presented in 2022 was missing from this year's data export. All the missing data points are highlighted in red or in a comment.

@sirjonathan, would you mind extracting the missing data points and writing a paragraph or two for each? We believe that will make the whole chapter even more valuable to our readers.

sirjonathan commented 2 months ago

@niko-kaleev Thank you for the update! I'm planning to set aside some time during WordCamp US next week to work on 1 and 2. Regarding the missing data points, I'll take another look and see what's possible - I just may need technical help.

sirjonathan commented 2 weeks ago

First off, a huge thanks to @niko-kaleev for all the heavy lifting that got us here. Here's a quick summary of what I've completed over the past few days:

Pending Decisions

There are 3 sections that I'd like input on before we finalize:

CMS Adoption share (Google Doc link) - This was something new we tried this year. On closer inspection of the data, I realized that it combines desktop and mobile. While that could be interesting, our methodology notes that there is a lot of overlap in the dataset, as "most websites are included in both the mobile and desktop subsets." It feels like it could be misleading without clarification, and once clarified it doesn't feel that useful.

My recommendation is that we cut this section.
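
To make the overlap concern concrete, here is a minimal Python sketch. The site names and both sets are hypothetical and are not taken from the HTTP Archive dataset; the sketch only shows why adding desktop and mobile counts together overstates the number of distinct sites.

```python
# Hypothetical crawl subsets: most sites appear in both.
desktop_sites = {"a.example", "b.example", "c.example"}
mobile_sites = {"b.example", "c.example", "d.example"}

# Adding the two subsets counts every shared site twice.
naive_total = len(desktop_sites) + len(mobile_sites)

# A set union counts each distinct site once.
unique_total = len(desktop_sites | mobile_sites)

# Sites present in both subsets.
overlap = len(desktop_sites & mobile_sites)

print(naive_total, unique_total, overlap)  # prints: 6 4 2
```

Any combined figure would need to state which of these two totals it uses as its denominator.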

CMS Adoption by geography (Google Doc link) - This section feels a bit messy, but with some cleanup it could be interesting and useful to include.

A few items I noticed:

My recommendation is that, if we can get the US and UK added back to the results, we do so and clean up the section as outlined above.

CMS Adoption by rank (Google Doc Link) - I question how useful this section is as a whole. "Rank" is subjective and is presumably a reference to Google's index, but it isn't introduced as such. Also, if I understand the spreadsheet right, "All" actually means the top 10M, not every site in the index, which is confusing. Last, the current analysis focuses on "headless" as an explanation for the jump, whereas I think a more plausible explanation is that our dataset nearly doubled in size from 2022 to 2024.

My recommendation is that, unless others feel strongly about this section, we go ahead and cut it.

Next Steps

I'd love feedback on the above items over the next few days. From there, once @raewrites and @dknauss have given their all clear on reviewing, we can move forward to publishing.

@nrllh can you or someone else tackle the markdown generation and prepare the PR?

LoraRaykova commented 2 weeks ago

Hi Jonathan,

Thanks for the summary and update! I just wanted to quickly note that both Niko and I worked on this chapter equally as half of the draft was written and analyzed by me.

sirjonathan commented 2 weeks ago

Hi Lora, of course! My mistake. Thank you for clarifying and for all your work on it.

tunetheweb commented 2 weeks ago

> "by geography" seems confusing, I suggest we update it to "by country"

"Country" raises concerns for some people because of certain geopolitically sensitive areas (e.g. Taiwan). Hence we say "by geographical region".

> CMS Adoption by rank (Google Doc Link) - I question how useful this section is as a whole. "Rank" is subjective and is presumably a reference to Google's index, but it isn't introduced as such.

Rank refers to popularity by Chrome page views in the previous month, rather than any SEO ranking.

IMHO (and without having read the draft), this is useful for seeing the split. For example, top-ranked sites tend to be more commercial and can typically afford paid-for CMSs, while the long tail might prefer free ones. So there can be interesting insights here.

> Also, if I understand the spreadsheet right, "All" actually means the top 10M, not every site in the index, which is confusing.

Since 2022 we have increased our dataset to 16 million or so sites, so the final category should be top 100M rather than the top 10M used in 2022. If the query wasn't changed to include 100M, then it should be updated and rerun.

And yes, you are correct that 100M should be labelled "all" rather than "100m" (as we use < rank in our queries).
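
As a rough illustration of the cumulative rank buckets described here, the following Python sketch counts sites at or below each threshold and labels the largest bucket "ALL". The ranks, the threshold list, the helper names, and the use of `<=` are all assumptions made for this example; the actual Almanac queries are written in SQL and may differ.

```python
# Hypothetical rank magnitudes for a handful of sites (not real data).
site_ranks = [1_000, 10_000, 10_000, 1_000_000, 10_000_000, 50_000_000]

# Assumed bucket thresholds. The largest one is labelled "ALL" because
# every site in the dataset falls at or below it.
thresholds = [1_000, 10_000, 100_000, 1_000_000, 10_000_000, 100_000_000]


def bucket_label(threshold: int, largest: int) -> str:
    """Return "ALL" for the final bucket, otherwise a formatted number."""
    return "ALL" if threshold == largest else f"{threshold:,}"


def cumulative_counts(ranks, thresholds):
    """Count the sites whose rank is at or below each threshold."""
    largest = max(thresholds)
    return {
        bucket_label(t, largest): sum(1 for r in ranks if r <= t)
        for t in thresholds
    }


counts = cumulative_counts(site_ranks, thresholds)
for label, count in counts.items():
    print(label, count)
```

In this toy data, the site with rank 50,000,000 is only counted in the "ALL" bucket, which is why stopping at a 10,000,000 bucket would leave part of the dataset out.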

> Last, the current analysis focuses on "headless" as an explanation for the jump, whereas I think a more plausible explanation is that our dataset nearly doubled in size from 2022 to 2024.

Not sure what this refers to, but we typically report on % of pages precisely to avoid any issues with changes in dataset size. That's not to say that changes in popularity can't also change percentages.
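
The point about reporting percentages can be sketched in a few lines of Python. All numbers below are invented for illustration, and `share` is a hypothetical helper, not part of any Almanac tooling.

```python
def share(cms_sites: int, total_sites: int) -> float:
    """Return the percentage of sites that use a CMS."""
    return 100 * cms_sites / total_sites


# Invented numbers: the dataset doubles and CMS usage scales with it,
# so the raw count doubles but the percentage stays the same.
earlier = share(cms_sites=350, total_sites=1_000)
later = share(cms_sites=700, total_sites=2_000)
print(earlier, later)  # prints: 35.0 35.0

# The percentage only moves when adoption among the measured sites
# differs, for example if the newly added sites use a CMS more often.
shifted = share(cms_sites=800, total_sites=2_000)
print(shifted)  # prints: 40.0
```

So a larger dataset does not by itself change a percentage, although a different mix of sites in the larger dataset still can.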

> My recommendation is that, unless others feel strongly about this section, we go ahead and cut it.

As I say, I've not read it yet, so will leave others to decide on this, but hopefully that helps with some context.

kevinfarrugia commented 2 weeks ago

> The US and UK are missing from the chart, which I am guessing is simply because of the length of their names as they are both in the actual data sets. Is this something @kevinfarrugia could help us with?

@sirjonathan I updated the Google Sheet to include the United States and the United Kingdom of Great Britain and Northern Ireland. I am not sure why they were excluded, but the filter was specifically omitting both countries. The results are sorted by the total number of mobile sites using a CMS, descending.

kevinfarrugia commented 2 weeks ago

@sirjonathan I have updated the Google Sheet for top_cms_by_rank and cms_adoption_by_rank to also include sites where rank > 10,000,000.

TBH, I didn't fully understand the difference between the two queries but top_cms_by_rank seems to be the correct query and results.

kevinfarrugia commented 2 weeks ago

@sirjonathan Following up on the above, I have removed cms_adoption_by_rank as the query had a bug and once fixed, it returns the same data as top_cms_by_rank. Let me know if you were looking for something else with that query.

sirjonathan commented 1 week ago

> Rank refers to popularity by Chrome page views in the previous month, rather than any SEO ranking.

Thank you for clarifying that @tunetheweb. That changes my perspective. Given that and @kevinfarrugia's work, I'll reverse my recommendation and work to include the section.

Thank you for your updates @kevinfarrugia. Per Barry's comment:

> Since 2022 we have increased our dataset to 16 million or so sites, so the final category should be top 100M rather than the top 10M used in 2022. If the query wasn't changed to include 100M, then it should be updated and rerun.

When I looked at the results, the legend says 10,000,000. If I understand correctly, though, the query is including all sites up to and over 10M. If that's the case, could we update the legend to read 10,000,000+?

Also, when I look at top cms by rank it's currently listing XpressEngine, XOOPS, Wuilt, and Woltlab along with WordPress, which appears to be alphabetical. What I would be expecting is probably WordPress, Wix, Joomla, Drupal, and Squarespace.

tunetheweb commented 1 week ago

> When I looked at the results, the legend says 10,000,000. If I understand correctly, though, the query is including all sites up to and over 10M. If that's the case, could we update the legend to read 10,000,000+?

I don't think that's correct. But there is now a 100M category (also called "ALL") that covers this, since @kevinfarrugia added it.

> Also, when I look at top cms by rank it's currently listing XpressEngine, XOOPS, Wuilt, and Woltlab along with WordPress, which appears to be alphabetical. What I would be expecting is probably WordPress, Wix, Joomla, Drupal, and Squarespace.

I've changed the sort order for you and it looks correct now.
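
For anyone curious how a list like the one quoted above can arise, here is a small Python sketch comparing a reverse-alphabetical sort with a sort by site count. The counts are invented for illustration, and this is only one possible explanation; the thread does not say what the sheet's original sort key actually was.

```python
# Hypothetical site counts per CMS (invented for illustration only).
cms_counts = {
    "WordPress": 900,
    "Wix": 120,
    "Joomla": 80,
    "Drupal": 60,
    "Squarespace": 50,
    "XpressEngine": 3,
    "XOOPS": 2,
    "Wuilt": 2,
    "Woltlab": 1,
}

# Sorting by name in descending order surfaces obscure CMSs first.
top_by_name = sorted(cms_counts, reverse=True)[:5]

# Sorting by count in descending order surfaces the most-used CMSs.
top_by_count = sorted(cms_counts, key=cms_counts.get, reverse=True)[:5]

print(top_by_name)
print(top_by_count)
```

With these made-up counts, the first list contains XpressEngine, XOOPS, Wuilt, WordPress, and Woltlab, while the second contains WordPress, Wix, Joomla, Drupal, and Squarespace.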

kevinfarrugia commented 1 week ago

> @sirjonathan Following up on the above, I have removed cms_adoption_by_rank as the query had a bug and once fixed, it returns the same data as top_cms_by_rank. Let me know if you were looking for something else with that query.

@sirjonathan I'm not sure if you saw my comment above and if you're referring to this sheet. I left the results for cms_adoption_by_rank in place for posterity, but they are incorrect and should not be used. Use top_cms_by_rank instead. It includes ALL, and Barry kindly fixed the sort order. LMK.