Closed nrllh closed 5 days ago
I'm happy that the project is back again! I'd love to return and contribute as either author or co-author of this year's chapter.
Hey @alexdenning @dknauss @alonkochba @honzasladek @csliva - awesome contributors from previous years! Are you interested in joining us again this year?
I'm interested in reviewing. I'm an experienced writer and editor and have previously worked with @sirjonathan. I'll be away most of September but will be back on the 27th, so it would be good to co-review with someone else.
I am also interested in reviewing. I have experience in this area and can serve as a subject matter reviewer. I have also worked with @sirjonathan before, which should help collaboration.
Hi @sirjonathan, thank you very much for volunteering to lead the writing of this chapter! Could you please organize a kick-off meeting (example: https://github.com/HTTPArchive/almanac.httparchive.org/issues/3603#issuecomment-2064351177) to plan the writing of the chapter?
Furthermore, it would be helpful if you and all other contributors (@LoraRaykova, @niko-kaleev, @raewrites, @karmatosed) could join the Slack channel of the HTTP Archive (https://join.slack.com/t/httparchive/shared_invite/zt-2hfkn28ts-~uXN4UGS0mXsKpzzhtZcow).
Thanks!
@turban1988 I've reached out to the team and am planning to hold the kickoff meeting next week.
@LoraRaykova, @niko-kaleev, and I met today. We discussed the previous years' efforts and ideas for improving and expanding this year's chapter, including references to the Speculation Rules API and tracking themes within the WordPress section.
Our plan is to start by pulling over the 2022 outline and expand it with our ideas for this year. @niko-kaleev will take the first pass at that and we'll work on it together async.
We're meeting again on the 28th to finalize the outline, after which I'll follow up on any analyst-related tasks.
/cc @turban1988
As discussed with @sirjonathan and @LoraRaykova at the kick-off meeting, the chapter outline is ready:
https://docs.google.com/document/d/13CxAp7HCcxHHCSuEnXS2rolKskLSlUvLQuqUD6QADYc/edit
@sirjonathan will review it next week, and we'll finalize it on the 28th.
/cc @turban1988
@Niko Kaleev @Lora Raykova and I met on Tuesday for a sync. The outline is in good shape and they're going to start on the parts of the chapter they can while we wait for data.
@nrllh has generously agreed to tackle the analyst work and replicate the analysis from the 2022 edition.
@niko-kaleev and I met today for a quick check-in. We discussed next steps and scheduled a follow-up for August 20, once we have results validated.
@sirjonathan Do you still need a hand? I have time to help if you're still working through the reviewing and editing.
You can find the first draft at the bottom of the document: https://docs.google.com/document/d/13CxAp7HCcxHHCSuEnXS2rolKskLSlUvLQuqUD6QADYc/edit
What still needs to be done:
Keep in mind that a lot of the data presented in 2022 was missing from this year's data export. All the missing data points are highlighted in red or in a comment.
@sirjonathan, would you mind extracting the missing data points and writing a paragraph or two for each? We believe that will make the whole chapter even more valuable to our readers.
@niko-kaleev Thank you for the update! I'm planning to set aside some time during WordCamp US next week to work on 1 and 2. Regarding the missing data points, I'll take another look and see what's possible - I just may need technical help.
First off, a huge thanks to @niko-kaleev for all the heavy lifting that got us here. Here's a quick summary of what I've completed over the past few days:
There are 3 sections that I'd like input on before we finalize:
CMS Adoption share (Google Doc link) - This was something new we tried this year. On closer inspection of the data, I realized that it combines desktop and mobile. While that could be interesting, our methodology notes that there's a lot of overlap in the dataset, as "most websites are included in both the mobile and desktop subsets." It feels like it could be misleading without clarification, and once clarified it doesn't feel that useful.
My recommendation is that we cut this section.
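To illustrate the overlap concern, here is a minimal sketch with made-up site names (not Web Almanac data). It assumes each subset can be treated as a set of sites, and shows how adding the desktop and mobile counts together counts shared sites twice:

```python
# Hypothetical site lists, for illustration only.
desktop_sites = {"a.example", "b.example", "c.example"}
mobile_sites = {"b.example", "c.example", "d.example"}

# Adding the two subsets counts every shared site twice.
naive_total = len(desktop_sites) + len(mobile_sites)

# A set union counts each site once, however many subsets it appears in.
unique_total = len(desktop_sites | mobile_sites)

# Sites that appear in both subsets.
overlap = len(desktop_sites & mobile_sites)

print(naive_total, unique_total, overlap)
```

The larger the overlap, the further the combined figure drifts from the number of distinct sites.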
CMS Adoption by geography (Google Doc link) - This section feels a bit messy, but with some cleanup it could be interesting and useful to include.
A few items I noticed:
My recommendation is that, if we can get the US and UK added back to the results, we do so and clean up the section as outlined above.
CMS Adoption by rank (Google Doc Link) - I question how useful this section is as a whole. "Rank" is subjective and presumably refers to Google's index, but isn't introduced as such. Also, if I understand the spreadsheet right, "All" actually means the top 10M, not every site in the index, which is confusing. Last, the current analysis focuses on "headless" as an explanation for the jump, whereas I think a more plausible explanation is that our dataset nearly doubled in size from 2022 to 2024.
My recommendation is that, unless others feel strongly about this section, we go ahead and cut it.
I'd love feedback on the above items over the next few days. From there, once @raewrites and @dknauss have given their all clear on reviewing, we can move forward to publishing.
@nrllh can you or someone else tackle the markdown generation and prepare the PR?
Hi Jonathan,
Thanks for the summary and update! I just wanted to quickly note that Niko and I worked on this chapter equally, as half of the draft was written and analyzed by me.
Hi Lora, of course! My mistake. Thank you for clarifying and for all your work on it.
"by geography" seems confusing, I suggest we update it to "by country"
"Country" causes some people concern because of certain geopolitical areas (e.g. Taiwan). Hence we say "by geographical region".
> CMS Adoption by rank (Google Doc Link) - I question how useful this section is as a whole. "Rank" is subjective and presumably refers to Google's index, but isn't introduced as such.
Rank refers to popularity by Chrome page views in the previous month, rather than any SEO ranking.
IMHO (and without having read the draft), this split is useful to see. E.g. top-ranked, more commercial sites can typically afford paid-for CMSs, while the long tail might prefer free ones. So there can be interesting insights here.
> Also, if I understand the spreadsheet right, "All" actually means the top 10M, not every site in the index, which is confusing.
Since 2022 we have increased our dataset to 16 million or so sites. So the final category should be top 100m rather than the top 10m used in 2022. If the query wasn't changed to include 100m, then it should be, and it should be rerun.
And yes, you are correct that 100m should be labelled "all" rather than "100m" (as we use < rank in our queries).
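As a rough sketch of how cumulative rank buckets like these can work: the thresholds, the `<=` comparison, and the example rank values below are assumptions for illustration, not the chapter's actual query. Each site counts toward every bucket whose threshold it fits under, and the largest bucket is labelled "ALL":

```python
# Assumed rank thresholds; the largest one covers the whole dataset.
RANK_BUCKETS = [1_000, 10_000, 100_000, 1_000_000, 10_000_000, 100_000_000]


def bucket_label(threshold: int) -> str:
    """Label the largest bucket "ALL"; format the others with separators."""
    return "ALL" if threshold == RANK_BUCKETS[-1] else f"{threshold:,}"


def count_by_bucket(site_ranks: list[int]) -> dict[str, int]:
    """Count sites cumulatively: a site counts toward every bucket it fits in."""
    return {
        bucket_label(threshold): sum(1 for rank in site_ranks if rank <= threshold)
        for threshold in RANK_BUCKETS
    }


# Made-up rank values for five sites (hypothetical, not real data).
example_ranks = [1_000, 10_000, 1_000_000, 10_000_000, 50_000_000]
counts = count_by_bucket(example_ranks)
print(counts)
```

With a 10M cap as the final bucket, the fifth example site would be left out entirely, which is why the last threshold has to cover the full dataset.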
> Last, the current analysis focuses on "headless" as an explanation for the jump, whereas I think a more plausible explanation is that our dataset nearly doubled in size from 2022 to 2024.
Not sure what this refers to, but we typically report on % of pages exactly to avoid any issues with changes in dataset size. That's not to say that changes in popularity can't also change percentages.
> My recommendation is that, unless others feel strongly about this section, we go ahead and cut it.
As I say, I've not read it yet, so I will leave others to decide on this, but hopefully that helps with some context.
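A small worked example of that point, using made-up counts (assumptions, not Web Almanac figures): if the dataset doubles and the mix of sites stays the same, the percentage does not move; it only moves if the added sites differ from the existing ones.

```python
def share_percent(cms_sites: int, total_sites: int) -> float:
    """Percentage of sites in the dataset that use a CMS."""
    return 100 * cms_sites / total_sites


# Hypothetical smaller dataset: 400 of 1,000 sites use a CMS.
before = share_percent(cms_sites=400, total_sites=1_000)

# Dataset doubles with the same mix of sites: the share is unchanged.
same_mix = share_percent(cms_sites=800, total_sites=2_000)

# Dataset doubles, but the added sites use a CMS less often: the share drops.
different_mix = share_percent(cms_sites=400 + 300, total_sites=2_000)

print(before, same_mix, different_mix)
```

So a larger crawl alone does not explain a jump in a percentage, though a change in which kinds of sites are crawled could.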
The US and UK are missing from the chart, which I am guessing is simply because of the length of their names as they are both in the actual data sets. Is this something @kevinfarrugia could help us with?
@sirjonathan I updated the Google Sheet to include the United States and the United Kingdom of Great Britain and Northern Ireland. I am not sure why they were excluded, but the filter was specifically omitting both countries. The results are sorted by the total number of mobile sites using a CMS, descending.
@sirjonathan I have updated the Google Sheet for top_cms_by_rank and cms_adoption_by_rank to also include sites where rank > 10,000,000.
TBH, I didn't fully understand the difference between the two queries but top_cms_by_rank seems to be the correct query and results.
@sirjonathan Following up on the above, I have removed cms_adoption_by_rank as the query had a bug and once fixed, it returns the same data as top_cms_by_rank. Let me know if you were looking for something else with that query.
> Rank refers to popularity by Chrome page views in the previous month, rather than any SEO ranking.
Thank you for clarifying that @tunetheweb. That changes my perspective. Given that and @kevinfarrugia's work, I'll reverse my recommendation and work to include the section.
Thank you for your updates @kevinfarrugia. Per Barry's comment:
> Since 2022 we have increased our dataset to 16 million or so sites. So the final category should be top 100m rather than the top 10m used in 2022. If the query wasn't changed to include 100m, then it should be, and it should be rerun.
When I looked at the results, the legend says 10,000,000. If I understand correctly, though, the query is including all sites up to and over 10M. If that's the case, could we update the legend to read 10,000,000+?
Also, when I look at top cms by rank, it's currently listing XpressEngine, XOOPS, Wuilt, and Woltlab along with WordPress, which appears to be alphabetical. What I would be expecting is probably WordPress, Wix, Joomla, Drupal, and Squarespace.
> When I looked at the results, the legend says 10,000,000. If I understand correctly, though, the query is including all sites up to and over 10M. If that's the case, could we update the legend to read 10,000,000+?
I don't think that's correct. But there is now a 100M category (also called "ALL") that shows that, since @kevinfarrugia added it.
> Also, when I look at top cms by rank, it's currently listing XpressEngine, XOOPS, Wuilt, and Woltlab along with WordPress, which appears to be alphabetical. What I would be expecting is probably WordPress, Wix, Joomla, Drupal, and Squarespace.
I've changed the sort order for you and it looks correct now.
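For context, the list reported above is consistent with a descending alphabetical sort on the CMS name, although that is only a guess at what the sheet was doing. A minimal sketch with made-up site counts (hypothetical, not Web Almanac data) shows the difference between the two orderings:

```python
# Hypothetical site counts per CMS, for illustration only.
cms_counts = {
    "WordPress": 500,
    "Wix": 40,
    "Joomla": 30,
    "Drupal": 20,
    "Squarespace": 15,
    "XpressEngine": 5,
    "XOOPS": 4,
    "Wuilt": 3,
    "Woltlab": 2,
}

# Sorting by name (descending) surfaces obscure CMSs that start with X or W.
top5_by_name = sorted(cms_counts, reverse=True)[:5]

# Sorting by number of sites (descending) surfaces the most-used CMSs.
top5_by_count = sorted(cms_counts, key=cms_counts.get, reverse=True)[:5]

print(top5_by_name)
print(top5_by_count)
```

Sorting on the count column before taking the top five is what produces the expected WordPress-first list.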
> @sirjonathan Following up on the above, I have removed cms_adoption_by_rank as the query had a bug and once fixed, it returns the same data as top_cms_by_rank. Let me know if you were looking for something else with that query.
@sirjonathan I'm not sure if you saw my comment above and if you're referring to this sheet. I didn't delete the results for cms_adoption_by_rank, for posterity, but the results are incorrect and should not be used. Use top_cms_by_rank. This includes ALL, and Barry kindly fixed the sort order. LMK.
CMS 2024
If you're interested in contributing to the CMS chapter of the 2024 Web Almanac, please reply to this issue and indicate which role or roles best fit your interest and availability: author, reviewer, analyst, and/or editor. You might be interested in exploring the changes to this year's version here.
Content team
Expand for more information about each role
- The **[content team lead](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Content-Team-Leads'-Guide)** is the chapter owner and responsible for setting the scope of the chapter and managing contributors' day-to-day progress.
- **[Authors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Authors'-Guide)** are subject matter experts and lead the content direction for each chapter. Chapters typically have one or two authors. Authors are responsible for planning the outline of the chapter, analyzing stats and trends, and writing the annual report.
- **[Reviewers](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Reviewers'-Guide)** are also subject matter experts and assist authors with technical reviews during the planning, analyzing, and writing phases.
- **[Analysts](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Analysts'-Guide)** are responsible for researching the stats and trends used throughout the Almanac. Analysts work closely with authors and reviewers during the planning phase to give direction on the types of stats that are possible from the dataset, and during the analyzing/writing phases to ensure that the stats are used correctly.
- **[Editors](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Editors'-Guide)** are technical writers who have a penchant for both technical and non-technical content correctness. Editors have a mastery of the English language and work closely with authors to help wordsmith content and ensure that everything fits together as a cohesive unit.
- The **[section coordinator](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Section-Leads'-Guide)** is the overall owner for all chapters within a section like "User Experience" or "Page Content" and helps to keep each chapter on schedule.

_Note: The time commitment for each role varies by the chapter's scope and complexity as well as the number of contributors._

For an overview of how the roles work together at each phase of the project, see the [Chapter Lifecycle](https://github.com/HTTPArchive/almanac.httparchive.org/wiki/Chapter-Lifecycle) doc.
Milestone checklist
0. Form the content team
- April 15: Complete program and content committee - Organizing committee
1. Plan content
- May 1: First meeting to outline the chapter contents - Content team
2. Gather data
- June 1: Custom metrics completed - Analysts
- June 1: HTTP Archive Crawl - HA Team
3. Validate results
- August 15: Query Metrics & Save Results - Analysts
4. Draft content
- September 15: First Draft of Chapter - Authors
- October 10: Review & Edit Chapter - Reviewers & Editors
5. Publication
- October 15: Chapter Publication (Markdown & PR) - Authors
- November 1: Launch of 2024 Web Almanac - Organizing committee
6. Virtual conference
- November 20: Virtual Conference - Content Team
Chapter resources
Refer to these 2024 CMS resources throughout the content creation process:
- Google Docs for outlining and drafting content
- SQL files for committing the queries used during analysis
- Google Sheets for saving the results of queries
- Markdown file for publishing content and managing public metadata
- Collab notebook for collaborative coding in Python - if needed
- #web-almanac-cms on Slack for team coordination