I made a histogram of the frequencies of each classList length:
```sql
#standardSQL
SELECT
  client,
  classes,
  COUNT(0) AS freq,
  SUM(COUNT(0)) OVER (PARTITION BY client) AS total,
  ROUND(COUNT(0) * 100 / SUM(COUNT(0)) OVER (PARTITION BY client), 2) AS pct
FROM (
  SELECT
    client,
    ARRAY_LENGTH(REGEXP_EXTRACT_ALL(value, '([^\\s]+)(?:\\s+|$)')) AS classes
  FROM
    `httparchive.almanac.summary_response_bodies`,
    UNNEST(REGEXP_EXTRACT_ALL(body, '(?i)class=[\'"]([^\'"]+)')) AS value
  WHERE
    firstHtml)
GROUP BY
  client,
  classes
ORDER BY
  freq / total DESC
```
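To make the counting logic concrete, here is a minimal Python sketch that applies the same two regular expressions to an HTML string and computes the same percentage column per classList length. It assumes Python's `re` behaves like BigQuery's RE2 engine for these simple patterns (no backreferences or lookarounds), and `count_classes` and `histogram` are hypothetical helper names, not part of the Almanac tooling.

```python
import re
from collections import Counter

# Same patterns as the SQL query. Assumption: Python's `re` treats these
# simple patterns the same way BigQuery's RE2 engine does.
# 1) Pull the contents of every quoted class attribute (case-insensitive).
ATTR_RE = re.compile(r"""(?i)class=['"]([^'"]+)""")
# 2) Capture each whitespace-separated class name inside one attribute value.
NAME_RE = re.compile(r"([^\s]+)(?:\s+|$)")


def count_classes(html: str) -> list[int]:
    """Return the number of class names in each matched class attribute."""
    return [len(NAME_RE.findall(value)) for value in ATTR_RE.findall(html)]


def histogram(counts: list[int]) -> dict[int, float]:
    """Share (%) of attributes per classList length, like the query's pct column."""
    freq = Counter(counts)
    total = sum(freq.values())
    return {n: round(c * 100 / total, 2) for n, c in sorted(freq.items())}


if __name__ == "__main__":
    html = (
        '<div class="container">'
        '<p class="lead text-muted">Hello</p>'
        '<a class=" ">link</a>'
        "</div>"
    )
    counts = count_classes(html)
    print(counts)
    print(histogram(counts))
```

Running it prints `[1, 2, 0]` on the first line: the whitespace-only `class=" "` counts as an attribute with 0 class names, while a completely empty `class=""` is not matched by the first pattern at all, because `[^'"]+` needs at least one character.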
classList length | desktop (% of attributes) | mobile (% of attributes) |
---|---|---|
0 | 0.11% | 0.11% |
1 | 64.27% | 63.39% |
2 | 20.18% | 20.52% |
3 | 7.44% | 7.71% |
4 | 4.31% | 4.49% |
5 | 1.71% | 1.75% |
6 | 0.74% | 0.74% |
7 | 0.38% | 0.39% |
8 | 0.22% | 0.22% |
9 | 0.19% | 0.19% |
10 | 0.10% | 0.10% |
11 | 0.06% | 0.07% |
12 | 0.05% | 0.05% |
13 | 0.04% | 0.04% |
14 | 0.03% | 0.03% |
15 | 0.02% | 0.03% |
Methodology note: This is an analysis of the static home page markup, not accounting for classes added dynamically in JS.
According to the results, attributes with 1 or 2 class names make up over 80% (about 84% on both desktop and mobile) of all class attribute values, so a p90 of 3 makes sense.
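The p90 claim can be checked against the desktop and mobile columns of the histogram: accumulate the percentages in order of class count and stop at the first count where the running total reaches 90%. A minimal sketch using only the rows shown (0 through 15 classes); `percentile_from_histogram` is a hypothetical helper name.

```python
# Percentage of class attributes with N class names (N = 0..15), copied
# from the histogram table; the long tail above 15 classes is omitted.
DESKTOP = [0.11, 64.27, 20.18, 7.44, 4.31, 1.71, 0.74, 0.38,
           0.22, 0.19, 0.10, 0.06, 0.05, 0.04, 0.03, 0.02]
MOBILE = [0.11, 63.39, 20.52, 7.71, 4.49, 1.75, 0.74, 0.39,
          0.22, 0.19, 0.10, 0.07, 0.05, 0.04, 0.03, 0.03]


def percentile_from_histogram(pcts: list[float], q: float) -> int:
    """Return the smallest class count whose cumulative share reaches q percent."""
    running = 0.0
    for n, pct in enumerate(pcts):
        running += pct
        if running >= q:
            return n
    raise ValueError("percentile falls in the omitted tail (> 15 classes)")


if __name__ == "__main__":
    for name, pcts in (("desktop", DESKTOP), ("mobile", MOBILE)):
        one_or_two = pcts[1] + pcts[2]
        p90 = percentile_from_histogram(pcts, 90)
        print(f"{name}: 1-2 classes = {one_or_two:.2f}%, p90 = {p90}")
```

Both columns give a p90 of 3: the running total is about 84% after 2 classes and about 92% after 3.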
@argyleink do you think OOCSS libraries are prolific enough to skew the distribution? In 02_10 we see very few Tailwind pages, which is probably an undercount caused by false negatives from Wappalyzer, but other than Bootstrap and animate.css, websites don't seem to be using CSS libraries that much.
Also, I think this theme of having our assumptions gut-checked by the data is a perfect thing to talk about in the Almanac chapters. @argyleink @una you can talk about how and why the results surprised you and what that says about the state of CSS. cc @HTTPArchive/authors
Interesting. I brought up OOCSS libs because they're a primo example of AMPLE usage of classes, enough that the high end could/should be quite a few classes per node. I could be convinced that OOCSS libs like Tachyons, Tailwind, etc. aren't popular enough to influence the data, but even other seemingly very, very popular libraries like Bootstrap, or strategies like BEM, almost always put more than 2 classes on an element.
So sure, we could use this as a talking point, because it's counter to our assumptions. But something still feels off: the number of classes on an element shouldn't be the most surprising result of scrubbing the entire web and comparing it to our assumptions, yet right now it is. 🤷‍♂️
The total percentage of class attributes with 10 or more class names is only about 0.5%. We're talking about a small percentage of a sample of ~1.6B class attributes, though, so that's still ~8M instances of 10+ classes. That perspective might make this more digestible.
Fun fact: the largest number of class names in a single attribute is 21,504! It only occurs once, and I assume that's a parsing bug 😁
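Because the histogram table stops at 15 classes, the share of attributes with 10 or more class names can be derived as 100% minus the rows for 0 through 9, then scaled by the approximate sample size. A minimal sketch; the ~1.6B figure is the rough count quoted in the thread, and the per-client results only approximately match 0.5% because the table's percentages are rounded. `tail_share` and `instances` are hypothetical helper names.

```python
# Share (%) of class attributes with 0..9 class names, from the histogram table.
DESKTOP_0_TO_9 = [0.11, 64.27, 20.18, 7.44, 4.31, 1.71, 0.74, 0.38, 0.22, 0.19]
MOBILE_0_TO_9 = [0.11, 63.39, 20.52, 7.71, 4.49, 1.75, 0.74, 0.39, 0.22, 0.19]

# Approximate number of class attributes in the sample, as quoted in the thread.
SAMPLE_SIZE = 1.6e9


def tail_share(head: list[float]) -> float:
    """Percentage of attributes not covered by the head rows (here: 10+ classes)."""
    return 100 - sum(head)


def instances(share_pct: float, total: float = SAMPLE_SIZE) -> float:
    """Convert a percentage of the sample into an absolute count."""
    return total * share_pct / 100


if __name__ == "__main__":
    for name, head in (("desktop", DESKTOP_0_TO_9), ("mobile", MOBILE_0_TO_9)):
        print(f"{name}: {tail_share(head):.2f}% of attributes have 10+ classes")
    # The rounded 0.5% figure, scaled to the whole sample.
    print(f"{instances(0.5):,.0f}")
```

This gives about 0.45% (desktop) and 0.49% (mobile) for 10 or more classes, consistent with the ~0.5% and ~8M figures; counting only attributes with more than 10 classes would give roughly 0.35% to 0.39%.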
Here's the sheet with the full results if you want to explore.
I'll continue looking into this. For example, maybe a different strategy of counting BEM-style classes would be helpful. Also let me know if you think there's another approach that might help. One other thing we could do for the upcoming October crawl is add a custom metric to count classList lengths so we're actually querying the DOM rather than parsing HTML with regexes, for better confidence.
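As a rough illustration of the gap between regex extraction and real parsing, here is a minimal sketch that counts classList lengths with Python's standard-library `html.parser` instead of regular expressions. This is not the proposed custom metric, which would run in the browser against the rendered DOM (where `element.classList.length` is available) and would also see classes added by JS. This sketch still only sees static markup, but it reads attributes the way an HTML tokenizer does, so unquoted values, spaces around `=`, and `class=` text inside `<script>` are handled. `ClassCounter` and `count_class_lengths` are hypothetical names.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Tally classList lengths for elements that carry a class attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.lengths: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names are lowercased and
        # value is None for a bare attribute such as <div class>.
        for name, value in attrs:
            if name == "class":
                # str.split() splits on any whitespace, which approximates
                # the ASCII-whitespace splitting browsers use for classList.
                self.lengths[len((value or "").split())] += 1
                # Browsers keep only the first class attribute on an element.
                break

    # Self-closing tags like <img class="x" /> reach handle_starttag through
    # HTMLParser's default handle_startendtag, so no override is needed.


def count_class_lengths(html: str) -> Counter:
    """Parse html and return a Counter mapping classList length -> element count."""
    parser = ClassCounter()
    parser.feed(html)
    parser.close()
    return parser.lengths
```

Unlike the regex, this counts `class=""` as an element with 0 classes and ignores markup that only appears inside `<script>` strings; like the regex, it cannot see classes added at runtime.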
great ideas. that sheet is fascinating!
I think this data is super fascinating! I don't think OOCSS libraries are as widely used on the web as they seem from some circles. Also I wonder if this speaks to the majority of the web not using reusable/global styles as frequently. The data on 0 classes being used surprised me, as I assume many people are still styling based on base element. How would this account for nested elements like '.list li'?
> On Fri, Sep 6, 2019, 8:03 PM Adam Argyle wrote:
> great ideas. that sheet is fascinating!
> The data on 0 classes being used surprised me, as I assume many people are still styling based on base element. How would this account for nested elements like '.list li'?
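Do you mean in the CSS? This query doesn't take the selectors into account, only the class attributes in the HTML. Elements without a class attribute (e.g. a bare `<li>` styled through `.list li`) aren't counted at all. The 0 values in this case are class attributes that contain no class names, e.g. whitespace-only values like `class=" "`; a completely empty `class=""` doesn't match the extraction regex, since `[^\'"]+` requires at least one character.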
This data looks pretty accurate to me.
At first I did a double take, much like you all are describing. It took me a bit to remember that the majority of sites aren't "techy" like Medium, but instead use basic WordPress themes or are rudimentary static sites like Hacker News.
After I started looking at sites like these instead, the data began making sense: I saw tons of classes like `email`, `button`, `container`, `four columns`, `footer`, `latest_post_image clearfix`, and `woocommerce single-product`.
From metric 02_45 in the CSS analysis results sheet:
@argyleink left this comment:
The query is at https://github.com/HTTPArchive/almanac.httparchive.org/blob/master/sql/2019/02_CSS/02_45.sql
Need to investigate whether the results are accurate.