[meta] Value utilization

gregwhitworth commented 3 years ago

While it may be obvious, I've historically found value utilization valuable in a normalized manner for browser vendors, interested web devs as well as standards folks. What I mean by normalization is to reduce values down to the unit. Francois and I did a lot of work here because many people report on, say, the utilization of a property (eg: Chromestatus), but we found a lot of value in the value. So we would normalize them to parens, units, keywords. So for color you would see something like this:

system color (you may want to break this down but initially just this keyword is of value)
hex
rgba()
rgb()
hsl()
custom property

The units become valuable, say for example percentages used in the margin and padding properties :)
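To make the normalization idea concrete, here is a minimal sketch of bucketing a single value into the categories listed above. It is not the CSS Usage implementation: the function names `normalizeColorValue` and `normalizeUnit` and the short `SYSTEM_COLORS` list are assumptions made up for illustration.

```javascript
// Hypothetical, deliberately incomplete list of system color keywords.
const SYSTEM_COLORS = new Set([
  "canvas", "canvastext", "linktext", "buttonface",
  "buttontext", "highlight", "highlighttext", "graytext",
]);

// Bucket one color value into a coarse category (hex, rgba(), custom property, ...).
function normalizeColorValue(value) {
  const v = value.trim().toLowerCase();
  if (/^#(?:[0-9a-f]{3,4}|[0-9a-f]{6}|[0-9a-f]{8})$/.test(v)) return "hex";
  if (v.startsWith("var(")) return "custom property";
  const fn = v.match(/^([a-z-]+)\(/);
  if (fn) return fn[1] + "()"; // e.g. "rgba()", "hsl()"
  if (SYSTEM_COLORS.has(v)) return "system color";
  return "keyword";
}

// Reduce a single numeric token to its unit ("%", "px", ...), "number" for a
// unitless number, or null when the token is not numeric at all.
function normalizeUnit(token) {
  const m = token.trim().toLowerCase().match(/^[+-]?(?:\d+\.?\d*|\.\d+)(%|[a-z]+)?$/);
  if (!m) return null;
  return m[1] || "number";
}
```

With something like this, `margin-left: 50%` and `margin-left: 12.5%` both count toward the same `%` bucket, which is what makes the percentage question answerable.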

LeaVerou commented 3 years ago

Hi Greg 👋 ,

I'm afraid I'm not sure what the proposal is: to normalize values or to not normalize? To study values as well, and not just properties? If so, we already do plan to study values, and have a bunch of proposals that depend on values (partially or entirely), e.g. #1 #4 #8 #11 #14 #15 #16 #18 #20 #28 #32 #34 #35. Are there any specific values you think we should study that haven't been covered by other issues, and if so, which ones?

gregwhitworth commented 3 years ago

It's to normalize values, but ALL of them, generically, following the syntax spec (more or less). While looking through some of the issues I see one-off investigations into a specific value type, which is fine, but it would be good to have a general rundown as well, since often you don't know the specific question to ask up front but questions do arise (eg: are percentages used in margin-left on top websites?). You can then have custom ones that you want to pivot on for certain props, but it gives you a good starting point for all props rather than the specifics of only a few. The site is no longer up, but this is similar to what we shipped for CSS Usage; you can see it briefly here at minute 10, detailing how we determined which -webkit-appearance values to implement: https://noti.st/gregwhitworth/videos/s4O7wk

Once we had that, other questions came up, such as: is this property used inside of a keyframe? You can then tie those together, as that builds off of the above.

The source for that is here: https://raw.githubusercontent.com/MicrosoftEdge/css-usage/master/cssUsage.src.js

You can take that, paste it into your console and upon completion look at the window.CSSUsageResults JSON object to understand how CSS was used on that page. For example, here is your home page: https://gist.github.com/gregwhitworth/6cf0db21b0d63468a40192b5be9fc742

If you then aggregate that across http-archive, you're able to get a solid understanding of what is being used without having a specific question to ask yet, and you may be surprised by what you see (eg: -moz such and such is growing over time, maybe we should consider standardizing that, etc). Some things that are in that script but weren't surfaced on the site (due to not having time) are at-rules and normalized selectors. The next item we were going to tackle was HTML attributes and combining those with styles (which was a common investigation for a11y).
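As a rough sketch of that aggregation step: assume each page has already been reduced to a plain map of property to normalized value to count. That shape is a simplification invented for this example and is not the actual layout of window.CSSUsageResults, and `aggregatePages` is a hypothetical name.

```javascript
// pages: array of objects shaped like { "margin-left": { "%": 2, "px": 5 } }
// (hypothetical per-page shape, see the note above).
function aggregatePages(pages) {
  const totals = {};
  for (const page of pages) {
    for (const [prop, values] of Object.entries(page)) {
      const bucket = (totals[prop] = totals[prop] || {});
      for (const [value, count] of Object.entries(values)) {
        const entry = (bucket[value] = bucket[value] || { count: 0, pages: 0 });
        entry.count += count; // total occurrences across the crawl
        entry.pages += 1;     // number of pages using this prop/value pair
      }
    }
  }
  return totals;
}
```

Keeping both the occurrence count and the page count matters, because one page with thousands of declarations could otherwise look like broad adoption.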

Hope that helps :/

LeaVerou commented 3 years ago

Hi Greg,

Wow, this is amazing!!

We currently parse the CSS with a modified version of Rework CSS; you can see the kind of AST it generates here.

@rviscomi Can we also store the results of Greg's script in a css_values column, like parsed_css? This will simplify A LOT of metrics.

rviscomi commented 3 years ago

@rviscomi Can we also store the results of Greg's script in a css_values column, like parsed_css? This will simplify A LOT of metrics.

Yes that'd be possible. I'm still getting familiar with the output, but how different is it to Rework? For example, when would we use one or the other?

LeaVerou commented 3 years ago

@gregwhitworth

After looking at your script I have a few questions about the code.

How does it figure out the values? I see a search for conic-gradient() reveals nothing, even though I have used it in that stylesheet, yet it doesn't look like the code does matching for specific functions.

@rviscomi

It actually parses values, whereas Rework just gives you the entire value as an unparsed string.
It is not only a parser: it computes actual statistics about what is used in the CSS, so all we need to do in the queries later is aggregate these statistics, whereas with Rework we'd need to calculate the statistics with JS in the query and then do the aggregation.

Both are useful; it's not an either/or.

LeaVerou commented 3 years ago

@rviscomi We may need to adapt the script, what's the deadline for custom code?

LeaVerou commented 3 years ago

@gregwhitworth When was this code written? How does it handle custom properties?

rviscomi commented 3 years ago

Since the web crawler doesn't depend on the scripts themselves (the scripts run on the output of the crawler) you have all of August while the crawl is running to edit the scripts.

gregwhitworth commented 3 years ago

When was this code written? How does it handle custom properties?

Oh goodness, I started writing it years ago and then @FremyCompany came in and gave it some much-needed refactoring. I started writing it roughly 5 years ago, I suppose. And yeah, some of the more complex prop/value types have custom methods, so conic gradient probably got eaten up by that. There is no perfect solution to this, unfortunately. This is why unit testing was added: we'd fix one part of the parser and then break another.

Custom prop support was added, but I believe it will only capture the var() utilization for the property that it is attached to. Custom property authoring itself was not given any special treatment.

LeaVerou commented 3 years ago

@gregwhitworth Since we already do parsing via Rework, I wonder if it would be better for me (or another analyst) to take the spirit of this script and rewrite it to work with the existing AST that we have. This way it also doesn't depend on the CSSOM, which would require custom metrics (which means any customization would be due in less than a week).

So, in terms of data, from what I've gathered this is what it measures:

Number of rules by type
Number of usages by property
Value frequency by property (clustered by unit, keyword, function)

Did I miss something?

I see there are also metrics like number of elements, ids, classes etc, but those are already measured by the chapters they pertain to.
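To illustrate, the three metrics listed above could be computed from a Rework-style AST roughly as follows. This is a sketch under assumptions: it expects the stock Rework shape (`stylesheet.rules`, with `declarations`, nested `rules`, and `keyframes` arrays), which may differ from the modified parser, and `classifyValue` and `collectStats` are hypothetical names. Only top-level functions in a value are counted.

```javascript
// Collapse a raw value into coarse kinds: "fn()", a unit, "number", or the keyword.
function classifyValue(value) {
  let depth = 0;
  let outside = "";
  for (const ch of value) {
    if (ch === "(") {
      if (depth === 0) outside += "()"; // mark the preceding name as a function
      depth++;
    } else if (ch === ")") {
      depth = Math.max(0, depth - 1);
    } else if (depth === 0) {
      outside += ch; // keep only text outside parentheses
    }
  }
  return outside.toLowerCase().split(/[\s,\/]+/).filter(Boolean).map((token) => {
    if (token.endsWith("()")) return token;
    const m = token.match(/^[+-]?(?:\d+\.?\d*|\.\d+)(%|[a-z]+)?$/);
    return m ? m[1] || "number" : token;
  });
}

function collectStats(ast) {
  const stats = { rulesByType: {}, usageByProperty: {}, valuesByProperty: {} };
  const bump = (obj, key) => { obj[key] = (obj[key] || 0) + 1; };
  const visit = (rules) => {
    for (const rule of rules || []) {
      if (rule.type === "comment") continue;
      bump(stats.rulesByType, rule.type);
      for (const decl of rule.declarations || []) {
        if (decl.type !== "declaration") continue;
        bump(stats.usageByProperty, decl.property);
        const byValue = (stats.valuesByProperty[decl.property] =
          stats.valuesByProperty[decl.property] || {});
        for (const kind of classifyValue(decl.value)) bump(byValue, kind);
      }
      visit(rule.rules);     // nested rules, e.g. inside @media
      visit(rule.keyframes); // individual keyframes inside @keyframes
    }
  };
  visit(ast.stylesheet.rules);
  return stats;
}
```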

FremyCompany commented 3 years ago

Hi Lea,

Yes, you would likely benefit from rewriting. The script we used was not using an AST and used Regular Expressions instead, which enabled us to make it dependency-free and quickly iterate, but in hindsight using an AST would have made it more robust.

FremyCompany commented 3 years ago

One thing we collected in addition which was useful in a few cases to evaluate the impact of bugs was which property/values were set in selectors containing particular pseudo-classes or pseudo-elements. For instance, it was nice to be able to know whether people used to set "box-shadow" on ":hover" or things like that.
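A minimal sketch of that kind of pivot, assuming rules shaped like Rework's `rule` nodes (a `selectors` array plus `declarations`). The function name `propertiesByPseudo` is made up for this example, and the regular expression is a shortcut that can misfire on colons inside attribute selectors or strings, so a real selector parser would be more robust.

```javascript
// Count, per pseudo-class or pseudo-element, how often each property is set
// in rules whose selectors mention it.
function propertiesByPseudo(rules) {
  const result = {};
  for (const rule of rules) {
    const pseudos = new Set();
    for (const selector of rule.selectors || []) {
      for (const m of selector.matchAll(/::?[a-z-]+/gi)) {
        pseudos.add(m[0].toLowerCase());
      }
    }
    // Each rule counts once per pseudo, even if several selectors share it.
    for (const pseudo of pseudos) {
      const bucket = (result[pseudo] = result[pseudo] || {});
      for (const decl of rule.declarations || []) {
        bucket[decl.property] = (bucket[decl.property] || 0) + 1;
      }
    }
  }
  return result;
}
```

Looking up `result[":hover"]["box-shadow"]` then answers the box-shadow question for one stylesheet.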

LeaVerou / css-almanac

[meta] Value utilization #37