LeaVerou / css-almanac

Repo for planning & voting on which stats to study
https://projects.verou.me/mavoice/?repo=leaverou/css-almanac&labels=proposed%20stat

Declaration repetition (DRYness) #54

Open j9t opened 3 years ago

j9t commented 3 years ago

Via https://github.com/HTTPArchive/almanac.httparchive.org/issues/898#issuecomment-734337304:

If still feasible it would be interesting to know where we stand with respect to declaration repetition, that is, the ratio of unique to total declarations per style sheet (ideally: per media query).

The resulting factor would tell us how “DRY” and therefore maintainable style sheets are, and could potentially inform relevant spec decisions.

Past data, context, and some thoughts are available in “70% Repetition in Style Sheets” (disclosure: my own article).

LeaVerou commented 3 years ago

How would you measure that? What is a unique declaration? What normalization would you do? E.g. what about different property capitalization, 0 vs 0px, different ways to specify the same color etc?

j9t commented 3 years ago

Basic method: Take a style sheet, count the number of declarations, and count how many different declarations there are (comparing the raw declaration strings character by character, with no normalization); then divide the number of different (unique) declarations by the overall number of declarations.
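As a rough illustration of the basic method, here is a minimal Python sketch. It is not the code used for the Almanac: the helper names `declarations` and `unique_ratio` are made up for this example, and the regex-based extraction is a deliberate simplification that assumes no semicolons or braces inside values (so `url(data:...;base64,...)` or quoted strings containing them would be split incorrectly).

```python
import re


def declarations(css: str) -> list[str]:
    """Return the raw declaration strings of a style sheet (naive extraction)."""
    # Drop comments so they cannot be mistaken for declarations.
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)
    found = []
    # Match only innermost {...} blocks; rules nested in @media are still found.
    for body in re.findall(r"\{([^{}]*)\}", css):
        for part in body.split(";"):
            part = part.strip()
            if ":" in part:
                found.append(part)
    return found


def unique_ratio(css: str) -> float:
    """Unique declarations divided by total declarations (1.0 if there are none)."""
    decls = declarations(css)
    if not decls:
        return 1.0  # assumption: an empty sheet counts as having no repetition
    return len(set(decls)) / len(decls)


sample = "a { color: red; margin: 0 } b { color: red; padding: 0 }"
print(unique_ratio(sample))  # 0.75 (3 unique out of 4 declarations)
```

Because the strings are compared as written, `color:red` and `color: red` count as two different declarations here; that is exactly the gap the advanced method is meant to close.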

Advanced method: Count per media query, and normalize declarations so that border: none, border: 0, border:0, border: 0px, etc. all count as the same declaration. This can be tough, but might be simplified greatly by using an auto-fixing linter like stylelint.
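To make the normalization idea concrete, here is a small Python sketch of the cheap end of the spectrum. It is only an assumption about what "basic normalization" could look like, and `normalize` is a hypothetical helper name. It lowercases property names, collapses whitespace, and drops a few length units from zero values. It does not attempt semantic equivalences such as border: none versus border: 0, and it does not group declarations per media query.

```python
import re

# Zero followed by a common length unit, not part of a longer number or token.
ZERO_LENGTH = re.compile(r"(?<![\w.#-])0(?:px|em|rem|pt)(?![\w%])")


def normalize(declaration: str) -> str:
    """Return a canonical 'property:value' string for simple comparisons."""
    prop, _, value = declaration.partition(":")
    prop = prop.strip()
    value = value.strip()
    if prop.startswith("--"):
        # Custom property names are case-sensitive and their values are
        # free-form, so leave both untouched.
        return f"{prop}:{value}"
    prop = prop.lower()
    value = re.sub(r"\s+", " ", value)   # collapse runs of whitespace
    value = ZERO_LENGTH.sub("0", value)  # 0px, 0em, ... -> 0 (an approximation)
    return f"{prop}:{value}"


print(normalize("BORDER:0") == normalize("border: 0px"))  # True
print(normalize("margin: 10px   0px"))                    # margin:10px 0
```

Values are intentionally not lowercased, since strings, URLs, and some identifiers are case-sensitive. Dropping the unit from zero is also not always safe (the flex shorthand is one known edge case), which is part of why a real linter or parser would be the more robust route.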

My shortcut so far has been CSS Stats. It would be worth checking the definitions they use (personally, I’ve in a way shifted the problems you raise onto them), but they provide this data in their “Total vs Unique Declarations” section; see, for example, the report for lea.verou.me.

LeaVerou commented 3 years ago

I'd rather not introduce new dependencies at this stage, but some basic normalization like lowercasing properties is possible. What kind of results do you envision? If the JS returns a percentage of unique declarations, and then that is aggregated over the corpus to give us percentiles of that number, is that good?

LeaVerou commented 3 years ago

I just pushed the JS for this

You can test it out here: http://projects.verou.me/rework-utils/?url=https%3A%2F%2Flea.verou.me%2F (go to Query and select "Declaration repetition #54")

It seems to produce nearly identical results to cssstats.

j9t commented 3 years ago

That (the whole tool) is pretty awesome! (Will you keep it up? I like the idea of featuring it on UITest.com.)

For the Almanac, what kind of analysis would still be possible? I guess the less granular, the easier, meaning that it would be simpler to apply this to the whole data set than to break it down to style sheets or even media queries?

As you already have so much material for the chapter, maybe keeping it simple is a good approach so that this could be touched on in a brief paragraph? (I’m not sure how much help I can be with the technical part of the analysis, but I’m happy to help with interpretation if need be.)

LeaVerou commented 3 years ago

@rviscomi might be the best person to reply here, as he's written most of the SQL. But I think the most reasonable (and easy) thing to do would be to get percentiles (0, 10, 25, 50, 75, 90, 100) of the ratio, or perhaps of the number of declarations too (unique and total).
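For illustration only, the percentile summary suggested above could be sketched in Python as follows. The real analysis would run as SQL over the HTTP Archive data, so this is not that query; `percentile` and `summarize` are hypothetical names, the input ratios are invented, and linear interpolation between closest ranks is just one of several common percentile definitions.

```python
import math


def percentile(sorted_values: list[float], p: float) -> float:
    """Percentile p (0-100) using linear interpolation between closest ranks."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * p / 100
    lower, upper = math.floor(rank), math.ceil(rank)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction


def summarize(ratios, points=(0, 10, 25, 50, 75, 90, 100)) -> dict:
    """Map each requested percentile to its value across all pages."""
    values = sorted(ratios)
    return {p: percentile(values, p) for p in points}


# One unique/total ratio per page; made-up numbers, not measured data.
page_ratios = [0.2, 0.4, 0.6, 0.8, 1.0]
for point, value in summarize(page_ratios).items():
    print(f"p{point}: {value:.2f}")
```

The same function could be applied to the raw counts (unique and total declarations) instead of the ratio, since it only needs one number per page.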

LeaVerou commented 3 years ago

“That (the whole tool) is pretty awesome! (Will you keep it up? I like the idea of featuring it on UITest.com.)”

Thanks! I do want to do something with it, I'm just not sure what yet :)

rviscomi commented 3 years ago

I know it's tempting, but given the timeline, I think we should lock the analysis down and be in maintenance mode for the existing queries.