GoogleChrome / lighthouse

Automated auditing, performance metrics, and best practices for the web.
https://developer.chrome.com/docs/lighthouse/overview/
Apache License 2.0
28.23k stars · 9.35k forks

Ability to control precision of scoring values #11570

Closed koraa closed 3 years ago

koraa commented 3 years ago

The overall scoring code and the audit code each use a function, clampTo2Decimals, to round score values to two decimal places.
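Based on the description above, the clamping amounts to rounding to two decimal places. A sketch of that behavior, plus a configurable variant like the one this issue proposes (clampToDecimals and its digits parameter are hypothetical, not existing Lighthouse API):

```javascript
// Rounding a 0-1 score to two decimal places, as described above
// (a sketch of the behavior, not the actual Lighthouse source).
function clampTo2Decimals(val) {
  return Math.round(val * 100) / 100;
}

// Hypothetical configurable variant illustrating the proposed feature.
function clampToDecimals(val, digits = 2) {
  const factor = Math.pow(10, digits);
  return Math.round(val * factor) / factor;
}

console.log(clampTo2Decimals(0.91549));    // 0.92
console.log(clampToDecimals(0.91549, 4));  // 0.9155
```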

Having the ability to change the number of digits (or even just deactivate the clamping altogether) would be useful, and would give statistically minded users the ability to perform more in-depth analysis of the produced scores.

For example, I would find this useful because I am currently optimizing a test setup for reduced jitter; I can still measure jitter at the reduced precision, but because of the clamping I need many more samples to quantify the amount of jitter properly.

I would be very open to creating a PR for this, provided there is interest in merging such a feature…

paulirish commented 3 years ago

Hi @koraa

super cool work on harmonicabsorber!! that's so rad.

a few things i wanted to add.

web performance measurement is tricky and there are so many sources of variance/jitter that it's nigh impossible to expect repeatable results. (Using webpagereplay would definitely be an ingredient in the best repro setup.) We have some more docs on this topic at https://github.com/GoogleChrome/lighthouse/blob/master/docs/variability.md

As for clamping: we clamp just to signal our significant digits. There's enough variance that higher precision just becomes misleading.

That said, we do not clamp the metric values, only the scores. So for all those audit values, I would recommend looking at the numericValue, which has a bit more precision. (Also, fwiw, the lhr.audits.metrics.details.items payload has even more numbers, though personally I'd stick to each metric audit's numericValue ;)
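The score vs. numericValue distinction can be seen directly in the Lighthouse result (LHR) JSON. A small illustration, using made-up numbers but the real field names mentioned above:

```javascript
// Fragment of a hypothetical LHR, illustrating the point above:
// `score` is clamped to two decimals, while `numericValue` keeps
// the raw metric measurement (the numbers here are invented).
const lhr = {
  audits: {
    'largest-contentful-paint': {
      score: 0.92,              // clamped 0-1 score
      numericValue: 2283.0674,  // raw metric value in milliseconds
    },
  },
};

const audit = lhr.audits['largest-contentful-paint'];
console.log(audit.score, audit.numericValue);
```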

(While discussing this we also considered that it might be useful to expose our scoring calculations, e.g. a method to calculate the 0-1 score of a given LCP/whatever value. We don't currently have this available, but it's possible to explore.)

koraa commented 3 years ago

Hi @paulirish,

> web performance measurement is tricky and there are so many sources of variance/jitter that it's nigh impossible to expect repeatable results. (using webpagereplay would definitely be an ingredient in the best repro setup). We have some more docs on this topic at https://github.com/GoogleChrome/lighthouse/blob/master/docs/variability.md

Thanks for pointing that out! Yes, I am aware of that; our goal at the moment is not to reduce variance but rather to develop appropriate statistical models to characterize the distribution and produce an estimate of the score along with error bars…
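One simple way to attach error bars to repeated measurements, for illustration (this is a generic percentile-bootstrap sketch over invented numericValue samples, not part of koraa's actual tooling):

```javascript
// Percentile bootstrap: resample the observed runs with replacement,
// collect the resample means, and read the CI off the sorted means.
function bootstrapCI(samples, iters = 2000, alpha = 0.05) {
  const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;
  const means = [];
  for (let i = 0; i < iters; i++) {
    const resample = Array.from(samples,
        () => samples[Math.floor(Math.random() * samples.length)]);
    means.push(mean(resample));
  }
  means.sort((a, b) => a - b);
  return {
    mean: mean(samples),
    lo: means[Math.floor((alpha / 2) * iters)],
    hi: means[Math.floor((1 - alpha / 2) * iters)],
  };
}

// Hypothetical FCP numericValues (ms) from repeated runs:
const runs = [2310, 2480, 2295, 2550, 2402, 2368, 2441, 2333];
const {mean, lo, hi} = bootstrapCI(runs);
console.log(`${mean.toFixed(0)} ms (95% CI ${lo.toFixed(0)}-${hi.toFixed(0)})`);
```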

However, I agree that estimating the measurements separately and then computing the final score from those estimates is better than characterizing score distributions. To that end, is it correct to assume that all scores use the same method of generating the score from a log-normal CDF, or are there differences between the scores?

> As for clamping.. we clamp just to give a signal of our significant digits. there's enough variance that having higher precision just becomes misleading.

(Not that it matters much, but don't you lose a bit of precision by clamping twice? Clamping subscores and then clamping the weighted average again? I haven't checked anything here, but my feeling is that this would introduce some uncomfortable nonlinearities in cases where many subscores are close to (n+0.5)%; it would be hard to see, though, because of the high-dimensional nature of the average. It might be better to compute the average on unclamped scores and then clamp once.)
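The nonlinearity being described can be shown with toy numbers (made-up subscores and weights, not real Lighthouse audits): clamping the subscores first can push the final rounded result to a different value than clamping once at the end.

```javascript
// Round a 0-1 score to two decimals, as described earlier in the thread.
const clamp2 = (x) => Math.round(x * 100) / 100;

const subscores = [0.93, 0.9354];  // invented subscores near a boundary
const weights = [0.25, 0.75];      // invented weights

const weightedAvg = (scores) =>
    scores.reduce((sum, s, i) => sum + s * weights[i], 0);

// Clamp each subscore, average, then clamp again (two rounding steps):
const twice = clamp2(weightedAvg(subscores.map(clamp2)));
// Average the raw subscores and clamp once:
const once = clamp2(weightedAvg(subscores));

console.log(twice, once); // 0.94 0.93 -- the two orderings disagree
```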

patrickhulce commented 3 years ago

> is it correct to assume that all scores use the same method of generating the score from the scores log-normal cdf or are there differences between the scores?

Yes, all performance metric scores use the log-normal CDF method, with different control points for each metric-environment pair defined in the audit's options object.

https://github.com/GoogleChrome/lighthouse/blob/e9d72247ef89cf6bf54c6fe1271525e718aebea0/lighthouse-core/audits/metrics/first-contentful-paint.js#L39-L47

https://github.com/GoogleChrome/lighthouse/blob/e9d72247ef89cf6bf54c6fe1271525e718aebea0/lighthouse-core/audits/audit.js#L71-L83
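The log-normal scoring described above can be sketched as follows. This is a standalone reconstruction, not Lighthouse's own implementation: it fits a log-normal complementary CDF so that the median control point scores 0.5 and the p10 control point scores 0.9, using the Abramowitz & Stegun erfc approximation. The control points (p10 = 1800 ms, median = 3000 ms) are illustrative values roughly matching the linked FCP mobile options.

```javascript
// Complementary error function, Abramowitz & Stegun 7.1.26
// approximation, |error| < 1.5e-7.
function erfc(x) {
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
      t * (-1.453152027 + t * 1.061405429))));
  const y = poly * Math.exp(-ax * ax);
  return x >= 0 ? y : 2 - y;
}

// Score a metric value on a log-normal curve defined by two control
// points: `median` maps to 0.5 and `p10` maps to 0.9.
function logNormalScore({median, p10}, value) {
  const INV_ERFC_1_8 = -0.9061938024368232; // x such that erfc(x) = 1.8
  const mu = Math.log(median);
  const sigma = (mu - Math.log(p10)) / (-INV_ERFC_1_8 * Math.SQRT2);
  return 0.5 * erfc((Math.log(value) - mu) / (sigma * Math.SQRT2));
}

const fcp = {median: 3000, p10: 1800}; // illustrative control points (ms)
console.log(logNormalScore(fcp, 3000)); // ~0.5 at the median
console.log(logNormalScore(fcp, 1800)); // ~0.9 at p10
```

Lower is better for the underlying metrics, so values above the median fall below 0.5 and values below p10 rise above 0.9.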

> but don't you lose a bit of precision by clamping twice? Clamping subscores and then clamping the weighted average again?

We do, but as you noted, it's hard to see and doesn't matter much compared to the other sources of noise in this data :) All statistical analysis attempts we're aware of use the underlying metric values for the reasons above.

Please let us know if there are any particular utilities Lighthouse could expose that would help your projects in this area! Super exciting to see this type of work being done independently and would love to share notes :)

connorjclark commented 3 years ago

In case it helps, we do all sorts of math-y things for the scoring calculator here: https://github.com/paulirish/lh-scorecalc/blob/master/script/math.js