Show number of samples per endpoint

donbowman commented 1 year ago

Describe the solution you'd like

Some of the less popular pages on my site are occasionally visited by a bot with a very slow client side. This gives, e.g., an FCP in the thousands of seconds, which confuses me when I look at the results page.

I think having a column on the reports page showing the number of samples that contribute to each value would help, along with a filter to hide rows with fewer than some number of samples.

It is not clear to me what e.g. the FCP column refers to. Is it the average of the samples? The 75th percentile? The median?

I would like to not see results that don't have some confidence to them.

I think also some method to remove or ignore outliers would be useful.

Example:

MySQL [www_database]> select timestamp,country,class,FCP_SUM from wp_vibes_statistics where endpoint = '/angular-content-security-policy-google-tagmanager/';
+------------+---------+---------+---------+
| timestamp  | country | class   | FCP_SUM |
+------------+---------+---------+---------+
| 2023-08-06 | NZ      | desktop |       0 |
| 2023-08-07 | US      | desktop |    6560 |
| 2023-08-07 | UA      | desktop |       0 |
| 2023-08-07 | DE      | desktop |    2878 |
| 2023-08-07 | FR      | desktop |       0 |
| 2023-08-07 | FI      | desktop |     825 |
| 2023-08-07 | DE      | mobile  |    2764 |
| 2023-08-07 | EE      | desktop |     983 |
| 2023-08-07 | IN      | mobile  | 9251220 |
| 2023-08-07 | PE      | desktop |    1471 |
+------------+---------+---------+---------+

Pierre-Lannoy commented 1 year ago

AH! So you're querying values right in the database 🤩 So to be clear, for all measurements, you have 4 columns: *_sum (which is … the sum), then *_good, *_impr and *_poor, which are the numbers of samples in the Good, Needs Improvement and Poor (Google) classifications. If you want the total number of samples, just add the *_good, *_impr and *_poor values.
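
As a minimal sketch of that arithmetic, assuming a row has already been fetched into a dict keyed by column name: the lowercase keys follow the *_sum / *_good / *_impr / *_poor pattern described above, the counts in the example row are invented, and dividing the sum by the sample count to get a mean is an assumption about how the values might be aggregated, not a statement of what the report page does.

```python
def sample_stats(row, metric="FCP"):
    """Return (sample count, mean) for one row of aggregated columns."""
    # Total samples = Good + Needs Improvement + Poor counts.
    n = row[f"{metric}_good"] + row[f"{metric}_impr"] + row[f"{metric}_poor"]
    # A row with no samples has no meaningful mean.
    mean = row[f"{metric}_sum"] / n if n else None
    return n, mean


# Hypothetical row: the sum comes from the table above, the counts are made up.
row = {"FCP_sum": 6560, "FCP_good": 3, "FCP_impr": 1, "FCP_poor": 0}
n, mean = sample_stats(row)
print(n, mean)  # 4 1640.0
```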

donbowman commented 1 year ago

Thanks for the explanation.

The underlying issue: I have a page that doesn't get a lot of traffic. Some bot opens it and reads it low and slow for 9000s (yes, I have this data point!). This completely skews my results, since it's averaged in against a small number of 0.9s page loads.
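
To illustrate the skew, here is a rough sketch with made-up FCP samples in milliseconds: nine ordinary loads and one slow bot. It compares the mean with the median and then drops outliers using Tukey's IQR fences. This assumes raw per-visit samples are available, which the aggregated *_sum columns alone would not provide; `drop_outliers` and the factor `k=1.5` are illustrative choices, not part of the plugin.

```python
from statistics import mean, median, quantiles


def drop_outliers(samples, k=1.5):
    """Keep samples within k * IQR of the quartiles (Tukey's fences)."""
    if len(samples) < 4:
        # Too few points to estimate quartiles sensibly; keep everything.
        return list(samples)
    q1, _, q3 = quantiles(samples, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [s for s in samples if low <= s <= high]


# Illustrative data only: nine normal page loads and one 9000 s bot visit.
samples = [850, 880, 900, 910, 920, 950, 990, 1010, 1100, 9_000_000]

print(mean(samples))                 # dominated by the single bot visit
print(median(samples))               # barely affected by it
print(mean(drop_outliers(samples)))  # mean of the nine normal loads
```

With these numbers the plain mean lands around 900 seconds, while the median and the filtered mean both stay under one second.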

So I'm suggesting a couple of features:

- remove outliers
- a 'quality of score' column (e.g. the 'N'), so I can focus on the endpoints that are behaving poorly and have enough data points to support this

Ideally, I would be able to sort to find popular pages with a low score.
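
A possible shape for that filter and sort, sketched under assumptions: each row is a dict with hypothetical `endpoint`, `good`, `impr` and `poor` keys, the "score" is taken to be the share of Poor samples, and the `min_samples=30` threshold is arbitrary.

```python
def rank_endpoints(rows, min_samples=30):
    """Hide low-sample rows, then sort worst score first, busiest first on ties."""
    ranked = []
    for r in rows:
        n = r["good"] + r["impr"] + r["poor"]  # the 'N' column
        if n < min_samples:
            continue  # not enough data points to trust this row
        ranked.append({"endpoint": r["endpoint"], "n": n, "poor_share": r["poor"] / n})
    ranked.sort(key=lambda e: (-e["poor_share"], -e["n"]))
    return ranked


# Hypothetical per-endpoint counts, for illustration only.
rows = [
    {"endpoint": "/a/", "good": 90, "impr": 5, "poor": 5},
    {"endpoint": "/b/", "good": 20, "impr": 10, "poor": 20},
    {"endpoint": "/c/", "good": 2, "impr": 0, "poor": 1},
]
for entry in rank_endpoints(rows):
    print(entry["endpoint"], entry["n"], entry["poor_share"])
```

Here `/c/` is hidden because it has only 3 samples, and `/b/` is listed before `/a/` because a larger share of its samples are Poor.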

Another option would be a metric like a Holt-Winters prediction, where no single data point dramatically outweighs the others.

Pierre-Lannoy / wp-vibes

Show number of samples per endpoint #2