IBM / page-lab

PageLab enables web performance, accessibility, SEO, etc. testing at scale.
Apache License 2.0

Adjust how average data is displayed for a single page #32

Open amfred opened 5 years ago

amfred commented 5 years ago

We need to be able to see how different sets of pages are performing over time. Currently, when PageLab shows the average data for a specific web page, it averages all of the results it has collected since the page was registered with the system. The longer a page has been tracked, the harder it is to see the effect of changes made to the page and to get feedback on whether they helped or hurt performance.

I recommend making these updates: When viewing the details for one page's performance data, use a 30-day trailing average for the performance score as the basic display.
Also show these additional calculated metrics:

ecumike commented 5 years ago

For scalability, we do have updates planned to trim both the chart and the data table to a shorter default timeframe, like "latest 2 weeks", and to provide a UI that lets a user extend the data set to maybe 30, 60, or 90 days back.

One possibility is to add two calendar pickers for start and end date, default to the past two weeks, and let the user choose the date range for the chart and the data table. The UI will need some thought, but that is the direction I'm leaning.
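As a rough illustration of the "two date pickers, default to the past two weeks" idea, here is a minimal sketch of resolving the range on the server side. The function name `resolve_range` and the constant `DEFAULT_DAYS` are hypothetical and not taken from PageLab.

```python
from datetime import date, timedelta

# Assumed default window, per the "latest 2 weeks" suggestion above.
DEFAULT_DAYS = 14


def resolve_range(start=None, end=None, today=None):
    """Return an inclusive (start, end) pair, filling in missing values.

    A missing end date defaults to today; a missing start date defaults
    to DEFAULT_DAYS before the end date. Reversed input is swapped.
    """
    today = today or date.today()
    end = end or today
    start = start or end - timedelta(days=DEFAULT_DAYS)
    if start > end:
        start, end = end, start
    return start, end


default_range = resolve_range(today=date(2019, 3, 15))
swapped_range = resolve_range(start=date(2019, 3, 10), end=date(2019, 3, 5))
```

The same resolved pair could then drive both the chart and the data table, so the two always show the same window.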

Note: All "average" numbers are stored in the database for fast access. Calculating them on the fly would massively increase the report detail page load time, especially since there is no limit on the number of user-timing measures a page can have. A feature request for custom date ranges on the averages of all the KPI and user-timing measures would need some thought, or at least a caveat.

amfred commented 5 years ago

I don't understand why calculating the average of (14, 30, whatever) page speed scores for a single URL would take a long time. Isn't that a built-in SQL function? (pseudocode: SELECT AVG(score) FROM resultsTable WHERE ID = 'thisSpecificReportID' AND resultDate BETWEEN 'startDate' AND 'endDate')
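To make the pseudocode concrete, here is a small runnable sketch using Python's built-in `sqlite3` module. The table and column names (`results`, `report_id`, `result_date`, `score`) and the sample rows are assumptions for illustration only, not PageLab's actual schema.

```python
import sqlite3

# Hypothetical schema and sample data; dates are ISO strings so that
# BETWEEN compares them correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (report_id INTEGER, result_date TEXT, score REAL)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?)",
    [
        (1, "2019-01-01", 60.0),
        (1, "2019-01-10", 80.0),
        (1, "2019-01-20", 90.0),
        (2, "2019-01-10", 10.0),
    ],
)


def average_score(conn, report_id, start_date, end_date):
    """Mean score for one report in an inclusive date range, or None if empty."""
    row = conn.execute(
        "SELECT AVG(score) FROM results "
        "WHERE report_id = ? AND result_date BETWEEN ? AND ?",
        (report_id, start_date, end_date),
    ).fetchone()
    return row[0]


avg = average_score(conn, 1, "2019-01-05", "2019-01-31")
```

This covers a single metric for a single report. It says nothing about how the cost grows when dozens of averages are computed per page.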

Could we, as a first pass, have a single start and end date picker that applies to the entire page? I think the most common case would be that you would want to see the report details for the same dates as the averages.

amfred commented 5 years ago

Oh wait, I think I get it. You're calculating dozens of averages for each URL, and displaying all of those. Right now, are you calculating all of them at the end of a test run? How long does that take? So basically, for the page to make sense, all of the metrics would have to be re-calculated based on the date range.

I do agree that 2 weeks is a reasonable default.

rcalfredson commented 5 years ago

I have an idea for how it may be possible to prevent recalc slowdowns while still allowing for multiple time ranges. Perhaps there can be a new model added that defines a number of days back from present to calculate an average. This model would also keep track of the last time averages were calculated.

There would be a save method overriding the default, beginning with a guard clause that checks whether the current date matches the date the averages were last calculated. If it does, the method exits early; if not, the averages are recalculated.

The disadvantage of this approach is that it would only allow the user to select predefined time windows, but it would also reduce the likelihood of slowdowns resulting from an on-the-fly calculation. Do you think this makes sense? Thank you.
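A minimal sketch of the guard-clause idea, written as a plain Python class so it runs on its own. The class name `WindowedAverage`, its fields, and the `(day, score)` input format are hypothetical stand-ins for the proposed model, not PageLab code; in the real app this logic would presumably live in the model's overridden save method.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Tuple


@dataclass
class WindowedAverage:
    """Hypothetical cached average over a fixed number of days back."""

    days_back: int
    value: Optional[float] = None
    last_calculated: Optional[date] = None

    def save(self, results: List[Tuple[date, float]], today: date) -> bool:
        """Recalculate at most once per day; return True if recalculated."""
        # Guard clause: the averages were already calculated today.
        if self.last_calculated == today:
            return False
        cutoff = today - timedelta(days=self.days_back)
        scores = [score for day, score in results if cutoff <= day <= today]
        self.value = sum(scores) / len(scores) if scores else None
        self.last_calculated = today
        return True


results = [
    (date(2019, 1, 1), 50.0),
    (date(2019, 1, 25), 70.0),
    (date(2019, 1, 30), 90.0),
]
window = WindowedAverage(days_back=14)
first_save = window.save(results, date(2019, 1, 31))
second_save = window.save(results, date(2019, 1, 31))
```

One consequence of the guard is that results arriving later the same day are not reflected until the next day's recalculation.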

ecumike commented 5 years ago

For the chart and table this isn't an issue; for all the others it becomes one. Trimming the chart and data table is on my todo list and is needed for performance (mostly for the table). Making the date range customizable for the whole page is another animal and requires some research and thought.

daviddahl commented 5 years ago

> I have an idea for how it may be possible to prevent recalc slowdowns while still allowing for multiple time ranges. Perhaps there can be a new model added that defines a number of days back from present to calculate an average. This model would also keep track of the last time averages were calculated.

I'm sure there is an accepted best practice for this kind of rolling-average data storage and querying. We'll have to do some research.

amfred commented 5 years ago

I'm also wondering: have you researched yet why the database queries are slow? Per this comment above:

> Note: All "average" numbers are stored in the database for fast access. Calculating them on the fly would massively increase the report detail page load time, especially since there is no limit on the number of user-timing measures a page can have. A feature request for custom date ranges on the averages of all the KPI and user-timing measures would need some thought, or at least a caveat.

Would it be worthwhile to dig into the performance of the "calculate all averages" step? If we can optimize it enough, we could run the queries live for the current (one-page) URL report whenever the user requests a different time frame, rather than adding more database caching to work around slow performance.

I'm also thinking that statistically, we may only have to re-calculate averages for a few of the individual URL reports each week. Most people (non-engineers) will probably be OK with whatever the default time frame is, whether that's 30 days or 2 weeks.

I also want to clarify that I'm OK with only having this feature enabled after you drill down to a single-page URL report. I don't have a requirement, myself, to let a user change the default timeframe for the URL report comparison view (although I think that could have a better default as well).

amfred commented 5 years ago

I like what you've done already to show data for the last 15, 30, or 60 test runs. Thanks!
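For reference, averaging over the last N test runs rather than a calendar window can be sketched as below. The function name `average_of_last_runs` and the assumption that scores arrive oldest first are illustrative, not taken from PageLab.

```python
def average_of_last_runs(scores, run_count):
    """Mean of the most recent run_count scores (scores are oldest first).

    Returns None when there is nothing to average. Fewer than run_count
    scores simply means all available scores are used.
    """
    # scores[-0:] would return the whole list, so handle non-positive counts.
    recent = scores[-run_count:] if run_count > 0 else []
    return sum(recent) / len(recent) if recent else None


scores = [50.0, 60.0, 70.0, 80.0]
last_two = average_of_last_runs(scores, 2)
last_fifteen = average_of_last_runs(scores, 15)
```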