HTTPArchive / cwv-tech-report

Core Web Vitals Technology Report
https://cwvtech.report
28 stars 2 forks source link

Automate the update of the CWV Tech Report #26

Open rviscomi opened 2 years ago

rviscomi commented 2 years ago

Each month I manually run a query to append the latest data to the httparchive:core_web_vitals.technologies table.

The query have a few places that need to be updated to reflect the latest dataset ("2021-07-01" and "2021_07_01" below). The query also depends on two tables that are made available at different times: the HA technologies.YYYY_MM_DD_* tables are made available at the end of the month corresponding to the release date, while the CrUX materialzied.device_summary table isn't updated until the second Tuesday of the month following the release date. If that's confusing, for example the 2021-07-01 (July 2021) dataset was released today on August 10, the second Tuesday of August.

I can help to publish a Pub/Sub topic when the CrUX release is available on BigQuery. This project would need to subscribe to that topic and kick off a process to run a query like the one below for the given release and append the results to the CWV table.

CREATE TEMP FUNCTION IS_GOOD(good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
  good / (good + needs_improvement + poor) >= 0.75
);

CREATE TEMP FUNCTION IS_NON_ZERO(good FLOAT64, needs_improvement FLOAT64, poor FLOAT64) RETURNS BOOL AS (
  good + needs_improvement + poor > 0
);

SELECT
  date,
  ARRAY_TO_STRING(ARRAY_AGG(DISTINCT category IGNORE NULLS ORDER BY category), ', ') AS categories,
  app,
  client,
  COUNT(DISTINCT url) AS origins,
  COUNT(DISTINCT IF(good_fid, url, NULL)) AS origins_with_good_fid,
  COUNT(DISTINCT IF(good_cls, url, NULL)) AS origins_with_good_cls,
  COUNT(DISTINCT IF(good_lcp, url, NULL)) AS origins_with_good_lcp,
  COUNT(DISTINCT IF(any_fid, url, NULL)) AS origins_with_any_fid,
  COUNT(DISTINCT IF(any_cls, url, NULL)) AS origins_with_any_cls,
  COUNT(DISTINCT IF(any_lcp, url, NULL)) AS origins_with_any_lcp,
  COUNT(DISTINCT IF(good_cwv, url, NULL)) AS origins_with_good_cwv,
  COUNT(DISTINCT IF(any_lcp AND any_cls, url, NULL)) AS origins_eligible_for_cwv,
  SAFE_DIVIDE(COUNTIF(good_cwv), COUNTIF(any_lcp AND any_cls)) AS pct_eligible_origins_with_good_cwv
FROM (
  SELECT
    date,
    CONCAT(origin, '/') AS url,
    IF(device = 'desktop', 'desktop', 'mobile') AS client,
    IS_NON_ZERO(fast_fid, avg_fid, slow_fid) AS any_fid,
    IS_GOOD(fast_fid, avg_fid, slow_fid) AS good_fid,
    IS_NON_ZERO(small_cls, medium_cls, large_cls) AS any_cls,
    IS_GOOD(small_cls, medium_cls, large_cls) AS good_cls,
    IS_NON_ZERO(fast_lcp, avg_lcp, slow_lcp) AS any_lcp,
    IS_GOOD(fast_lcp, avg_lcp, slow_lcp) AS good_lcp,
    (IS_GOOD(fast_fid, avg_fid, slow_fid) OR fast_fid IS NULL) AND
    IS_GOOD(small_cls, medium_cls, large_cls) AND
    IS_GOOD(fast_lcp, avg_lcp, slow_lcp) AS good_cwv
  FROM
    `chrome-ux-report.materialized.device_summary`
  WHERE
    date = '2021-07-01' AND
    device IN ('desktop', 'phone')
) JOIN (
  SELECT DISTINCT
    CAST('2021-07-01' AS DATE) AS date,
    category,
    app,
    _TABLE_SUFFIX AS client,
    url
  FROM
    `httparchive.technologies.2021_07_01_*`
  WHERE
    app IS NOT NULL AND
    app != ''
) USING (date, url, client)
GROUP BY
  date,
  app,
  client
tunetheweb commented 2 years ago

FYI latest SQL is actually here: https://github.com/HTTPArchive/cwv-tech-report/blob/main/sql/monthly.sql