fix: page-part handling in the graph.page table

nacnudus commented 2 months ago

This legacy table is used by the GOV.UK Chat app. It was reimplemented by https://github.com/alphagov/govuk-knowledge-graph-gcp/pull/652, but now behaves differently from the original table regarding parts of multi-part pages. Users have since requested the following behaviour:

Omit the main page of multi-part documents, in favour of its duplicate that has a slug. For example, include /main-page/first-part and /main-page/second-part, and omit /main-page.
Define the title of page parts as "title: part title", where "title" is the title of the main page, and "part title" is the title of the individual part.

A real example is the multi-part page /universal-credit.

Include /universal-credit/what-universal-credit-is (which is the first part).
Use "Universal Credit: What Universal Credit is" as the title of that part.
Omit /universal-credit (which is the URL of the main page of the set).

nacnudus commented 1 month ago

This is no longer required.

nacnudus commented 1 month ago

In case the branch is deleted, the proposed changes were to terraform-dev/bigquery/graph-page.sql

    "https://www.gov.uk" || COALESCE(content.base_path, editions.base_path) AS url
  FROM editions
  LEFT JOIN public.content ON content.edition_id = editions.id
  WHERE TRUE
  -- Omit the main page of multi-part documents, in favour of its duplicate that
  -- has a slug. For example, include the following:
  --
  --   /main-page/first-part
  --   /main-page/second-part
  --
  -- Omit the following:
  --
  --   /main-page
  AND (
    content.is_part IS NULL   -- Include documents that aren't multipart
    OR content.is_part        -- Omit the main page that doesn't have a slug
 )
)
SELECT
  pages.url,

  editions.public_updated_at,
  withdrawals.withdrawn_at,
  withdrawals.withdrawn_explanation,
  -- content.title is "title: part title" if it is a part of a document, but it
  -- doesn't include every schema_name, so fall back to editions.title.
  COALESCE(content.title, editions.title) AS title,
  JSON_value(editions.details, "$.internal_name") AS internal_name,
  editions.description,
  JSON_value(editions.details, "$.department_analytics_profile") AS department_analytics_profile,

alphagov / govuk-knowledge-graph-gcp

fix: page-part handling in the graph.page table #677