alphagov / govuk-knowledge-graph-gcp

GOV.UK content data and cloud infrastructure for the GovSearch app.
https://docs.data-community.publishing.service.gov.uk/tools/govgraph/
MIT License
8 stars 1 forks source link

fix: page-part handling in the graph.page table #677

Closed nacnudus closed 1 month ago

nacnudus commented 2 months ago

This legacy table is used by the GOV.UK Chat app. It was reimplemented by https://github.com/alphagov/govuk-knowledge-graph-gcp/pull/652, but now behaves differently from the original table regarding parts of multi-part pages. Users have since requested the following behaviour:

A real example is the multi-part page /universal-credit.

nacnudus commented 1 month ago

This is no longer required.

nacnudus commented 1 month ago

In case the branch is deleted, the proposed changes were to terraform-dev/bigquery/graph-page.sql

    "https://www.gov.uk" || COALESCE(content.base_path, editions.base_path) AS url
  FROM editions
  LEFT JOIN public.content ON content.edition_id = editions.id
  WHERE TRUE
  -- Omit the main page of multi-part documents, in favour of its duplicate that
  -- has a slug. For example, include the following:
  --
  --   /main-page/first-part
  --   /main-page/second-part
  --
  -- Omit the following:
  --
  --   /main-page
  AND (
    content.is_part IS NULL   -- Include documents that aren't multipart
    OR content.is_part        -- Omit the main page that doesn't have a slug
 )
)
SELECT
  pages.url,
  editions.public_updated_at,
  withdrawals.withdrawn_at,
  withdrawals.withdrawn_explanation,
  -- content.title is "title: part title" if it is a part of a document, but it
  -- doesn't include every schema_name, so fall back to editions.title.
  COALESCE(content.title, editions.title) AS title,
  JSON_value(editions.details, "$.internal_name") AS internal_name,
  editions.description,
  JSON_value(editions.details, "$.department_analytics_profile") AS department_analytics_profile,