Ticket: https://trello.com/c/SHc04ixR/110-search-analytics-pipeline-ua-to-ga4
The final output is the 'popularity' fields: https://www.gov.uk/api/search.json?fields=popularity_b,popularity
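For reference, those fields can be requested with a small script like the one below. This is a minimal sketch: the endpoint and field names come from the API link above, but the helper name, query term, and sample response values are illustrative, not real data.

```ruby
require "json"
require "uri"

# Build a search API query URL asking for the popularity fields.
# The helper name and query term are illustrative.
def popularity_query_url(query)
  "https://www.gov.uk/api/search.json?" \
    "q=#{URI.encode_www_form_component(query)}&fields=popularity_b,popularity"
end

puts popularity_query_url("passport")
# prints: https://www.gov.uk/api/search.json?q=passport&fields=popularity_b,popularity

# An assumed slice of a response, showing where the popularity fields
# appear on each result (the numbers here are made up):
sample = JSON.parse('{"results": [{"link": "/apply-renew-passport", "popularity": 0.0525, "popularity_b": 207}]}')

sample["results"].each do |doc|
  puts "#{doc['link']}: popularity=#{doc['popularity']} popularity_b=#{doc['popularity_b']}"
end
```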
As part of the migration to GA4, the task was first to understand the existing Python application (https://github.com/alphagov/search-analytics) and its output, then to update the Ruby application (https://github.com/alphagov/search-analytics-ga4/) so that it produced an identical, or very similar, output to the Python app.
Once this was correct, the next step was to migrate the Search Analytics GA4 code from the Ruby repo into the search-api repo. The rationale: the task is easier to run as part of search-api, since it can run on a pod in Kubernetes, whereas running search-analytics from GitHub Actions was disastrous for catching failures; and a single codebase is easier to maintain than a separate repo for teams to keep track of. This decision also brings down costs: we no longer need to maintain another codebase, and we no longer need an S3 bucket for the data to be dumped to and for search-api to pick up. It can all be done as part of the rake page_traffic:load task.

Rollback Plan
In the unlikely case that this fails in production, we have a way to roll back: revert this PR, and search-api will go back to using the data in the S3 bucket. We will then need to rerun the cron jobs in Argo to reset the popularity data:
- search-api-load-page-traffic, which pulls the data from GA and ingests it into Elasticsearch
- search-api-update-govuk-index-popularity, which uses this new data from Elasticsearch to update the search API popularity results here: https://www.gov.uk/api/search.json?fields=popularity_b,popularity
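Conceptually, the flow through these two jobs (fetch traffic, compute popularity, copy it onto the index) can be sketched in memory as below. Everything in this sketch is a hypothetical stand-in: the page-view numbers, the 1/rank popularity formula, and the index layout are illustrative and do not claim to match the real search-api code.

```ruby
# Step 1 (search-api-load-page-traffic, sketched): turn GA4 page-view
# counts into per-path popularity values. Numbers are made up, and the
# rank-based formula is an illustrative stand-in for the real one.
page_views = {
  "/apply-renew-passport" => 90_000,
  "/check-mot-history"    => 40_000,
  "/bank-holidays"        => 75_000,
}

# Sort by traffic, assign 1-based ranks, and let popularity decay with rank.
ranked = page_views.sort_by { |_path, views| -views }

page_traffic = ranked.each_with_index.map { |(path, _views), i|
  [path, (1.0 / (i + 1)).round(4)]
}.to_h

# Step 2 (search-api-update-govuk-index-popularity, sketched): copy the
# fresh popularity values onto matching documents in the govuk index,
# leaving untrafficked pages unchanged.
govuk_index = [
  { "link" => "/apply-renew-passport",   "popularity" => 0.0 },
  { "link" => "/bank-holidays",          "popularity" => 0.0 },
  { "link" => "/a-page-with-no-traffic", "popularity" => 0.0 },
]

updated = govuk_index.map do |doc|
  doc.merge("popularity" => page_traffic.fetch(doc["link"], doc["popularity"]))
end

updated.each { |doc| puts doc }
```

A rank-based score has the property that one extremely popular page cannot dwarf everything else, but whether the real task uses this exact shape is not something this sketch asserts.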