Summary
Code for the daily inference pipeline that extracts named entities from those GOV.UK pages that have been substantially updated or were newly created the day before ("yesterday's new content").
Notes to the reviewers
Please start by reading the README.md file, which provides a thorough overview of the pipeline workflow and the tech stack involved.
The main things I'd like you to review are:
- the overall logic of the pipeline, as summarised in the README and implemented in the two Cloud Workflow files, daily_batchjob_workflow.yml and update_entity_buildup.yml;
- whether the documentation is accurate and complete enough.
Checklists
This pull/merge request meets the following requirements:
- docs/aqa/aqa_plan.md
- docs/aqa/data_log.md, if necessary
- docs/aqa/assumptions_caveats.md, if necessary
- docs folder
Comments have been added below around the incomplete checks.