alphagov / govuk-content-metadata

GovNER: an encoder-based language model (RoBERTa) fine-tuned to perform Named Entity Recognition (NER) on GOV.UK content
MIT License

New-content inference pipe #101

Closed: exfalsoquodlibet closed this pull request 1 year ago

exfalsoquodlibet commented 1 year ago

Summary

Code for the daily inference pipeline that extracts named entities from GOV.UK pages that were newly created or substantially updated on the previous day ("yesterday's new content").
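As an illustration of the selection step described above, here is a minimal sketch. It is not the code in this PR. The `PageRecord` shape, its field names (`first_published`, `last_major_update`), and the `extract_entities` callable are hypothetical stand-ins: a real run would read records from the content source and call the fine-tuned GovNER model. The sketch assumes all timestamps are naive datetimes in a single timezone.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class PageRecord:
    # Hypothetical page record; field names are illustrative, not GOV.UK's schema.
    base_path: str
    first_published: datetime
    last_major_update: Optional[datetime]  # None if never substantially updated
    text: str


def yesterdays_new_content(pages: Iterable[PageRecord], today: date) -> List[PageRecord]:
    """Keep pages created, or substantially updated, on the day before `today`."""
    yesterday = today - timedelta(days=1)
    selected = []
    for page in pages:
        created_yesterday = page.first_published.date() == yesterday
        updated_yesterday = (
            page.last_major_update is not None
            and page.last_major_update.date() == yesterday
        )
        if created_yesterday or updated_yesterday:
            selected.append(page)
    return selected


def run_daily_ner(
    pages: Iterable[PageRecord],
    today: date,
    extract_entities: Callable[[str], List[str]],  # hypothetical stand-in for the NER model
) -> Dict[str, List[str]]:
    """Map each of yesterday's new pages to the entities found in its text."""
    return {
        page.base_path: extract_entities(page.text)
        for page in yesterdays_new_content(pages, today)
    }
```

Passing the extractor in as a callable keeps the date filtering testable on its own, without loading the language model.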

Notes to the reviewers

Please start by reading the README.md file, which gives a thorough overview of the pipeline workflow and the tech stack involved.

The main things I'd like you to review are:

Checklists

This pull/merge request meets the following requirements:

Comments have been added below on the incomplete checks.