alphagov / govuk-content-metadata

GovNER: an encoder-based language model (RoBERTa) fine-tuned to perform Named Entity Recognition (NER) on GOV.UK content
MIT License

Manage post-extraction processing with Google Workflow #70

Closed exfalsoquodlibet closed 1 year ago

exfalsoquodlibet commented 1 year ago

Summary

A Google Workflow now orchestrates the post-extraction processing of named entities.

The scheduled workflow consists of three steps and can be found in the src/post_extraction_process/post-extraction-gc-workflow.yaml file.

The first two steps (i.e., the creation of the named_entities.named_entities_all and named_entities.named_entities_counts BigQuery tables) were originally implemented as scheduled queries in BigQuery; these have now been deleted from the repository.
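For readers unfamiliar with Google Workflows, the first two steps could look roughly like the sketch below. This is an illustration only, not the contents of post-extraction-gc-workflow.yaml: the step names, the source table `named_entities.placeholder_source`, and the columns `entity_name` and `entity_type` are all hypothetical, and the third step is omitted because it is not described in this summary. The sketch assumes the Workflows BigQuery connector (`googleapis.bigquery.v2.jobs.query`) is used to run each query.

```yaml
# Illustrative sketch only; step names, source table and columns are assumptions.
main:
  steps:
    - create_named_entities_all:
        call: googleapis.bigquery.v2.jobs.query
        args:
          projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          body:
            useLegacySql: false
            # Hypothetical query: the real SQL lives in the repository.
            query: |
              CREATE OR REPLACE TABLE `named_entities.named_entities_all` AS
              SELECT * FROM `named_entities.placeholder_source`
    - create_named_entities_counts:
        call: googleapis.bigquery.v2.jobs.query
        args:
          projectId: ${sys.get_env("GOOGLE_CLOUD_PROJECT_ID")}
          body:
            useLegacySql: false
            # Hypothetical aggregation over assumed columns.
            query: |
              CREATE OR REPLACE TABLE `named_entities.named_entities_counts` AS
              SELECT entity_name, entity_type, COUNT(*) AS total_count
              FROM `named_entities.named_entities_all`
              GROUP BY entity_name, entity_type
    # The third step of the real workflow is not described here and is left out.
```

Steps in a workflow run in order, so the counts table is only rebuilt after the combined table has been recreated, which is the ordering guarantee that separate scheduled queries do not provide by themselves.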

Documentation has been added or updated.

Checklists

This pull/merge request meets the following requirements:

Comments have been added below around the incomplete checks.