alphagov / govuk-content-metadata

GovNER: an encoder-based language model (RoBERTa) fine-tuned to perform Named Entity Recognition (NER) on GOV.UK content
MIT License

Update post-extraction processing workflow #92

Closed by exfalsoquodlibet 1 year ago

exfalsoquodlibet commented 1 year ago

Added location metadata for each extracted entity (start and end character indices, and line number) to the named_entities_all BigQuery output table in src/post_extraction_process/post-extraction-gc-workflow.yaml
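
For illustration only, here is a minimal Python sketch of the kind of per-entity location metadata described above. It is not taken from this PR: the field names (start_char, end_char, line_number), the EntityRecord class and the locate_entities helper are all hypothetical, a regular expression stands in for the NER model, and character offsets are assumed to be relative to the start of each line.

```python
import re
from dataclasses import asdict, dataclass


@dataclass
class EntityRecord:
    """Hypothetical row shape; the real table's column names may differ."""
    text: str
    label: str
    start_char: int   # offset of the first character, relative to the line (assumption)
    end_char: int     # offset one past the last character, relative to the line
    line_number: int  # 1-based line number within the content item


def locate_entities(lines, pattern, label):
    """Return one record per match, with its character span and line number.

    A regex is used here as a stand-in for the NER model's predictions.
    """
    records = []
    for line_number, line in enumerate(lines, start=1):
        for match in re.finditer(pattern, line):
            records.append(
                EntityRecord(
                    text=match.group(0),
                    label=label,
                    start_char=match.start(),
                    end_char=match.end(),
                    line_number=line_number,
                )
            )
    return records


content_lines = ["Apply to HMRC for a refund.", "Contact HMRC or DVLA."]
# Plain dictionaries, the shape a row-oriented table load would typically expect.
rows = [asdict(r) for r in locate_entities(content_lines, r"\b(?:HMRC|DVLA)\b", "ORG")]
```

Storing the span and line number alongside each entity lets a consumer of the table map an entity back to the exact place it occurred in the source content.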

Checklists

This pull/merge request meets the following requirements:

Comments on any incomplete checks have been added below.