NationalLibraryOfNorway / meteor

A python module and REST API for automatic extraction of metadata from PDF files
Apache License 2.0

feat: Identify outdated authority posts and sort them last #18

Closed pierrebeauguitte closed 11 months ago

pierrebeauguitte commented 11 months ago

Some organization names appear twice in the Norwegian Authority File: one post refers to the current name or organization, and the other refers to a previous instance. This information is recorded as free text in subfield a of MARC field 678 or 680, and it is phrased in different ways ("Gjelder fra YYYY", "Gjelder perioden YYYY-YYYY", "Gjelder til DD.MM.YYYY"...). The `text_outdated` method looks for simple patterns in these fields. It does not catch every outdated post, but its accuracy in tests has been high enough to be useful.
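
For illustration, here is a minimal sketch of this kind of pattern matching. It is not the code from this PR. The names `looks_outdated` and `sort_outdated_last`, the dict layout with a `notes` list, and the choice to treat only an end date ("Gjelder til ...") or a closed period ("Gjelder perioden YYYY-YYYY") as outdated are all assumptions made for the example.

```python
import re

# Assumption: a note with an end date or a closed period marks the post as
# outdated. "Gjelder fra YYYY" alone is treated as still current.
_OUTDATED_PATTERNS = [
    # "Gjelder til DD.MM.YYYY" or "Gjelder til YYYY"
    re.compile(r"gjelder\s+til\s+(?:\d{1,2}\.\d{1,2}\.)?\d{4}", re.IGNORECASE),
    # "Gjelder perioden YYYY-YYYY"
    re.compile(r"gjelder\s+perioden\s+\d{4}\s*[-–]\s*\d{4}", re.IGNORECASE),
]


def looks_outdated(note: str) -> bool:
    """Return True if the free-text note matches one of the assumed patterns."""
    return any(pattern.search(note) for pattern in _OUTDATED_PATTERNS)


def sort_outdated_last(posts: list[dict]) -> list[dict]:
    """Return the posts with outdated ones moved to the end.

    Each post is a hypothetical dict whose "notes" key holds the 678a/680a
    texts. sorted() is stable and False sorts before True, so the original
    order is kept within the current group and within the outdated group.
    """
    return sorted(
        posts,
        key=lambda post: any(looks_outdated(n) for n in post.get("notes", [])),
    )


posts = [
    {"name": "Old agency", "notes": ["Gjelder til 31.12.2015"]},
    {"name": "New agency", "notes": ["Gjelder fra 2016"]},
]
ordered = sort_outdated_last(posts)
```

Using a boolean sort key rather than filtering keeps outdated posts available as fallback candidates, which matches the "sort them last" goal in the title.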