NationalLibraryOfNorway / DHLAB

DHLAB is a library of Python modules for accessing text and pictures at the National Library of Norway.
https://nationallibraryofnorway.github.io/DHLAB/
MIT License

Dev/corpus #193

Closed by tungland 10 months ago

tungland commented 10 months ago

DHLAB contains multiple versions of some texts, usually because the same object has been OCR-scanned several times over the years. This PR implements a method that, by default, checks the returned corpus for duplicate URNs and shows only the latest version of each text.
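To illustrate the idea, here is a minimal sketch of this kind of deduplication in plain Python. It is not the code from this PR: the function name `keep_latest`, the `scanned` field used to decide which version is the latest, and the sample records are all assumptions made for the example.

```python
def keep_latest(records, key="urn", order_by="scanned"):
    """Return one record per `key`, keeping the record with the largest
    `order_by` value.

    Hypothetical helper: `urn` and `scanned` are assumed field names,
    not the actual columns of a DHLAB corpus.
    """
    latest = {}
    for record in records:
        current = latest.get(record[key])
        # Keep the record if its key is new or if it is newer than the stored one.
        if current is None or record[order_by] > current[order_by]:
            latest[record[key]] = record
    return list(latest.values())


# Made-up sample data: two versions of the first text, one of the second.
corpus = [
    {"urn": "URN:example_1", "scanned": "2008-05-14", "title": "Text A"},
    {"urn": "URN:example_1", "scanned": "2015-03-02", "title": "Text A"},
    {"urn": "URN:example_2", "scanned": "2010-11-30", "title": "Text B"},
]

deduplicated = keep_latest(corpus)
for record in deduplicated:
    print(record["urn"], record["scanned"])
```

The ISO-formatted date strings compare correctly as plain strings, so no date parsing is needed in this sketch. A real corpus would need whatever field or rule actually identifies the most recent version.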