UB-Mannheim / AustrianNewspapers

NewsEye / READ OCR training dataset from Austrian Newspapers (1864–1911)

Official Announcement: Release of the revised version according to the OCR-D Level 2 guidelines #39

Closed JKamlah closed 11 months ago

JKamlah commented 1 year ago

Hello everyone, we plan to publish the revised version, prepared according to the OCR-D Level 2 guidelines, in this repository in the coming days. Over the past months, student assistants and project staff corrected and upgraded the ground truth. The long s and the double oblique hyphen are now transcribed consistently, and remaining transcription errors were corrected as far as possible. The polygons of the regions and text lines were corrected, as was the reading order, and the regions were tagged. Further information on the optimisations will be available at the time of publication.
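As a rough illustration of the two character conventions mentioned above, the following Python sketch counts the long s and the double oblique hyphen in a transcribed line. It assumes the transcriptions encode them as U+017F and U+2E17. The function name `count_level2_chars` and the sample string are hypothetical and not part of the dataset or its tooling.

```python
# Hypothetical helper for illustration; not part of the repository's tooling.
# Assumption: the ground truth uses these two Unicode code points.
LONG_S = "\u017f"                 # LATIN SMALL LETTER LONG S
DOUBLE_OBLIQUE_HYPHEN = "\u2e17"  # DOUBLE OBLIQUE HYPHEN


def count_level2_chars(text: str) -> dict[str, int]:
    """Count the two historical characters named in the announcement."""
    return {
        "long_s": text.count(LONG_S),
        "double_oblique_hyphen": text.count(DOUBLE_OBLIQUE_HYPHEN),
    }


# Made-up sample line, not taken from the dataset.
sample = "Die Ge\u017fell\u2e17"
print(count_level2_chars(sample))
```

A count like this could serve as a quick sanity check when comparing an older transcription with a revised one, although it says nothing about whether each occurrence is correct.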

The following steps are planned for the publication:

We hope you will enjoy the revised version.

stweil commented 1 year ago

cc @wollmers who contributed most commits to this repository

wollmers commented 1 year ago

It's very good to have a standard.

stweil commented 1 year ago

Steps 1 and 2 are now done. Local clones of the repository can be updated using these commands:

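# Rename the local branch from master to main
git branch -m master main
# Fetch the current state of the remote, including the renamed branch
git fetch origin
# Make the local main branch track origin/main
git branch -u origin/main main
# Let origin/HEAD follow the remote's default branch again
git remote set-head origin -a
# Remove stale remote-tracking references such as origin/master
git remote prune origin

JKamlah commented 1 year ago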

We are happy to announce that the new revised version is now available. 🥳 We are currently still expanding the wiki pages and will provide additional information and statistics on the revised dataset.

stweil commented 11 months ago

@JKamlah, can we close this issue?

Of course there will be further improvements and transcription fixes (like the ones which I made recently), so more releases can be made in the future.

JKamlah commented 11 months ago

All tasks completed!