UB-Mannheim / Reichsanzeiger

Software and data related to "Deutscher Reichsanzeiger und Preußischer Staatsanzeiger"
Apache License 2.0

How to find corresponding image from an hocr result? #1

Open zuphilip opened 7 years ago

zuphilip commented 7 years ago

I found something interesting in this hOCR file: https://digi.bib.uni-mannheim.de/periodika/reichsanzeiger/ocr/film/tesseract-4.0.0-alpha.20170703/012-9419/0580.hocr. How can I find the corresponding image?

stweil commented 7 years ago

Take the microfilm number 012-9419 and the image number 0580 from the hOCR URL and use them in the viewer URL:

The correct image link should be offered by the search interface in the future.
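Until then, the lookup can be scripted. The following is a minimal sketch, assuming the hOCR URL always ends in `<microfilm number>/<image number>.hocr` as in the example from the question. The names `parse_hocr_url` and `viewer_url` are made up for illustration, and the viewer URL format is not assumed here: the caller has to pass it in as a template with `{film}` and `{image}` placeholders.

```python
import re
from urllib.parse import urlparse

# Matches the last two path components of an hOCR URL: <film>/<image>.hocr
HOCR_PATH = re.compile(r"/(?P<film>[^/]+)/(?P<image>[^/]+)\.hocr$")


def parse_hocr_url(url: str) -> tuple[str, str]:
    """Return (microfilm number, image number) taken from an hOCR URL."""
    match = HOCR_PATH.search(urlparse(url).path)
    if match is None:
        raise ValueError(f"not an hOCR URL: {url}")
    return match.group("film"), match.group("image")


def viewer_url(hocr_url: str, template: str) -> str:
    """Fill a caller-supplied viewer URL template (format is an assumption)."""
    film, image = parse_hocr_url(hocr_url)
    return template.format(film=film, image=image)
```

For the URL from the question, `parse_hocr_url` returns `("012-9419", "0580")`, which are the two values needed for the viewer.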

stweil commented 7 years ago

Maybe the hOCR can be modified on the server side on the fly when it is requested by a web client:

A program could look up metadata in the database (date, issue, page number) and add it to the HTML response (title tag, time information). It could then add an image link, and perhaps links to other visualisations (like hocrjs). The same program could also run OCR post-correction and fix known OCR errors. That approach would preserve the original OCR results, deliver the best post-corrected text available, and save disk space, because the corrected files never need to be stored.
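A rough sketch of such a filter, assuming the metadata lookup has already happened and the title and image URL are passed in as plain strings. The function name `enrich_hocr` and the `fixes` table are hypothetical, and the regex-based editing is a simplification: a real implementation would more likely use an HTML parser and restrict corrections to the recognized words rather than the whole document.

```python
import html
import re


def enrich_hocr(hocr: str, title: str, image_url: str, fixes: dict[str, str]) -> str:
    """Return a modified copy of an hOCR document; the stored file is untouched."""
    # Apply known OCR corrections (plain string replacement, for illustration only).
    for wrong, right in fixes.items():
        hocr = hocr.replace(wrong, right)
    # Replace the content of the first <title> element with the looked-up metadata.
    hocr = re.sub(
        r"<title>.*?</title>",
        lambda m: f"<title>{html.escape(title)}</title>",
        hocr,
        count=1,
        flags=re.S,
    )
    # Add a link to the page image right after the opening <body> tag.
    link = f'<p><a href="{html.escape(image_url, quote=True)}">Image</a></p>'
    return re.sub(r"(<body[^>]*>)", lambda m: m.group(1) + link, hocr, count=1)
```

Because the function only transforms the string it is given, the original OCR output on disk stays unchanged, and an improved correction table takes effect on the next request.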