HeardLibrary / vandycite


OCR on "prints" #24

Closed baskaufs closed 10 months ago

baskaufs commented 2 years ago

This is a refinement of #3 and is also related to #23

The category of "prints" includes both art prints (with labels that are generally titles) and posters (with labels that are generally the text on the poster). We can run OCR on each image and compare the recognized text with the label to determine whether the print is a poster or not. Some preliminary work has been done on this using Keras OCR.
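A minimal sketch of the comparison step described above, not the code actually used in the preliminary work. It assumes the OCR step has already produced a list of recognized word strings for an image (for example, the words from a Keras OCR prediction). The function names, the 0.8 per-word similarity cutoff, and the 0.6 decision threshold are all hypothetical and would need tuning against real images.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and keep only alphanumeric tokens, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def label_match_score(ocr_words, label, word_cutoff=0.8):
    """Return the fraction of label tokens that closely match some OCR token.

    word_cutoff is a hypothetical fuzzy-match cutoff that tolerates small
    OCR misreadings (e.g. "l" read in place of "i").
    """
    label_tokens = normalize(label).split()
    ocr_tokens = normalize(" ".join(ocr_words)).split()
    if not label_tokens or not ocr_tokens:
        return 0.0
    matched = 0
    for token in label_tokens:
        best = max(
            SequenceMatcher(None, token, candidate).ratio()
            for candidate in ocr_tokens
        )
        if best >= word_cutoff:
            matched += 1
    return matched / len(label_tokens)


def looks_like_poster(ocr_words, label, threshold=0.6):
    """Guess that a print is a poster when most of its label appears in the OCR text.

    threshold is an assumed value, not one taken from the project.
    """
    return label_match_score(ocr_words, label) >= threshold
```

With this approach, a label that is mostly found in the image text suggests a poster, while a label with little or no overlap suggests an art print whose label is a title. Images with no recognized text score 0.0 and are treated as non-posters.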

baskaufs commented 2 years ago

Useful video series on OCR: https://www.youtube.com/playlist?list=PL2VXyKi-KpYuTAZz__9KVl1jQz74bDG7i

baskaufs commented 1 year ago

Note: not all images with text are posters. For example, the Daumier prints have captions. We need to find out what the classification types would be in Wikidata and decide how to sort them out. I think right now they are all classified as "print".

We also need to find out the property used to expose the text. "Caption"? "Inscription"?

baskaufs commented 10 months ago

This was basically finished through Emily's project in fall 2023.