Open nfreire opened 6 years ago
@nfreire thanks for creating this. I am puzzled, though, by the need stated at the end of the description and in the title: our draft spec at https://docs.google.com/document/d/1t5yGEzQ0KV2rqU0sFDoKnI2bIDBGrmj0f1gSOCRUgJ4/ mentions that we should have "word", "line", "paragraph" and "page".
Description
Europeana aggregates full-text produced by OCR from data providers that apply different OCR practices. Post-OCR processing also differs across data providers. The aggregated full-text from Europeana has also been the subject of research into processing it in research infrastructures for language resources (most importantly CLARIN). In the near future, researchers from these infrastructures may provide Europeana with the results of applying language tools that improve the structure of the full-text.
The Europeana Data Model is being extended to represent full-text in a way that is compatible with the IIIF Presentation API v3, using Web Annotations. Hence the need for a common vocabulary for the types of text blocks, compatible with both specifications.
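
To make the request more concrete, here is a rough sketch of what a typed text block could look like as a Web Annotation, built in Python. It is only an illustration under assumptions: the property name `textGranularity`, the helper `make_text_annotation`, and the `example.org` URLs are all hypothetical and not taken from EDM, IIIF, or the draft spec. The four values are the ones quoted from the draft spec in this thread.

```python
import json

# Block types quoted from the draft spec mentioned in this thread.
GRANULARITIES = ("word", "line", "paragraph", "page")


def make_text_annotation(anno_id, target, granularity, text):
    """Build a Web Annotation (as a dict) for one block of OCR text.

    "textGranularity" is a hypothetical property name used here only to
    show where a shared vocabulary term would sit in the annotation.
    """
    if granularity not in GRANULARITIES:
        raise ValueError(f"unknown granularity: {granularity!r}")
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "id": anno_id,
        "type": "Annotation",
        "textGranularity": granularity,  # assumption, not a defined term
        "body": {"type": "TextualBody", "value": text},
        "target": target,
    }


if __name__ == "__main__":
    # Placeholder identifiers; the target points at a region of a page image.
    anno = make_text_annotation(
        "https://example.org/anno/1",
        "https://example.org/canvas/1#xywh=10,20,300,40",
        "line",
        "A single line of OCR text",
    )
    print(json.dumps(anno, indent=2))
```

With a shared vocabulary, a consumer could filter annotations by block type regardless of which data provider or OCR workflow produced them.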