Open glenrobson opened 7 years ago
hOCR also provides word confidence in the x_wconf value.
https://docs.google.com/document/d/1QQnIQtvdAC_8n92-LhwPcjtAUFwBlzE8EWnKAxlgVf0/preview
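For illustration, here is a minimal sketch of reading `x_wconf` from hOCR. The sample markup is made up, and the helper name `hocr_word_confidences` is hypothetical. It assumes the hOCR is well-formed XML and that each word is a `span` with class `ocrx_word` carrying `x_wconf` in its `title` attribute.

```python
import re
import xml.etree.ElementTree as ET

# Made-up hOCR fragment for illustration only.
SAMPLE_HOCR = """<div class="ocr_page">
  <span class="ocr_line" title="bbox 100 200 260 220">
    <span class="ocrx_word" title="bbox 100 200 180 220; x_wconf 93">Cardiff</span>
    <span class="ocrx_word" title="bbox 190 200 260 220; x_wconf 41">Tirnes</span>
  </span>
</div>"""

# Matches "x_wconf <number>" inside the semicolon-separated title properties.
X_WCONF = re.compile(r"x_wconf\s+(\d+(?:\.\d+)?)")


def hocr_word_confidences(hocr_xml):
    """Return (text, confidence) pairs; confidence is None when x_wconf is absent."""
    root = ET.fromstring(hocr_xml)
    words = []
    for el in root.iter("span"):
        if el.get("class") != "ocrx_word":
            continue  # skip lines, paragraphs, and other non-word spans
        match = X_WCONF.search(el.get("title", ""))
        confidence = float(match.group(1)) if match else None
        words.append((el.text, confidence))
    return words


if __name__ == "__main__":
    for text, confidence in hocr_word_confidences(SAMPLE_HOCR):
        print(text, confidence)
```

Real hOCR is often HTML rather than strict XML and may use a namespace, so a harvester would likely need a more tolerant parser than this sketch uses.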
@glenrobson Yes, "WC" is used for "word confidence" in ALTO. Please note that there is an ongoing discussion with regard to how confidence values should be derived and expressed in future ALTO versions: https://github.com/altoxml/schema/issues/23.
Description
I am a harvester of IIIF content who would like to use the OCR word confidence in my index.
Variation(s)
Proposed Solutions
Some way of adding OCR word confidence from ALTO to IIIF Annotations.
Additional Background
This use case came up for Newspapers, but I believe it is more widely applicable. Example ALTO:
http://dams.llgc.org.uk/behaviour/llgc-id:3100022/fedora-sdef:alto/getAlto
and IIIF annotation list:
http://dams.llgc.org.uk/iiif/3100022/annotation/list/ART1.json
I believe the WC attribute is word confidence.
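As a rough sketch of how a harvester might pull that value out before indexing, the following reads `WC` from each ALTO `String` element. The ALTO fragment is invented for illustration (it is not copied from the document linked above), and the function names are hypothetical. Matching on the local tag name is an assumption made so the same code works regardless of which ALTO namespace version a file declares.

```python
import xml.etree.ElementTree as ET

# Made-up ALTO fragment for illustration; not taken from the linked document.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Cardiff" HPOS="100" VPOS="200" WIDTH="80" HEIGHT="20" WC="0.93"/>
    <SP/>
    <String CONTENT="Tirnes" HPOS="190" VPOS="200" WIDTH="70" HEIGHT="20" WC="0.41"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip a '{namespace}' prefix so elements match in any ALTO version."""
    return tag.rsplit("}", 1)[-1]


def word_confidences(alto_xml):
    """Return (content, WC) pairs for each String element; WC is None if absent."""
    root = ET.fromstring(alto_xml)
    words = []
    for el in root.iter():
        if local_name(el.tag) != "String":
            continue
        wc = el.get("WC")
        words.append((el.get("CONTENT"), float(wc) if wc is not None else None))
    return words


if __name__ == "__main__":
    for content, wc in word_confidences(SAMPLE_ALTO):
        print(content, wc)
```

How the extracted value would then be attached to a IIIF annotation is the open question of this issue, so the sketch stops at extraction.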