altoxml / schema

ALTO XML schema - latest and all former versions

OCR correction attributes: CS, ILLS, DBTS #21

Open jpmoreux opened 10 years ago

jpmoreux commented 10 years ago

Use cases:

These String-related attributes can be used to record human decisions and actions taken during the OCR text correction process:

• ILLS (boolean, optional): specifies that a word is illegible in the source document (and consequently cannot be corrected). This status can be used:

• DBTS (boolean, optional): specifies that a word has been corrected but a doubt remains. Same use cases as ILLS.

• These two attributes belong to the "production family" of attributes, alongside CS (Correction Status), which is already defined by the schema.

Remark: ILLS could also be useful on the TextBlock/TextLine types:

These attributes must be defined together with a usage recommendation: always set the attribute at the highest possible level (i.e., do not set it individually on every sub-element).
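The "highest level possible" recommendation can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal sketch under stated assumptions, not part of the proposal: the helpers `hoist` and `effective_flag` are hypothetical, the sample line is made up, and hoisting ILLS onto a TextLine assumes the attribute is also allowed there, which the issue only suggests as a remark. A writer moves the flag up once every String in a line carries it; a reader then treats a String as flagged if it or any ancestor is.

```python
import xml.etree.ElementTree as ET

# xsd:boolean accepts "true"/"false"/"1"/"0"; only the true forms set a flag.
TRUE_VALUES = {"true", "1"}


def local_name(tag):
    """Strip a '{namespace}' prefix so namespaced ALTO files match too."""
    return tag.rsplit("}", 1)[-1]


def is_flagged(elem, attr):
    """An absent attribute counts as false (the proposal drops rather than sets false)."""
    return elem.get(attr, "").strip() in TRUE_VALUES


def hoist(parent, attr):
    """Hypothetical writer helper: if every String child of `parent` has `attr`
    set, set it once on `parent` and remove it from the children.
    Assumes `attr` is allowed on `parent` (e.g. ILLS on TextLine)."""
    strings = [c for c in parent if local_name(c.tag) == "String"]
    if strings and all(is_flagged(s, attr) for s in strings):
        parent.set(attr, "true")
        for s in strings:
            del s.attrib[attr]
        return True
    return False


def effective_flag(elem, attr, parents):
    """Hypothetical reader helper: a String is flagged if it or any ancestor is.
    `parents` maps each element to its parent (ElementTree keeps no back-links)."""
    node = elem
    while node is not None:
        if is_flagged(node, attr):
            return True
        node = parents.get(node)
    return False


# Made-up sample line: both Strings are flagged illegible.
line = ET.fromstring(
    '<TextLine ID="TL1">'
    '<String ID="S1" ILLS="true" CONTENT="AnfûràoII"/><SP/>'
    '<String ID="S2" ILLS="true" CONTENT="xq7"/>'
    "</TextLine>"
)
parents = {child: p for p in line.iter() for child in p}
moved = hoist(line, "ILLS")
print(moved, ET.tostring(line, encoding="unicode"))
print(effective_flag(line[2], "ILLS", parents))
```

A line that mixes flagged and unflagged Strings is left untouched by `hoist`, so the attribute stays on the individual words.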

Examples:

<String ID="PAG_00000001_ST000029" STYLEREFS="TXT_1" HPOS="3413" VPOS="296" HEIGHT="448" WIDTH="992" WC="0.34" ILLS="true" CONTENT="AnfûràoII"/>

<String ID="PAG_00000001_ST000029" STYLEREFS="TXT_1" HPOS="3413" VPOS="296" HEIGHT="448" WIDTH="992" WC="0.34" DBTS="true" CONTENT="droits"/> 
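A consumer could read the two examples back and fold CS, ILLS and DBTS into one label per String. The Python sketch below is illustrative only: `correction_label`, its label strings, and the precedence order (ILLS, then DBTS, then CS) are assumptions, not part of the proposal. The examples are wrapped in a TextLine only so they parse as one document; they share an ID, which a real ALTO file would not allow.

```python
import xml.etree.ElementTree as ET

# xsd:boolean true forms; an absent attribute is treated as false.
TRUE_VALUES = {"true", "1"}


def flag(elem, attr):
    return elem.get(attr, "").strip() in TRUE_VALUES


def correction_label(string_elem):
    """Hypothetical summary of the production attributes of one ALTO String."""
    if flag(string_elem, "ILLS"):
        return "illegible"            # cannot be corrected
    if flag(string_elem, "DBTS"):
        return "corrected, doubtful"  # corrected, but a doubt remains
    if flag(string_elem, "CS"):
        return "corrected"
    return "not corrected"


examples = """<TextLine>
<String ID="PAG_00000001_ST000029" STYLEREFS="TXT_1" HPOS="3413" VPOS="296" HEIGHT="448" WIDTH="992" WC="0.34" ILLS="true" CONTENT="AnfûràoII"/>
<String ID="PAG_00000001_ST000029" STYLEREFS="TXT_1" HPOS="3413" VPOS="296" HEIGHT="448" WIDTH="992" WC="0.34" DBTS="true" CONTENT="droits"/>
</TextLine>"""

for s in ET.fromstring(examples).iter("String"):
    print(s.get("CONTENT"), "->", correction_label(s))
# AnfûràoII -> illegible
# droits -> corrected, doubtful
```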

Schema change:

<xsd:attribute name="ILLS" type="xsd:boolean" use="optional">
 <xsd:annotation>
  <xsd:documentation>The word is illegible in the source document and can't be manually corrected. If the content owner thinks the word is legible, the attribute must be dropped (ILLS="false" is not recommended).</xsd:documentation>
 </xsd:annotation>
</xsd:attribute>
<xsd:attribute name="DBTS" type="xsd:boolean" use="optional">
 <xsd:annotation>
  <xsd:documentation>The word has been manually corrected but a doubt remains. If the content owner thinks the doubt is not legitimate, the attribute must be dropped (DBTS="false" is not recommended).</xsd:documentation>
 </xsd:annotation>
</xsd:attribute>
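Since the annotations only discourage ILLS="false" and DBTS="false", a schema validator will still accept them. A check for that recommendation could look like the Python sketch below; `lint_flags` and its messages are hypothetical, and it inspects only String elements, the level at which the proposal defines the attributes.

```python
import xml.etree.ElementTree as ET

PRODUCTION_FLAGS = ("ILLS", "DBTS")
XSD_BOOLEAN = {"true", "false", "1", "0"}


def local_name(tag):
    """Strip a '{namespace}' prefix so namespaced ALTO files match too."""
    return tag.rsplit("}", 1)[-1]


def lint_flags(root):
    """Hypothetical checker: return (ID, attribute, problem) tuples for String elements."""
    problems = []
    for elem in root.iter():
        if local_name(elem.tag) != "String":
            continue
        for attr in PRODUCTION_FLAGS:
            value = elem.get(attr)
            if value is None:
                continue  # absent is the recommended way to say "not set"
            value = value.strip()  # xsd:boolean collapses surrounding whitespace
            if value not in XSD_BOOLEAN:
                problems.append((elem.get("ID"), attr, "not an xsd:boolean"))
            elif value in ("false", "0"):
                problems.append((elem.get("ID"), attr, "explicit false; drop the attribute"))
    return problems


doc = ET.fromstring(
    '<TextLine><String ID="s1" ILLS="false"/><String ID="s2" DBTS="yes"/>'
    '<String ID="s3" ILLS="true" DBTS="0"/><String ID="s4"/></TextLine>'
)
for problem in lint_flags(doc):
    print(problem)
```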
cowboyMontana commented 9 years ago

Changed label from 'submitted' to 'discussion'.

cowboyMontana commented 9 years ago

Assigned Jean Philippe Moreux as change request champion