These String-related attributes describe human decisions and actions taken during the OCR text correction process:
• ILLS (boolean, optional): specifies that a word is illegible in the source document (and consequently cannot be corrected). This status can be used:
  - during the production workflow: the quality control process needs to know whether a specific word falls within the scope of the guaranteed text quality; the status also records that the provider performed a manual task on the word;
  - by the viewing software: end users should be informed that some words are illegible in the source document itself (that is, the problem is not an OCR error).
• DBTS (boolean, optional): specifies that a word has been corrected but a doubt remains. The same use cases apply.
Together with CS (Correction Status), which the schema already defines, these two attributes belong to the "production family" of attributes.
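As an illustration, a corrected line carrying both attributes might look like the fragment below. This is only a sketch: the IDs, coordinates and CONTENT values are invented for the example, and ILLS and DBTS are the attributes proposed here, not part of the published schema.

```xml
<TextLine ID="TL_0001" HPOS="100" VPOS="200" WIDTH="620" HEIGHT="40">
  <!-- Word checked by an operator, no remaining doubt -->
  <String ID="S_0001" HPOS="100" VPOS="200" WIDTH="180" HEIGHT="40"
          CONTENT="registre" CS="true"/>
  <SP HPOS="280" VPOS="200" WIDTH="20"/>
  <!-- Word corrected by an operator, but the reading is uncertain -->
  <String ID="S_0002" HPOS="300" VPOS="200" WIDTH="200" HEIGHT="40"
          CONTENT="paroissial" CS="true" DBTS="true"/>
  <SP HPOS="500" VPOS="200" WIDTH="20"/>
  <!-- Word illegible in the source document: raw OCR output kept as is -->
  <String ID="S_0003" HPOS="520" VPOS="200" WIDTH="200" HEIGHT="40"
          CONTENT="d.. 1.4." ILLS="true"/>
</TextLine>
```

A viewer could, for instance, render S_0003 with a marker telling the reader that the original page itself is unreadable at that position, and a quality control tool could exclude it from the guaranteed-quality word count.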
Remark: ILLS could also be useful on the TextBlock and TextLine types, to flag:
  - areas of the page with physical defects: stains, blur, etc.
  - areas of the page with scanning defects: curvature near the binding, missing blocks near the margins, etc.
These attributes must be defined together with a recommendation: always set the attribute at the highest possible level (i.e. do not repeat it on every sub-element).
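The recommendation can be sketched as follows, assuming ILLS were also allowed on TextLine (this is only suggested in the remark above and is not covered by the schema change below). IDs, coordinates and CONTENT values are hypothetical.

```xml
<!-- Preferred: the whole line is hidden by a stain, so flag the line once -->
<TextLine ID="TL_0002" HPOS="100" VPOS="260" WIDTH="400" HEIGHT="40" ILLS="true">
  <String ID="S_0004" HPOS="100" VPOS="260" WIDTH="180" HEIGHT="40" CONTENT="l.. m.r"/>
  <SP HPOS="280" VPOS="260" WIDTH="20"/>
  <String ID="S_0005" HPOS="300" VPOS="260" WIDTH="200" HEIGHT="40" CONTENT="..ag."/>
</TextLine>

<!-- Discouraged: the same information repeated on every sub-element -->
<TextLine ID="TL_0003" HPOS="100" VPOS="320" WIDTH="400" HEIGHT="40">
  <String ID="S_0006" HPOS="100" VPOS="320" WIDTH="180" HEIGHT="40" CONTENT="l.. m.r" ILLS="true"/>
  <SP HPOS="280" VPOS="320" WIDTH="20"/>
  <String ID="S_0007" HPOS="300" VPOS="320" WIDTH="200" HEIGHT="40" CONTENT="..ag." ILLS="true"/>
</TextLine>
```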
<xsd:attribute name="ILLS" type="xsd:boolean" use="optional">
  <xsd:annotation>
    <xsd:documentation>The word is illegible in the source document and can't be manually corrected. If the content owner thinks the word is legible, the attribute must be dropped (ILLS="false" is not recommended).</xsd:documentation>
  </xsd:annotation>
</xsd:attribute>
<xsd:attribute name="DBTS" type="xsd:boolean" use="optional">
  <xsd:annotation>
    <xsd:documentation>The word has been manually corrected but a doubt remains. If the content owner thinks the doubt is not legitimate, the attribute must be dropped (DBTS="false" is not recommended).</xsd:documentation>
  </xsd:annotation>
</xsd:attribute>