Open kosloot opened 4 years ago
This came up after issue #45
When resolving a HEMP, FoLiA-correct just adds the resolved text to one of the string/word nodes. I assume that using a real `Correction` would be better.
For example:
<p xml:id="mwsel.p.1">
  <t class="OCR">•c c•</t>
  <str xml:id="mwsel.p.1.str.1">
    <t class="OCR">•c</t>
  </str>
  <str xml:id="mwsel.p.1.str.2">
    <t class="OCR">c•</t>
  </str>
</p>
Assuming `•c c•` is in the PUNCT file as `•c c• cc`, this HEMP is resolved as:
<p xml:id="mwsel.p.1">
  <t>cc</t>
  <t class="OCR">•c c•</t>
  <str xml:id="mwsel.p.1.str.1">
    <t class="OCR">•c</t>
  </str>
  <str xml:id="mwsel.p.1.str.2">
    <t offset="0">cc</t>
    <t class="OCR">c•</t>
  </str>
</p>
IMHO a much better solution would be:
<p xml:id="mwsel.p.1">
  <t>cc</t>
  <t class="OCR">•c c•</t>
  <correction xml:id="mwsel.p.1.correction.1">
    <new>
      <str xml:id="mwsel.p.1.str.edit.1">
        <t>cc</t>
      </str>
    </new>
    <original>
      <str xml:id="mwsel.p.1.str.1">
        <t class="OCR">•c</t>
      </str>
      <str xml:id="mwsel.p.1.str.2">
        <t class="OCR">c•</t>
      </str>
    </original>
  </correction>
</p>
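To make the proposed transformation concrete, here is a minimal Python sketch that rewrites the first fragment into the `<correction>` form using only the standard library. It is an illustration under assumptions, not how FoLiA-correct is implemented: the helper name `resolve_hemp` is hypothetical, the ID suffixes (`.correction.1`, `.str.edit.1`) are simply copied from the example above, and the FoLiA namespace is ignored, as it is in the fragments in this issue.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

SOURCE = """<p xml:id="mwsel.p.1">
  <t class="OCR">•c c•</t>
  <str xml:id="mwsel.p.1.str.1"><t class="OCR">•c</t></str>
  <str xml:id="mwsel.p.1.str.2"><t class="OCR">c•</t></str>
</p>"""


def resolve_hemp(parent, resolved_text):
    """Hypothetical helper: wrap all <str> children of `parent` in one
    <correction>, keeping them under <original> and putting a single
    new <str> with the resolved text under <new>."""
    parent_id = parent.get(XML_ID)
    originals = [child for child in parent if child.tag == "str"]
    if not originals:
        return None
    # Remember where the first <str> sat so the correction replaces it in place.
    position = list(parent).index(originals[0])

    correction = ET.Element("correction", {XML_ID: f"{parent_id}.correction.1"})
    new = ET.SubElement(correction, "new")
    new_str = ET.SubElement(new, "str", {XML_ID: f"{parent_id}.str.edit.1"})
    ET.SubElement(new_str, "t").text = resolved_text

    original = ET.SubElement(correction, "original")
    for node in originals:
        parent.remove(node)
        original.append(node)
    parent.insert(position, correction)

    # Paragraph-level text in the default class, as in the example above.
    top_text = ET.Element("t")
    top_text.text = resolved_text
    parent.insert(0, top_text)
    return correction


paragraph = ET.fromstring(SOURCE)
correction = resolve_hemp(paragraph, "cc")
print(ET.tostring(paragraph, encoding="unicode"))
```

The point of the sketch is that the OCR strings are moved, unchanged, under `<original>` rather than having a second `<t>` attached to one of them, so the resolution stays traceable as an explicit correction.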
Interesting point: HEMP resolution is done before other corrections. I assume that a real correction using the `cc` will not be performed.