Open wollmers opened 4 years ago
Compared original XML "ONB_newseye" to current line texts "AustrianNewspapers".
compare_xml.pl Version 0.01

Compare XML text output against ground truth (GRT):

- XML: ONB_newseye
- GRT: AustrianNewspapers

Summary:

| | lines | words | chars | items |
|---|---:|---:|---:|---|
| items ocr | 57541 | 326524 | 2198240 | matches + inserts + substitutions |
| items grt | 57541 | 326394 | 2198051 | matches + deletions + substitutions |
| matches | 23961 | 265356 | 2125325 | matches |
| edits | 33580 | 61346 | 73806 | inserts + deletions + substitutions |
| subss | 33580 | 60860 | 71835 | substitutions |
| inserts | 0 | 308 | 1080 | inserts |
| deletions | 0 | 178 | 891 | deletions |
| precision | 0.4164 | 0.8127 | 0.9668 | matches / (matches + substitutions + inserts) |
| recall | 0.4164 | 0.8130 | 0.9669 | matches / (matches + substitutions + deletions) |
| accuracy | 0.4164 | 0.8122 | 0.9664 | matches / (matches + substitutions + inserts + deletions) |
| f-score | 0.4164 | 0.8128 | 0.9669 | (2 * recall * precision) / (recall + precision) |
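The four scores in the summary follow directly from the match and edit counts. As a cross-check, here is a minimal Python sketch that recomputes them from the formulas printed in the report. It is not the code of compare_xml.pl; the function name `scores` and the dictionary layout are made up for illustration, and only the counts are taken from the summary.

```python
def scores(matches, substitutions, inserts, deletions):
    """Recompute the report's scores from raw counts (illustrative helper)."""
    precision = matches / (matches + substitutions + inserts)
    recall = matches / (matches + substitutions + deletions)
    accuracy = matches / (matches + substitutions + inserts + deletions)
    f_score = (2 * recall * precision) / (recall + precision)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f-score": f_score,
    }


# Counts copied from the "words" and "chars" columns of the summary.
word_scores = scores(matches=265356, substitutions=60860, inserts=308, deletions=178)
char_scores = scores(matches=2125325, substitutions=71835, inserts=1080, deletions=891)

for name, value in char_scores.items():
    print(f"{name:<10} {value:.4f}")
```

At line level there are no inserts or deletions, so all four scores collapse to matches / (matches + substitutions), which is why the "lines" column shows the same value four times.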
Shortened list of the edits/mismatches:
Character match (confusion) table:

| GRT => OCR | ratio | errors | count |
|---|---:|---:|---:|
| 'ſ' => 's' | 0.9985 | 56885 | 56971 |
| '⸗' => '-' | 0.0052 | 61 | 11639 |
| '⸗' => '=' | 0.3232 | 3762 | 11639 |
| '⸗' => '¬' | 0.6691 | 7788 | 11639 |
| SUM | | 68496 | |

| item | chars | note |
|---|---:|---|
| SUM | 68496 | |
| + transcription | 1000 | estimated, transcription level 1 -> 2 |
| TOTAL transcription | 69496 | |
| edits | 73806 | |
| - transcription | -69496 | |
| corrections | 4310 | 0.20% of all characters |
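The bookkeeping above separates differences that come from the transcription convention (long s, double hyphen) from genuine corrections. A minimal Python sketch of that arithmetic, using only the numbers from the tables, is given below. The variable names are made up for illustration, and the 1000 is the issue's own estimate rather than a measured value.

```python
# GRT => OCR confusions attributed to the transcription convention,
# copied from the confusion table: (grt_char, ocr_char) -> errors.
convention_confusions = {
    ("ſ", "s"): 56885,
    ("⸗", "-"): 61,
    ("⸗", "="): 3762,
    ("⸗", "¬"): 7788,
}

# How often each GRT character occurs in total ("count" column).
grt_char_counts = {"ſ": 56971, "⸗": 11639}

# Ratio column: share of a GRT character's occurrences that OCR rendered this way.
ratios = {
    pair: errors / grt_char_counts[pair[0]]
    for pair, errors in convention_confusions.items()
}

convention_sum = sum(convention_confusions.values())
estimated_level_change = 1000   # estimate from the issue (level 1 -> 2)
total_transcription = convention_sum + estimated_level_change

total_char_edits = 73806        # "edits" row, chars column
grt_chars = 2198051             # "items grt" row, chars column

corrections = total_char_edits - total_transcription
correction_share = corrections / grt_chars

print(convention_sum, total_transcription, corrections)
print(f"{correction_share:.2%} of all characters")
```

Read this way, roughly 94% of the character edits are explained by the convention change alone, which leaves only a small remainder as real corrections.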
Rough guess of errors still in the GRT: 1000 - 2000.