Closed mikegerber closed 4 years ago
Example workspace: actevedef_718448162.zip
ocrd workspace validate --page-coordinate-consistency off mets.xml
[...]
<error>INCONSISTENCY in Word ID 'l1130_word0020' of file 'OUTPUT_00000024': text results 'Notarus' != concatenated 'Notaris'</error>
The problematic glyph in OUTPUT_00000024:
<pc:Glyph id="l1130_word0020_glyph0005">
<pc:Coords points="1864,3587 1886,3587 1886,3646 1864,3646"/>
<pc:TextEquiv index="0" conf="0.6683819890022278">
<pc:Unicode>u</pc:Unicode>
</pc:TextEquiv>
<pc:TextEquiv index="1" conf="0.29328230023384094">
<pc:Unicode>i</pc:Unicode>
</pc:TextEquiv>
<!-- more indexes omitted -->
</pc:Glyph>
In `get_text()`, the `TextEquiv` with `index=1` is used if it exists. The way I read the documentation of the `index` attribute in the PAGE schema, it should use the one with the lowest index:
The lowest possible value for `index` is `0`, according to the schema.
Yes, but that does not mean we have to look for index zero. Instead, we should abide by the wording of the schema more closely: taking the smallest index (whatever that may be).
Same goes for `set_text`, BTW.
I concur. Thanks for reporting!
Indeed. Cannot remember why we did it this way.
Also, along with the fix, we should rename `page_textequiv_strategy=index1` to `page_textequiv_strategy=first`. (There are currently no further references to `index1` outside of that source file, except for the `validate` CLI and the respective test.)
Yes, but that does not mean we have to look for index zero. Instead, we should abide by the wording of the schema more closely: taking the smallest index (whatever that may be).
Absolutely. In my current implementation in ocrd_calamari there could also be missing `index` values (due to unrelated reasons), which should be perfectly legal.
In my current implementation in ocrd_calamari there could also be missing `index` values (due to unrelated reasons), which should be perfectly legal.
Right, but what if some processors add textequiv with index and some without? Then we can get a mix. Do we only sort by index if all alternatives possess one (and use XML element ordering otherwise), or do we use the smallest index if at least one alternative does?
@tboenig Can you remember why we implemented `index1` instead of an `index0` strategy?
It does say so [in our PAGE specs]:
`@index` of the first (preferred) `<pg:TextEquiv>` must be the value 1.
I'm fairly certain I had a reason for that. Could that be the convention of Aletheia or TRANSKRIBUS?
In my current implementation in ocrd_calamari there could also be missing `index` values (due to unrelated reasons), which should be perfectly legal.
Right, but what if some processors add textequiv with index and some without? Then we can get a mix. Do we only sort by index if all alternatives possess one (and use XML element ordering otherwise), or do we use the smallest index if at least one alternative does?
I would read the schema's description ("Used for sort order in case multiple TextEquivs are defined. The text content with the lowest index should be interpreted as the main text content.") as saying that `index` should be used to define the order/precedence if there are multiple TextEquivs. If you're not using it (which seems to be legal), the order is undefined.
(To be clear: With "missing index values" in my implementation I meant that there might be `index="0"` and `index="2"` but no `index="1"` in some cases, but there is always a unique index value.)
It does say so [in our PAGE specs]:
`@index` of the first (preferred) `<pg:TextEquiv>` must be the value 1.
I'm fairly certain I had a reason for that. Could that be the convention of Aletheia or TRANSKRIBUS?
I have some files here that were created using Aletheia; they only have "solo" TextEquivs with no `index` attributes.
I would read the schema's description ... that `index` should be used to define the order/precedence if there are multiple TextEquivs. If you're not using it (which seems to be legal), order is undefined.
Of course, TextEquivs without `index` would render the order undefined by the PAGE spec, but I was asking for opinions on what our implementation should prefer under these circumstances. As mentioned, we can easily get a mix of index/non-index textequivs.
but there is always a unique index value
No, there is not: That attribute is optional, so the generateDS DOM (correctly) parses this as `None` when absent.
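One possible answer to the question about mixed index/non-index alternatives can be sketched as follows. This is only an illustration of a policy, not what core implements: the `TextEquiv` dataclass and `preferred_textequiv` function are hypothetical stand-ins, and the rule "indexed alternatives win over unindexed ones, document order breaks ties" is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextEquiv:
    """Hypothetical stand-in for a parsed pc:TextEquiv element."""
    unicode: str
    index: Optional[int] = None  # None when the attribute is absent


def preferred_textequiv(alternatives: List[TextEquiv]) -> Optional[TextEquiv]:
    """Return the alternative with the smallest index, or None if there is none.

    Assumed policy: alternatives that carry an index take precedence over
    those that do not; ties and unindexed alternatives keep document order.
    """
    if not alternatives:
        return None
    # min() returns the first minimal item, so document order breaks ties.
    return min(
        alternatives,
        key=lambda t: (t.index is None, t.index if t.index is not None else 0),
    )


# Gaps in the numbering (0 and 2, no 1) are handled like any other values.
gapped = [TextEquiv("i", index=2), TextEquiv("u", index=0)]
# A mix of indexed and unindexed alternatives.
mixed = [TextEquiv("x"), TextEquiv("i", index=3), TextEquiv("u", index=1)]
# No index anywhere: fall back to the first element.
unindexed = [TextEquiv("a"), TextEquiv("b")]

print(preferred_textequiv(gapped).unicode)
print(preferred_textequiv(mixed).unicode)
print(preferred_textequiv(unindexed).unicode)
```

The alternative policy mentioned above (ignore `index` entirely unless every alternative has one) would only change the sort key; which of the two is preferable is exactly the open question.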
could that be the convention of Aletheia or TRANSKRIBUS?
The convention in Aletheia is top-to-bottom starting from top-left bounding box based on LowLevelTextContainerImpl.java | GeometricObjectPositionComparator.java.
Unless a sequence is explicitly defined via `readingOrder`, the increment of `index*` is disregarded.
could that be the convention of Aletheia or TRANSKRIBUS?
The convention in Aletheia is top-to-bottom starting from top-left bounding box based on LowLevelTextContainerImpl.java | GeometricObjectPositionComparator.java.
Unless a sequence is explicitly defined via `readingOrder`, the increment of `index*` is disregarded.
That's about the order of different segments, here it's about order of multiple TextEquivs for the same segment, e.g. multiple alternative predictions for the same glyph. For example this "u" where ocrd_calamari predicted an "i" alternatively, with less confidence:
<pc:Glyph id="l1130_word0020_glyph0005">
<pc:Coords points="1864,3587 1886,3587 1886,3646 1864,3646"/>
<pc:TextEquiv index="0" conf="0.6683819890022278">
<pc:Unicode>u</pc:Unicode>
</pc:TextEquiv>
<pc:TextEquiv index="1" conf="0.29328230023384094">
<pc:Unicode>i</pc:Unicode>
</pc:TextEquiv>
<!-- more indexes omitted -->
</pc:Glyph>
(I would prefer the word "precedence" over "order" as it seems less confusing.)
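To make the precedence rule concrete for the glyph above, here is a minimal sketch using only the Python standard library. It is not the core implementation; `main_text` is a hypothetical helper, the namespace URI is assumed to be the 2019-07-15 PAGE version (the example file may use a different one), and the two alternatives are deliberately listed in reverse order to show that element order does not matter once every alternative has an `index`.

```python
import xml.etree.ElementTree as ET

# Assumption: PAGE namespace version 2019-07-15; adjust to the file at hand.
PAGE_NS = "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"
NS = {"pc": PAGE_NS}

GLYPH_XML = f"""
<pc:Glyph xmlns:pc="{PAGE_NS}" id="l1130_word0020_glyph0005">
  <pc:Coords points="1864,3587 1886,3587 1886,3646 1864,3646"/>
  <pc:TextEquiv index="1" conf="0.29328230023384094">
    <pc:Unicode>i</pc:Unicode>
  </pc:TextEquiv>
  <pc:TextEquiv index="0" conf="0.6683819890022278">
    <pc:Unicode>u</pc:Unicode>
  </pc:TextEquiv>
</pc:Glyph>
"""


def main_text(glyph: ET.Element) -> str:
    """Return the Unicode text of the TextEquiv with the lowest index.

    This sketch assumes every TextEquiv of the glyph has an index attribute.
    """
    equivs = glyph.findall("pc:TextEquiv", NS)
    best = min(equivs, key=lambda e: int(e.get("index")))
    return best.findtext("pc:Unicode", namespaces=NS)


glyph = ET.fromstring(GLYPH_XML.strip())
print(main_text(glyph))  # prints "u"
```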
Well, while for `String` the `index` attribute is optional, for `GraphemeBaseType` it is required, with `<restriction base="int">` and `<minInclusive value="0"/>`. IIRC alternative predictions have only ever been considered on the character/glyph/grapheme level.
I chose to adhere to the stricter¹ OCR-D convention of starting with 1 for now (https://github.com/OCR-D/ocrd_calamari/commit/0f9c94e7dc4f4577ec1465a1cb0613d310941728).
¹ I also think it is needlessly stricter but I don't care that much to argue about it any longer :)
It does say so [in our PAGE specs]:
`@index` of the first (preferred) `<pg:TextEquiv>` must be the value 1.
I'm fairly certain I had a reason for that. Could that be the convention of Aletheia or TRANSKRIBUS?
I doubt that for Aletheia: text variants are definitely 0-indexed in `prima-core-libs`.
As for Transkribus, I could not find usage of `TextEquiv/@index` in our (various versions of) textual GT anywhere at all. (Not sure if that says anything.)
Regarding the question where the idea of `index1` (as opposed to `index0` or `first`) originated, here is my reconstruction:
The `index1` principle first appeared in the initial formulation in https://github.com/OCR-D/spec/pull/82 and implementation in https://github.com/OCR-D/core/pull/223; there were discussions on alternative strategies (subsumption check, `best` check) and on interfaces, but no one challenged `index==1`.
I think #432 was closed by mistake?
Sorry about that, #432 is now in master.
@mikegerber This issue should have been fixed by #432 which is now in master. The initially mentioned error does not happen anymore, though there are still errors, also for that line:
<report>
<!-- ... -->
<error>INCONSISTENCY in TextLine ID 'l1130' of file 'OUTPUT_00000024': text results 'Der Schnltheiß zu Oberrod, der Wirth Krebs und Hr. Notarus Tribert ſind bereits' != concatenated 'Der Schnltheiß zu Oberrod , der Wirth Krebs und Hr . Notarus Tribert ſind bereits'</error>
<!-- ... -->
</report>
<error>INCONSISTENCY in TextLine ID 'l1130' of file 'OUTPUT_00000024': text results 'Der Schnltheiß zu Oberrod, der Wirth Krebs und Hr. Notarus Tribert ſind bereits' != concatenated 'Der Schnltheiß zu Oberrod , der Wirth Krebs und Hr . Notarus Tribert ſind bereits'</error>
This is an actual problem in the XML (generated with ocrd_calamari before 0.0.6), as the `,` and `.` are separate words and therefore concatenate wrongly (= not according to OCR-D PAGE specs).
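The mismatch can be reproduced in isolation. The sketch below assumes, as the report suggests, that the consistency check compares the line text against its words joined by single spaces; `concatenate_words` is a hypothetical helper for illustration, not the validator's actual code.

```python
from typing import List


def concatenate_words(words: List[str]) -> str:
    """Join word texts with single spaces (assumed concatenation rule)."""
    return " ".join(words)


line_text = "Hr. Notarus Tribert"

# Punctuation segmented as its own word, as in the affected ocrd_calamari output.
separate = ["Hr", ".", "Notarus", "Tribert"]
# Punctuation attached to the preceding word.
attached = ["Hr.", "Notarus", "Tribert"]

print(concatenate_words(separate) == line_text)  # False: yields 'Hr . Notarus Tribert'
print(concatenate_words(attached) == line_text)  # True
```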
Thanks, closing then.
However, I do still get the "Notaris" error, among others, with master:
<error>INCONSISTENCY in Word ID 'l1130_word0020' of file 'OUTPUT_00000024': text results 'Notarus' != concatenated 'Notaris'</error>
@kba Could you please upload the full report you get from `ocrd workspace validate --skip dimension --page-coordinate-consistency off`? I might have a problem with my environment and be testing the wrong version.
I always make sure to run `make install PIP_INSTALL="pip install -e"` in core to make sure core has been installed "editable".
ocrd workspace validate --skip dimension --page-coordinate-consistency off
17:32:59.037 INFO ocrd.page_validator - Validating input file 'OCR-D-GT-PAGE_00000024'
17:32:59.481 INFO ocrd.page_validator - Validating input file 'OUTPUT_00000024'
<report valid="false">
<error>INCONSISTENCY in TextLine ID 'l2159' of file 'OCR-D-GT-PAGE_00000024': text results 'eine ſo große Verwandſafft, daß ſo gar in legibus einem einigen Verbreen⸗ wie der Conſpirationi &' != concatenated 'eine ſo große einigen der Conſpirationi & wie in legibus einem Verwandſafft, daß ſo gar'</error>
<error>INCONSISTENCY in TextLine ID 'l19' of file 'OUTPUT_00000024': text results '[22]' != concatenated '[ 22 ]'</error>
<error>INCONSISTENCY in TextLine ID 'l32' of file 'OUTPUT_00000024': text results '[22' != concatenated '[ 22'</error>
<error>INCONSISTENCY in TextLine ID 'l1250' of file 'OUTPUT_00000024': text results 'ein gleiches vorgegeben, und ſo gar ſehr viele mahle gegen alle menſchliche Moͤglichkeit mit Gewalt tor-' != concatenated 'ein gleiches vorgegeben , und ſo gar ſehr viele mahle gegen alle menſchliche Moͤglichkeit mit Gewalt tor -'</error>
<error>INCONSISTENCY in TextLine ID 'l108' of file 'OUTPUT_00000024': text results 'ciret worden zu ſeyn, behaupten will, mithin nebſt dem Bredeka, welcher (§. 28. 29.) ſich in allen ſeinen' != concatenated 'ciret worden zu ſeyn , behaupten will , mithin nebſt dem Bredeka , welcher ( § . 28 . 29 . ) ſich in allen ſeinen'</error>
<error>INCONSISTENCY in TextLine ID 'l212' of file 'OUTPUT_00000024': text results 'Auſſagen wiederſprochen, mit der Pœna talſi um do gewiſſer zu belegen iſt, da' != concatenated 'Auſſagen wiederſprochen , mit der Pœna talſi um do gewiſſer zu belegen iſt , da'</error>
<error>INCONSISTENCY in TextLine ID 'l294' of file 'OUTPUT_00000024': text results 'ſecund. Fatin. Tit. 9. qu. 6. p . 320.' != concatenated 'ſecund . Fatin . Tit . 9 . qu . 6 . p . 320 .'</error>
<error>INCONSISTENCY in TextLine ID 'l361' of file 'OUTPUT_00000024': text results 'die Klage ſo wohl als das Zeignuͤß vos falſch und erdichtet muͤßen gehalten werden.' != concatenated 'die Klage ſo wohl als das Zeignuͤß vos falſch und erdichtet muͤßen gehalten werden .'</error>
<error>INCONSISTENCY in TextLine ID 'l446' of file 'OUTPUT_00000024': text results 'S. 35) So viel die von der Inquiſitin' != concatenated 'S . 35 ) So viel die von der Inquiſitin'</error>
<error>INCONSISTENCY in TextLine ID 'l2048' of file 'OUTPUT_00000024': text results 'rath mit einer Pœna fiſcali angeſehen worden, und ſolche durch des Hrn. Graffen von Koͤnigsfeld Vor⸗' != concatenated 'rath mit einer Pœna fiſcali angeſehen worden , und ſolche durch des Hrn . Graffen von Koͤnigsfeld Vor ⸗'</error>
<error>INCONSISTENCY in TextLine ID 'l99' of file 'OUTPUT_00000024': text results 'ſpruch, nur aus Gnaden nachgelaſſen erhalten.' != concatenated 'ſpruch , nur aus Gnaden nachgelaſſen erhalten .'</error>
<error>INCONSISTENCY in TextLine ID 'l149' of file 'OUTPUT_00000024': text results 'Sondern man hat auch dieſen 4. Wochen lang alle Abend bey der Inquiſitin gantz allein gelaſſen.' != concatenated 'Sondern man hat auch dieſen 4 . Wochen lang alle Abend bey der Inquiſitin gantz allein gelaſſen .'</error>
<error>INCONSISTENCY in TextLine ID 'l240' of file 'OUTPUT_00000024': text results 'Binnen welcher gantzer Zeit der Schreiber Bredeka beſtaͤndig bey Jhme geweſen, und ſich in' != concatenated 'Binnen welcher gantzer Zeit der Schreiber Bredeka beſtaͤndig bey Jhme geweſen , und ſich in'</error>
<error>INCONSISTENCY in TextLine ID 'l328' of file 'OUTPUT_00000024': text results 'der am 13ten Octobr. a. c. in fudicio gegen ſeinen geweſenen Hrn. intröducirter Appellation deſſen Bey⸗' != concatenated 'der am 13ten Octobr . a . c . in fudicio gegen ſeinen geweſenen Hrn . intröducirter Appellation deſſen Bey ⸗'</error>
<error>INCONSISTENCY in TextLine ID 'l431' of file 'OUTPUT_00000024': text results 'raths bedienet hat;' != concatenated 'raths bedienet hat ;'</error>
<error>INCONSISTENCY in TextLine ID 'l466' of file 'OUTPUT_00000024': text results '.z) Dabenebenſt iſt der Schreiber binnen dieſer gantzen Zeit auf freyem Fuß geblieben, und' != concatenated '. z ) Dabenebenſt iſt der Schreiber binnen dieſer gantzen Zeit auf freyem Fuß geblieben , und'</error>
<error>INCONSISTENCY in TextLine ID 'l563' of file 'OUTPUT_00000024': text results 'hat nicht nur durch ſeinen Coſuletten, ſondern auch, weilen der Inquiſitii ſelbſten in Jhrem Gefaͤngnuͤß' != concatenated 'hat nicht nur durch ſeinen Coſuletten , ſondern auch , weilen der Inquiſitii ſelbſten in Jhrem Gefaͤngnuͤß'</error>
<error>INCONSISTENCY in TextLine ID 'l663' of file 'OUTPUT_00000024': text results 'ſo viele Freyheit gelaſſen worden, daß ſie frembden Beſuch von Jhren Anverwandten ohngehindert em⸗' != concatenated 'ſo viele Freyheit gelaſſen worden , daß ſie frembden Beſuch von Jhren Anverwandten ohngehindert em ⸗'</error>
<error>INCONSISTENCY in TextLine ID 'l761' of file 'OUTPUT_00000024': text results 'pfangen koͤnnen, durch andere Perſonen ſich mit ihr uͤber alles, was Er oder ſie dereinſten zu ſagen hat⸗' != concatenated 'pfangen koͤnnen , durch andere Perſonen ſich mit ihr uͤber alles , was Er oder ſie dereinſten zu ſagen hat ⸗'</error>
<error>INCONSISTENCY in TextLine ID 'l868' of file 'OUTPUT_00000024': text results 'ten, vereinigen koͤnnen, immaſſen der Hofrath Senckenberg, als dieſer am 1. Octob. das Officium Jcdi-' != concatenated 'ten , vereinigen koͤnnen , immaſſen der Hofrath Senckenberg , als dieſer am 1 . Octob . das Officium Jcdi -'</error>
<error>INCONSISTENCY in TextLine ID 'l965' of file 'OUTPUT_00000024': text results 'cis gegen ihn zur ſatisfactione publica excitirete, vor ſich aber ratione injuriarum demſelben (eben § præced.' != concatenated 'cis gegen ihn zur ſatisfactione publica excitirete , vor ſich aber ratione injuriarum demſelben ( eben § præced .'</error>
<error>INCONSISTENCY in TextLine ID 'l1071' of file 'OUTPUT_00000024': text results 'geſagter maſſen) eine Leibes⸗Straͤffe aufzulegen bate, vor allen Dingen, gleich als ob Er ein peinlicher' != concatenated 'geſagter maſſen ) eine Leibes ⸗ Straͤffe aufzulegen bate , vor allen Dingen , gleich als ob Er ein peinlicher'</error>
<error>INCONSISTENCY in TextLine ID 'l1179' of file 'OUTPUT_00000024': text results 'Anklaͤger waͤre, und ohne indiciis denuneiiret haͤtte,' != concatenated 'Anklaͤger waͤre , und ohne indiciis denuneiiret haͤtte ,'</error>
<error>INCONSISTENCY in TextLine ID 'l1254' of file 'OUTPUT_00000024': text results 'deauf dieſem Fall inioid. Cr. art. 12. vom peinlichen Klaͤger erforderte' != concatenated 'deauf dieſem Fall inioid . Cr . art . 12 . vom peinlichen Klaͤger erforderte'</error>
<error>INCONSISTENCY in TextLine ID 'l1326' of file 'OUTPUT_00000024': text results 'Caution zu leiſten, auferleget worden, da man ſich doch ex Actis (vid. §. 31. haͤtte erſehen koͤnnen, daß' != concatenated 'Caution zu leiſten , auferleget worden , da man ſich doch ex Actis ( vid . § . 31 . haͤtte erſehen koͤnnen , daß'</error>
<error>INCONSISTENCY in TextLine ID 'l1427' of file 'OUTPUT_00000024': text results 'hier von einer ohnzweiffentlichen und offentlichen Miſſethat die Frage obwalte, wobey dem Richter' != concatenated 'hier von einer ohnzweiffentlichen und offentlichen Miſſethat die Frage obwalte , wobey dem Richter'</error>
<error>INCONSISTENCY in TextLine ID 'l1523' of file 'OUTPUT_00000024': text results 'in O. Cr. art. 16.' != concatenated 'in O . Cr . art . 16 .'</error>
<error>INCONSISTENCY in TextLine ID 'l1558' of file 'OUTPUT_00000024': text results 'in gantz anderer ex Officio anzuſtellender Proceß vorgeſchrieben wird und allenfalls, wenn uͤber die' != concatenated 'in gantz anderer ex Officio anzuſtellender Proceß vorgeſchrieben wird und allenfalls , wenn uͤber die'</error>
<error>INCONSISTENCY in TextLine ID 'l1654' of file 'OUTPUT_00000024': text results 'inlufficientia Iidiciorum ein Zweiffel obgewaltet haͤtte,' != concatenated 'inlufficientia Iidiciorum ein Zweiffel obgewaltet haͤtte ,'</error>
<error>INCONSISTENCY in TextLine ID 'l1722' of file 'OUTPUT_00000024': text results 'ſeeund. O Cr. art. 7.' != concatenated 'ſeeund . O Cr . art . 7 .'</error>
<error>INCONSISTENCY in TextLine ID 'l1758' of file 'OUTPUT_00000024': text results 'auswaͤrtige Rechtsgelaͤhrte haͤtten muͤſſen befraget werden, anſonſten aber bey der bloßen actione Injuria-' != concatenated 'auswaͤrtige Rechtsgelaͤhrte haͤtten muͤſſen befraget werden , anſonſten aber bey der bloßen actione Injuria -'</error>
<error>INCONSISTENCY in TextLine ID 'l1857' of file 'OUTPUT_00000024': text results 'rum dem Hofrath Senckenberg die Cautions Leiſtung um do weniger konnte auferleget werden, da ſolche' != concatenated 'rum dem Hofrath Senckenberg die Cautions Leiſtung um do weniger konnte auferleget werden , da ſolche'</error>
<error>INCONSISTENCY in TextLine ID 'l1956' of file 'OUTPUT_00000024': text results 'auch bey der Inhafftirung der Agricola von Jhm keinesweges ware erfordert worden.' != concatenated 'auch bey der Inhafftirung der Agricola von Jhm keinesweges ware erfordert worden .'</error>
<error>INCONSISTENCY in TextLine ID 'l2042' of file 'OUTPUT_00000024': text results '§ 34) Zwiſchen dem Crimine falſi und concuſſionis iſt' != concatenated '§ 34 ) Zwiſchen dem Crimine falſi und concuſſionis iſt'</error>
<error>INCONSISTENCY in TextLine ID 'l2097' of file 'OUTPUT_00000024': text results 'ſec. LAUTERB. Coll. Theot. Pract. Lib. 48. Tit. 10. §. 16.' != concatenated 'ſec . LAUTERB . Coll . Theot . Pract . Lib . 48 . Tit . 10 . § . 16 .'</error>
<error>INCONSISTENCY in TextLine ID 'l2159' of file 'OUTPUT_00000024': text results 'erne ſo große Verwandſchafft, daß ſo gar in legibus einem einigen Verrechen⸗wie der Conſpirationi &' != concatenated 'erne ſo große Verwandſchafft , daß ſo gar in legibus einem einigen Verrechen ⸗ wie der Conſpirationi &'</error>
<error>INCONSISTENCY in TextLine ID 'l2259' of file 'OUTPUT_00000024': text results 'ſubornationi Teſtium bald dieſer bald jenet Nahme beygeleget wird.' != concatenated 'ſubornationi Teſtium bald dieſer bald jenet Nahme beygeleget wird .'</error>
<error>INCONSISTENCY in TextLine ID 'l2330' of file 'OUTPUT_00000024': text results 'L. 2. de concuſſ I. t. der. Cornel. de fall.' != concatenated 'L . 2 . de concuſſ I . t . der . Cornel . de fall .'</error>
<error>INCONSISTENCY in TextLine ID 'l2384' of file 'OUTPUT_00000024': text results 'Da nun der Inquiſirin dieſes Crien allſchon voͤllig erwieſen worden (. 22.) und dieſelbe, wenn fie auch' != concatenated 'Da nun der Inquiſirin dieſes Crien allſchon voͤllig erwieſen worden ( . 22 . ) und dieſelbe , wenn fie auch'</error>
<error>INCONSISTENCY in TextLine ID 'l2482' of file 'OUTPUT_00000024': text results 'ohngeſtandenen falls zu einem wahren Zeugnuͤß ſuborniret haͤtte,' != concatenated 'ohngeſtandenen falls zu einem wahren Zeugnuͤß ſuborniret haͤtte ,'</error>
<error>INCONSISTENCY in TextLine ID 'l2556' of file 'OUTPUT_00000024': text results 'ſec. LATERs. Coll. Theor. Pract. L. 48. T. 10. §. 8.' != concatenated 'ſec . LATERs . Coll . Theor . Pract . L . 48 . T . 10 . § . 8 .'</error>
<error>INCONSISTENCY in TextLine ID 'l2612' of file 'OUTPUT_00000024': text results 'dennoch mit der pœna falſi, als falſum fieri curans,' != concatenated 'dennoch mit der pœna falſi , als falſum fieri curans ,'</error>
<error>INCONSISTENCY in TextLine ID 'l2670' of file 'OUTPUT_00000024': text results 'ſec. l. 0. 6. 3. ad L. Corn. de fali.' != concatenated 'ſec . l . 0 . 6 . 3 . ad L . Corn . de fali .'</error>
<error>INCONSISTENCY in TextLine ID 'l2714' of file 'OUTPUT_00000024': text results 'L.4. 8. C. e. 7 X. de fali.' != concatenated 'L . 4 . 8 . C . e . 7 X . de fali .'</error>
<error>INCONSISTENCY in TextLine ID 'l25' of file 'OUTPUT_00000024': text results 'muͤßte beleget werden,/ welche dann oben (§. 3i) geſagter maſſen die Straffe der Enthauptung iſt/ wie viel⸗' != concatenated 'muͤßte beleget werden , / welche dann oben ( § . 3i ) geſagter maſſen die Straffe der Enthauptung iſt / wie viel ⸗'</error>
<error>INCONSISTENCY in TextLine ID 'l2860' of file 'OUTPUT_00000024': text results 'mehr wird derſelben und Jhrem Complici Bredekaw dieſe Straffe angedeyhen muͤſſen, da dieſelbe extra' != concatenated 'mehr wird derſelben und Jhrem Complici Bredekaw dieſe Straffe angedeyhen muͤſſen , da dieſelbe extra'</error>
<error>INCONSISTENCY in TextLine ID 'l2960' of file 'OUTPUT_00000024': text results 'Judicium beſtaͤndig behauptet, daß ſie der Hofrath Senckenberg mit Gewalt⸗und ſo gar it Piſtolen zu' != concatenated 'Judicium beſtaͤndig behauptet , daß ſie der Hofrath Senckenberg mit Gewalt ⸗ und ſo gar it Piſtolen zu'</error>
<error>INCONSISTENCY in TextLine ID 'l3060' of file 'OUTPUT_00000024': text results 'ſeinem Willen gezwungen,' != concatenated 'ſeinem Willen gezwungen ,'</error>
<error>INCONSISTENCY in TextLine ID 'l3102' of file 'OUTPUT_00000024': text results 'Protoc. Inquiſ. fol. 71. b. fol73. b. 82. a. b. fol. 23. a.' != concatenated 'Protoc . Inquiſ . fol . 71 . b . fol73 . b . 82 . a . b . fol . 23 . a .'</error>
<error>INCONSISTENCY in TextLine ID 'l3168' of file 'OUTPUT_00000024': text results 'auch in Judicio,' != concatenated 'auch in Judicio ,'</error>
<error>INCONSISTENCY in TextLine ID 'l50' of file 'OUTPUT_00000024': text results 'antzegebene Zeugin belanget, ſo muß zwar, ſo viel Teſt. 1. neml. des aͤltern Hx. Burgermeiſters hoch⸗' != concatenated 'antzegebene Zeugin belanget , ſo muß zwar , ſo viel Teſt . 1 . neml . des aͤltern Hx . Burgermeiſters hoch ⸗'</error>
<error>INCONSISTENCY in TextLine ID 'l92' of file 'OUTPUT_00000024': text results 'wohlgebl. anbetrifft, der Hofrath Senckenberg zu ſeinem groͤßten Leidweeſen bekennen, daß Er dieſelbe,' != concatenated 'wohlgebl . anbetrifft , der Hofrath Senckenberg zu ſeinem groͤßten Leidweeſen bekennen , daß Er dieſelbe ,'</error>
<error>INCONSISTENCY in TextLine ID 'l189' of file 'OUTPUT_00000024': text results '(nach Veranlaſſung§. 16. 17. 18. 19.) vor einen Inimicum angeben muͤße, woferne jedoch annoch ein Pro⸗' != concatenated '( nach Veranlaſſung § . 16 . 17 . 18 . 19 . ) vor einen Inimicum angeben muͤße , woferne jedoch annoch ein Pro ⸗'</error>
<error>INCONSISTENCY in TextLine ID 'l287' of file 'OUTPUT_00000024': text results 'ceß gegen den Hofrath Senckenberg ſtatt haben koͤnnte, und nicht' != concatenated 'ceß gegen den Hofrath Senckenberg ſtatt haben koͤnnte , und nicht'</error>
<error>INCONSISTENCY in TextLine ID 'l350' of file 'OUTPUT_00000024': text results 'contra Q Cr. art. 100.' != concatenated 'contra Q Cr . art . 100 .'</error>
<error>INCONSISTENCY in TextLine ID 'l399' of file 'OUTPUT_00000024': text results 'wie ſonſten hier gewoͤhnlich, articuli impertinentes oder dergleichen Interrogatoria zugelaſſen/ auch die von' != concatenated 'wie ſonſten hier gewoͤhnlich , articuli impertinentes oder dergleichen Interrogatoria zugelaſſen / auch die von'</error>
<error>INCONSISTENCY in TextLine ID 'l577' of file 'OUTPUT_00000024': text results 'ſec. cap. accedens 23. X. de accus.' != concatenated 'ſec . cap . accedens 23 . X . de accus .'</error>
<error>INCONSISTENCY in TextLine ID 'l625' of file 'OUTPUT_00000024': text results 'nichr zugelaſſen wird, duͤrfften dieſelbe vielleicht um do ehender vernommen werden, weilen alles ohne⸗' != concatenated 'nichr zugelaſſen wird , duͤrfften dieſelbe vielleicht um do ehender vernommen werden , weilen alles ohne ⸗'</error>
<error>INCONSISTENCY in TextLine ID 'l717' of file 'OUTPUT_00000024': text results 'hin. ex Originaiibus zu erweiſen ſtehet.' != concatenated 'hin . ex Originaiibus zu erweiſen ſtehet .'</error>
<error>INCONSISTENCY in TextLine ID 'l782' of file 'OUTPUT_00000024': text results '§. 36) Was von dem Bredekaw, der Seitzin und deren Sohn zu halten, iſt oben (s. 25. 26. 27.' != concatenated '§ . 36 ) Was von dem Bredekaw , der Seitzin und deren Sohn zu halten , iſt oben ( s . 25 . 26 . 27 .'</error>
<error>INCONSISTENCY in TextLine ID 'l875' of file 'OUTPUT_00000024': text results '28.) erinnert worden.' != concatenated '28 . ) erinnert worden .'</error>
<error>INCONSISTENCY in TextLine ID 'l926' of file 'OUTPUT_00000024': text results 'Mein Laquays Græf darff, wann gegen mich annoch ein Proceß ſtatt hatte, mmerhin verhoͤhret' != concatenated 'Mein Laquays Græf darff , wann gegen mich annoch ein Proceß ſtatt hatte , mmerhin verhoͤhret'</error>
<error>INCONSISTENCY in TextLine ID 'l1012' of file 'OUTPUT_00000024': text results 'werden.' != concatenated 'werden .'</error>
<error>INCONSISTENCY in TextLine ID 'l1053' of file 'OUTPUT_00000024': text results 'Die Wagnerin und deren Mann haben allſchon gegen die Inquiſitin ausgeſagt.' != concatenated 'Die Wagnerin und deren Mann haben allſchon gegen die Inquiſitin ausgeſagt .'</error>
<error>INCONSISTENCY in TextLine ID 'l1130' of file 'OUTPUT_00000024': text results 'Der Schnltheiß zu Oberrod, der Wirth Krebs und Hr. Notarus Tribert ſind bereits' != concatenated 'Der Schnltheiß zu Oberrod , der Wirth Krebs und Hr . Notarus Tribert ſind bereits'</error>
<error>INCONSISTENCY in TextLine ID 'l1203' of file 'OUTPUT_00000024': text results 'abgehoͤret.' != concatenated 'abgehoͤret .'</error>
<notice>fileGrp USE does not begin with 'OCR-D-': OUTPUT</notice>
</report>
Your results are as expected (only whitespace-related problems from the XML itself).
(I had an issue: I was, unexpectedly, not calling the version from the virtualenv but the buggy installation in `~/.local` instead.) Calling the correct version I get the same report as yours, so the bug is fixed.