HugoSchtr closed this issue 2 years ago.
However, as noted in issue #1, the points attribute requires at least three x,y pairs, so we are currently not TEI-compliant.
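A minimal Python sketch of the constraint: it counts the x,y pairs in a points string and checks them against a minimum. The helpers count_pairs and valid_for are hypothetical, and the minimum is passed in by hand rather than read from the TEI schema. With a minimum of three, a two-point baseline such as 289,841 389,845 fails while a line mask passes.

```python
import re

# Matches one "x,y" pair of non-negative integers, as in the PAGE XML examples.
# Assumption: only integer coordinates occur, which holds for the examples here.
PAIR = re.compile(r"\d+,\d+")


def count_pairs(points: str) -> int:
    """Return the number of x,y pairs in a points string; raise on malformed input."""
    pairs = points.split()
    for pair in pairs:
        if not PAIR.fullmatch(pair):
            raise ValueError(f"malformed coordinate pair: {pair!r}")
    return len(pairs)


def valid_for(points: str, minimum: int) -> bool:
    """True if the points string holds at least `minimum` pairs."""
    return count_pairs(points) >= minimum


baseline = "289,841 389,845"
mask = "285,838 293,812 322,798 380,801 377,863 289,874"

print(valid_for(baseline, 3))  # False: only two pairs
print(valid_for(mask, 3))      # True: six pairs
```

If we read the TEI Guidelines correctly, a line-like element such as <path> accepts two pairs, which would explain why the new output below places the baseline there instead of in a zone's points.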
Issue #1 is now resolved. Here is the new page2tei transformation for a baseline:
<TextRegion id="eSc_textblock_afbab800" custom="structure {type:col_1;}">
  <Coords points="421,615 421,2236 465,2211 465,2266 421,2269 425,2449 410,4148 362,4213 205,4228 234,615"/>
  <TextLine id="eSc_line_86b00a8e">
    <Coords points="285,838 293,812 322,798 380,801 377,863 289,874"/>
    <Baseline points="289,841 389,845"/>
    <TextEquiv>
      <Unicode>198</Unicode>
    </TextEquiv>
  </TextLine>
  <TextLine id="eSc_line_4218ebcd">
    <Coords points="278,981 285,940 311,929 380,948 384,992 359,1028 318,1028 282,1006"/>
    <Baseline points="278,981 384,992"/>
    <TextEquiv>
      <Unicode>199</Unicode>
    </TextEquiv>
  </TextLine>
  ...
becomes:
<surfaceGrp xml:id="eSc_textblock_afbab800" type="structure_{type:col_1;}">
  <surface points="421,615 421,2236 465,2211 465,2266 421,2269 425,2449 410,4148 362,4213 205,4228 234,615">
    <zone xml:id="eSc_line_86b00a8e"
          type="mask"
          points="285,838 293,812 322,798 380,801 377,863 289,874">
      <path type="baseline" points="289,841 389,845"/>
      <line>198</line>
    </zone>
    <zone xml:id="eSc_line_4218ebcd"
          type="mask"
          points="278,981 285,940 311,929 380,948 384,992 359,1028 318,1028 282,1006">
      <path type="baseline" points="278,981 384,992"/>
      <line>199</line>
    </zone>
    ...
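The line-level mapping in this example can be sketched in Python with the standard library's xml.etree.ElementTree. This is not the project's XSL, only an illustration of the same correspondences: TextLine/@id to zone/@xml:id, Coords to zone/@points with type="mask", Baseline to <path type="baseline">, and TextEquiv/Unicode to <line>. The function name line_to_zone is hypothetical, and the sketch assumes every TextLine has a Coords child.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id


def line_to_zone(text_line: ET.Element) -> ET.Element:
    """Map one PAGE <TextLine> to a TEI <zone>, following the example above."""
    zone = ET.Element(f"{{{TEI}}}zone")
    zone.set(XML_ID, text_line.get("id"))
    zone.set("type", "mask")
    # The line polygon (TextLine/Coords) becomes the zone's own @points.
    # "{*}" matches the tag in any namespace or none (Python 3.8+).
    zone.set("points", text_line.find("{*}Coords").get("points"))
    # The baseline goes into a <path> child instead of a @points on <zone>.
    baseline = text_line.find("{*}Baseline")
    if baseline is not None:
        ET.SubElement(zone, f"{{{TEI}}}path",
                      type="baseline", points=baseline.get("points"))
    # The transcription (TextEquiv/Unicode) becomes the text of <line>.
    unicode_el = text_line.find("{*}TextEquiv/{*}Unicode")
    line = ET.SubElement(zone, f"{{{TEI}}}line")
    line.text = unicode_el.text if unicode_el is not None else None
    return zone


page_line = ET.fromstring(
    '<TextLine id="eSc_line_86b00a8e">'
    '<Coords points="285,838 293,812 322,798 380,801 377,863 289,874"/>'
    '<Baseline points="289,841 389,845"/>'
    '<TextEquiv><Unicode>198</Unicode></TextEquiv>'
    '</TextLine>'
)
zone = line_to_zone(page_line)

ET.register_namespace("", TEI)  # write TEI as the default namespace
print(ET.tostring(zone, encoding="unicode"))
```

Keeping the baseline as a child <path> rather than a second attribute means the zone's own points always carry the full mask polygon.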
The new version of the transformation now carries the regions' coordinates over from the PAGE XML into the TEI, using the <surface> element and its points attribute.
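Region handling can be sketched the same way. Again this is a hypothetical Python illustration (region_to_surface_grp is not from the project). The rule for @type is a guess read off the example: whitespace in PAGE's custom value is replaced by underscores, turning structure {type:col_1;} into structure_{type:col_1;}. Line zones would then be appended inside <surface>; that step is left out here.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # serialized as xml:id


def region_to_surface_grp(region: ET.Element) -> ET.Element:
    """Map a PAGE <TextRegion> to <surfaceGrp><surface points="..."/></surfaceGrp>."""
    grp = ET.Element(f"{{{TEI}}}surfaceGrp")
    grp.set(XML_ID, region.get("id"))
    custom = region.get("custom")
    if custom:
        # Assumption inferred from the example output: runs of whitespace become "_".
        grp.set("type", "_".join(custom.split()))
    surface = ET.SubElement(grp, f"{{{TEI}}}surface")
    # The region polygon (TextRegion/Coords) is copied verbatim to surface/@points.
    coords = region.find("{*}Coords")
    if coords is not None:
        surface.set("points", coords.get("points"))
    return grp


region = ET.fromstring(
    '<TextRegion id="eSc_textblock_afbab800" custom="structure {type:col_1;}">'
    '<Coords points="421,615 421,2236 465,2211 465,2266 421,2269 '
    '425,2449 410,4148 362,4213 205,4228 234,615"/>'
    '</TextRegion>'
)
grp = region_to_surface_grp(region)
print(grp.get("type"))  # structure_{type:col_1;}
```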
In the second version of the XSL, the transformation from PAGE XML to TEI proceeds as follows:
For metadata:
becomes:
For the transcription itself:
becomes:
Every <TextRegion> and every baseline (masks and baselines) becomes: