TEI4HTR / page2tei

A repository for illustrating the transformation of a PAGE XML file into XML-TEI format, resulting from experimentations made for the LECTAUREP project.
Creative Commons Attribution 4.0 International

Deleting <surfaceGrp> and only keeping <surface> as text regions? #4

Closed: HugoSchtr closed this issue 3 years ago.

HugoSchtr commented 3 years ago

After re-reading the TEI guidelines, I noticed that a <surfaceGrp> is meant to group several written surfaces. However, we currently use both <surfaceGrp> and <surface> to represent a single text region, which results in a group containing only one <surface>:

<surfaceGrp xml:id="eSc_textblock_afbab800" type="structure_{type:col_1;}">
   <surface points="421,615 421,2236 465,2211 465,2266 421,2269 425,2449 410,4148 362,4213 205,4228 234,615">
      <zone xml:id="eSc_line_86b00a8e"
            type="mask"
            points="285,838 293,812 322,798 380,801 377,863 289,874">
         <path type="baseline" points="289,841 389,845"/>
         <line>198</line>
      </zone>
      <zone xml:id="eSc_line_4218ebcd"
            type="mask"
            points="278,981 285,940 311,929 380,948 384,992 359,1028 318,1028 282,1006">
         <path type="baseline" points="278,981 384,992"/>
         <line>199</line>
      </zone>
      ...
   </surface>
</surfaceGrp>
<surfaceGrp xml:id="eSc_textblock_c6e3bb97" type="structure_{type:col_3;}">
   <surface points="934,612 890,4216 772,4228 577,4207 615,615">
      <zone xml:id="eSc_line_c5f75194"
            type="mask"
            points="608,841 611,750 630,743 871,750 897,728 897,867 611,863">
         <path type="baseline" points="611,845 703,838 910,840"/>
         <line>Procuration</line>
      </zone>
      ...
   </surface>
</surfaceGrp>
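To make the redundancy concrete, here is a minimal Python sketch (standard-library ElementTree only) that lists every <surfaceGrp> whose only child is a single <surface>. It assumes the output uses the standard TEI namespace (http://www.tei-c.org/ns/1.0); the function name single_surface_groups and the cut-down SAMPLE document (zones omitted) are hypothetical, for illustration only.

```python
import xml.etree.ElementTree as ET

# Assumption: the generated TEI uses the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def single_surface_groups(root: ET.Element) -> list[str]:
    """Return the xml:id of each <surfaceGrp> whose only child is one <surface>."""
    found = []
    for grp in root.iter(f"{{{TEI_NS}}}surfaceGrp"):
        # len(grp) counts child elements only, not whitespace text.
        if len(grp) == 1 and grp[0].tag == f"{{{TEI_NS}}}surface":
            found.append(grp.get(XML_ID))
    return found


# Hypothetical sample mirroring the two text regions above (zones omitted).
SAMPLE = """<facsimile xmlns="http://www.tei-c.org/ns/1.0">
  <surfaceGrp xml:id="eSc_textblock_afbab800" type="structure_{type:col_1;}">
    <surface points="421,615 421,2236 465,2211 465,2266 421,2269 425,2449 410,4148 362,4213 205,4228 234,615"/>
  </surfaceGrp>
  <surfaceGrp xml:id="eSc_textblock_c6e3bb97" type="structure_{type:col_3;}">
    <surface points="934,612 890,4216 772,4228 577,4207 615,615"/>
  </surfaceGrp>
</facsimile>"""

print(single_surface_groups(ET.fromstring(SAMPLE)))
# ['eSc_textblock_afbab800', 'eSc_textblock_c6e3bb97']
```

Every text region in the current output ends up in this list, which is exactly the "group of one <surface>" pattern described above.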

Maybe a <surface> element on its own, still grouping one or more <zone> elements (one per text line, each holding the line mask, its baseline and its transcription), would be more appropriate and less redundant for representing a text region.

See the TEI documentation for <surface>: https://tei-c.org/release/doc/tei-p5-doc/en/html/ref-surface.html

The two text regions above would then be encoded as follows:

<surface xml:id="eSc_textblock_afbab800"
         type="structure_{type:col_1;}"
         points="421,615 421,2236 465,2211 465,2266 421,2269 425,2449 410,4148 362,4213 205,4228 234,615">
   <zone xml:id="eSc_line_86b00a8e"
         type="mask"
         points="285,838 293,812 322,798 380,801 377,863 289,874">
      <path type="baseline" points="289,841 389,845"/>
      <line>198</line>
   </zone>
   ...
</surface>
<surface xml:id="eSc_textblock_c6e3bb97"
         type="structure_{type:col_3;}"
         points="934,612 890,4216 772,4228 577,4207 615,615">
   <zone xml:id="eSc_line_c5f75194"
         type="mask"
         points="608,841 611,750 630,743 871,750 897,728 897,867 611,863">
      <path type="baseline" points="611,845 703,838 910,840"/>
      <line>Procuration</line>
   </zone>
   ...
</surface>
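If the flat structure is adopted, files already produced with the nested structure could be migrated by a small post-processing step. The following is a minimal Python sketch using the standard-library ElementTree, independent of how page2tei itself generates its output. It assumes the TEI namespace, replaces each <surfaceGrp> whose only child is a single <surface> with that <surface>, and copies the group's xml:id and type onto it. The function name flatten_surface_groups and the SAMPLE document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the generated TEI uses the standard TEI namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
SURFACE_GRP = f"{{{TEI_NS}}}surfaceGrp"
SURFACE = f"{{{TEI_NS}}}surface"


def flatten_surface_groups(root: ET.Element) -> int:
    """Replace every <surfaceGrp> whose only child is a <surface> by that surface.

    The group's xml:id and type are carried over to the surface (unless it
    already has its own). Groups holding several surfaces are left untouched.
    Returns the number of groups removed.
    """
    flattened = 0
    # Snapshot the tree first, because it is modified inside the loop.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != SURFACE_GRP or len(child) != 1 or child[0].tag != SURFACE:
                continue
            surface = child[0]
            for attr in (XML_ID, "type"):
                if child.get(attr) is not None and surface.get(attr) is None:
                    surface.set(attr, child.get(attr))
            child.remove(surface)
            surface.tail = child.tail  # keep the whitespace that followed the group
            parent.remove(child)
            parent.insert(index, surface)  # same position as the removed group
            flattened += 1
    return flattened


# Hypothetical sample mirroring the current (nested) output.
SAMPLE = """<facsimile xmlns="http://www.tei-c.org/ns/1.0">
  <surfaceGrp xml:id="eSc_textblock_afbab800" type="structure_{type:col_1;}">
    <surface points="421,615 421,2236 465,2211 465,2266 421,2269 425,2449 410,4148 362,4213 205,4228 234,615">
      <zone xml:id="eSc_line_86b00a8e" type="mask" points="285,838 293,812 322,798 380,801 377,863 289,874">
        <path type="baseline" points="289,841 389,845"/>
        <line>198</line>
      </zone>
    </surface>
  </surfaceGrp>
  <surfaceGrp xml:id="eSc_textblock_c6e3bb97" type="structure_{type:col_3;}">
    <surface points="934,612 890,4216 772,4228 577,4207 615,615"/>
  </surfaceGrp>
</facsimile>"""

root = ET.fromstring(SAMPLE)
print(flatten_surface_groups(root))  # 2
print([s.get(XML_ID) for s in root])  # ['eSc_textblock_afbab800', 'eSc_textblock_c6e3bb97']
```

The zones (masks, baselines and transcriptions) are moved along with their surface untouched; only the wrapping <surfaceGrp> disappears. A <surfaceGrp> that genuinely groups several surfaces is kept, which matches its intended use in the guidelines.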