dariok / page2tei

MIT License

missing attributes #20

Open hackmanschorsch opened 3 years ago

hackmanschorsch commented 3 years ago

<TextLine id="r1l3" custom="readingOrder {index:2;} datum {offset:4; length:13;datum:1696-02-15;} persoonsnaam {offset:33; length:12;continued:true;}">

results in the following line, in which the property/attribute 'datum' is missing:

<lb facs="#facs_2_r1l3" n="N003"/>die <datum>15 febr. 1696</datum> gehuwd was met <persoonsnaam>Aletta Catha</persoonsnaam>

hackmanschorsch commented 3 years ago

This was also observed for other tags, e.g. persoonsnaam. Examples with source files can be found in Transkribus - CollID 76546

elespdn commented 2 years ago

Hello, here at RISE we are also using these wonderful stylesheets to export from Transkribus to TEI.

We now have the same issue described here: the attributes in the page xml source are not rendered in the TEI output. For example:

<TextLine id="r1l7" custom="readingOrder {index:6;} hi {offset:0; length:1;rend:ornamentalInitial;}">
    <Coords points="358,852 941,869 962,849 1128,857 1138,811 358,803"/>
    <Baseline points="365,838 409,839 453,840 497,841 541,842 585,842 629,843 673,844 717,845 761,845 805,846 849,846 893,847 937,847 981,847 1025,847 1069,847 1131,850"/>
    <TextEquiv>
        <Unicode>En principio criò ...</Unicode>
    </TextEquiv>
</TextLine>

creates

<lb facs="#facs_1_r1l7" n="N007"/><hi>E</hi>n principio criò ...

where the attribute @rend and its value are missing.

Could someone suggest a strategy to add this to the transformation? The stylesheet is very complex. I guess additional rules should go where @custom is parsed to produce the text (https://github.com/dariok/page2tei/blob/master/page2tei-0.xsl#L520), but I haven't been able to fix it.
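
To make the expected behaviour concrete, here is a minimal sketch in Python (not XSLT, and not the page2tei implementation) of how the extra properties in @custom could be carried over as attributes. The function names `parse_custom` and `tag_line` are hypothetical. The sketch assumes that every property other than `offset` and `length` becomes an attribute of the same name, that values contain no `;` or `}`, and that tags on a line do not overlap.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# One "name {key:value; key:value;}" entry of a PAGE @custom attribute.
ENTRY = re.compile(r"(\w+)\s*\{([^}]*)\}")


def parse_custom(custom):
    """Return a list of (tag_name, properties) pairs from a @custom value."""
    tags = []
    for name, body in ENTRY.findall(custom):
        props = {}
        for part in body.split(";"):
            if ":" in part:
                key, value = part.split(":", 1)
                props[key.strip()] = value.strip()
        tags.append((name, props))
    return tags


def tag_line(text, custom):
    """Wrap the tagged spans of one line, keeping extra properties as attributes."""
    spans = []
    for name, props in parse_custom(custom):
        if "offset" not in props or "length" not in props:
            continue  # e.g. readingOrder, which is not an inline tag
        start = int(props["offset"])
        end = start + int(props["length"])
        # Assumption: everything except offset/length maps to an attribute.
        attrs = {k: v for k, v in props.items() if k not in ("offset", "length")}
        spans.append((start, end, name, attrs))

    out, pos = [], 0
    for start, end, name, attrs in sorted(spans, key=lambda s: s[0]):
        if start < pos:
            continue  # overlapping tags are skipped in this sketch
        attr_text = "".join(f" {k}={quoteattr(v)}" for k, v in attrs.items())
        out.append(escape(text[pos:start]))
        out.append(f"<{name}{attr_text}>{escape(text[start:end])}</{name}>")
        pos = end
    out.append(escape(text[pos:]))
    return "".join(out)
```

With the line above, `tag_line("En principio criò ...", "readingOrder {index:6;} hi {offset:0; length:1;rend:ornamentalInitial;}")` gives `<hi rend="ornamentalInitial">E</hi>n principio criò ...`, which is the output I would hope for from the stylesheet.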

dariok commented 2 years ago

Sorry for the long wait! Too much to do…

I will try to have a look into this problem in the next 2–3 weeks as this has arisen in other projects, too. Thanks for your examples!

elespdn commented 2 years ago

Thanks a lot @dariok !

After all, there are also workarounds: one could create ad hoc tags in Transkribus and then replace them with tag + attribute after the export.

But should you need other examples or contribution, do let me know.
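
A rough sketch of that workaround in Python, run on the exported TEI. The naming convention is purely an assumption for illustration: an ad hoc tag called `hi_ornamentalInitial` is taken to mean `<hi rend="ornamentalInitial">`. The function name `expand_adhoc_tags` is hypothetical as well, and a regex replacement like this only suits simple, attribute-free ad hoc tags.

```python
import re

# Hypothetical convention: an ad hoc tag named "hi_<value>" stands for
# <hi rend="value">. Matches both the opening and the closing tag.
ADHOC = re.compile(r"<(/?)(hi)_(\w+)>")


def expand_adhoc_tags(tei):
    """Replace ad hoc tags such as <hi_x>...</hi_x> with <hi rend="x">...</hi>."""
    def repl(match):
        closing, element, value = match.groups()
        if closing:
            return f"</{element}>"
        return f'<{element} rend="{value}">'
    return ADHOC.sub(repl, tei)
```

For example, `expand_adhoc_tags("<hi_ornamentalInitial>E</hi_ornamentalInitial>n principio")` returns `<hi rend="ornamentalInitial">E</hi>n principio`.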

elespdn commented 2 years ago

I've heard that there are developments on the TEI export from Transkribus at the Biblioteca Hertziana, and I think @liladude is the specialist there; I've seen some of her talks! Maybe joining forces is possible? Also, https://github.com/eeditiones would like to develop better integration from Transkribus to TEI Publisher, which would likewise go through a TEI export.

liladude commented 2 years ago

Thanks @elespdn for the kind words, and sorry for not noticing earlier. We are trying to do exactly that: link Transkribus + PAGE2TEI + TEI Publisher. Hopefully we will be able to join forces!