gerritbruening / texthist

Sample data, code, ideas, questions

DTA's lb into proper verse lines #3

Open gerritbruening opened 5 years ago

gerritbruening commented 5 years ago

According to its documentation, the DTA does not encode verse lines in drama as l, but as lb within p. For example:

<sp who="#PRI">
    <speaker><hi rendition="#g"
        >Prinzeſſi<supplied>n</supplied>n</hi>.</speaker>
    <lb/>
    <p><hi rendition="#in">D</hi>u ſiehſt mich lächlend an,
        Eleonore,<lb/> Und ſiehſt dich ſelber an und
        lächelſt wieder.<lb/> Was haſt du? Laß es eine
        Freundinn wiſſen!<lb/> Du ſcheinſt
        bedenklich, doch du ſcheinſt ver-<lb/> gnügt.</p>
</sp>

(http://www.deutschestextarchiv.de/dtaq/book/view/goethe_torquato_1790?p=11&view=)

For a proper TEI encoding, one would like to have

<l><hi rendition="#in">D</hi>u ſiehſt mich lächlend an, Eleonore,</l>

etc. This was discussed with @cthomasdta from the DTA team. Maybe @mathias-goebel can give some advice, as he has worked on this kind of data. Note that not every lb marks the beginning of a new verse line: ver-<lb/> gnügt is just due to hyphenation. Note also that break="no" is missing there, although one would expect it, since its use is recommended. tl;dr :-)
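A minimal sketch of how such a conversion could look in Python, using only the standard library. It is not the DTA's or anyone's actual tooling: the function name p_to_verse_lines and its helpers are hypothetical, and the rule "a trailing hyphen or break="no" means the lb does not start a new verse line" is only a heuristic. It will miss wrapped lines that carry no hyphen, and it will not see a hyphen that sits inside a nested element such as hi.

```python
import copy
import re
import xml.etree.ElementTree as ET


def _local(tag):
    """Return the tag name without a "{namespace}" prefix."""
    return tag.rsplit("}", 1)[-1]


def _last_text(line):
    """Text at the very end of the <l> under construction."""
    return (line[-1].tail if len(line) else line.text) or ""


def _set_last_text(line, text):
    """Overwrite the text at the very end of the <l> under construction."""
    if len(line):
        line[-1].tail = text
    else:
        line.text = text


def _collapse(text):
    """Collapse runs of whitespace (source line wraps, indentation)."""
    return re.sub(r"\s+", " ", text) if text else text


def p_to_verse_lines(p):
    """Split a <p> at its <lb/> children and return a list of <l> elements.

    Heuristic (an assumption, not a DTA rule): an <lb/> preceded by a
    hyphen, or carrying break="no", is treated as a mere print line break
    inside a word, so the word is rejoined instead of starting a new <l>.
    """
    prefix = p.tag[: -len("p")]  # keeps a "{namespace}" prefix, if any
    lines = [ET.Element(prefix + "l")]
    lines[-1].text = p.text or ""
    for child in p:
        if _local(child.tag) != "lb":
            # Inline markup such as <hi> is copied into the current line.
            lines[-1].append(copy.deepcopy(child))
            continue
        before = _last_text(lines[-1]).rstrip()
        after = (child.tail or "").lstrip()
        if child.get("break") == "no" or before.endswith("-"):
            if before.endswith("-"):
                before = before[:-1]  # drop the hyphenation mark
            _set_last_text(lines[-1], before + after)
        else:
            _set_last_text(lines[-1], before)
            lines.append(ET.Element(prefix + "l"))
            lines[-1].text = after
    for line in lines:
        for el in line.iter():
            el.text, el.tail = _collapse(el.text), _collapse(el.tail)
        line.text = (line.text or "").lstrip()
        _set_last_text(line, _last_text(line).rstrip())
    return lines


# The <p> from the sample above, without the surrounding <sp>.
SAMPLE = """<p><hi rendition="#in">D</hi>u ſiehſt mich lächlend an,
        Eleonore,<lb/> Und ſiehſt dich ſelber an und
        lächelſt wieder.<lb/> Was haſt du? Laß es eine
        Freundinn wiſſen!<lb/> Du ſcheinſt
        bedenklich, doch du ſcheinſt ver-<lb/> gnügt.</p>"""

verse_lines = p_to_verse_lines(ET.fromstring(SAMPLE))
for verse_line in verse_lines:
    print(ET.tostring(verse_line, encoding="unicode"))
```

The helper compares local tag names, so it should also work on files that declare the TEI namespace. Because the hyphen rule can misfire (for example on a verse line that genuinely ends in a hyphen), the output would still need manual checking.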

gerritbruening commented 4 years ago

This is not just a private problem, hence: https://github.com/deutschestextarchiv/dtabf/issues/83