UUDigitalHumanitieslab / tei_reader

TEI Reader Python Library
MIT License

Missing linefeeds in string output #6

Open mikkelee opened 5 years ago

mikkelee commented 5 years ago

I am attempting to extract text from the Danish Grundtvig corpus available here: http://hdl.handle.net/20.500.12115/31

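Poetry seems to lose its linefeeds in the string output. The excerpt below, from 1806_52.xml, came out with words fused across line boundaries, for example forkynderGrumtbølgende (the end of one <l> element joined directly to the start of the next):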

<lg>
<l>Øde og Angest,</l>
<l>Fængsel og Utaal</l>
<l>Smertens de bittre</l>
<l>Taarer forøge.</l>
<l>Sid du paa Sædet!</l>
<l>Men jeg dig forkynder</l>
<l>Grumtbølgende Smerte,</l>
<l>Og dobbelte Jammer.</l>
</lg>
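
Until this is fixed, a possible workaround is to pull the verse lines out directly. The following is only a minimal sketch using Python's standard xml.etree.ElementTree, not tei_reader's own API. The helper names local_name and verse_lines are hypothetical, and the sample is shortened to the two lines that ended up fused.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt from 1806_52.xml: the two lines that came out fused.
SAMPLE = """<lg>
<l>Men jeg dig forkynder</l>
<l>Grumtbølgende Smerte,</l>
</lg>"""


def local_name(tag):
    # Hypothetical helper: drop a namespace prefix,
    # e.g. "{http://www.tei-c.org/ns/1.0}l" -> "l".
    return tag.rsplit("}", 1)[-1]


def verse_lines(root):
    # Hypothetical helper: text of every <l> element, in document order.
    return [
        "".join(el.itertext()).strip()
        for el in root.iter()
        if isinstance(el.tag, str) and local_name(el.tag) == "l"
    ]


root = ET.fromstring(SAMPLE)
lines = verse_lines(root)

fused = "".join(lines)     # reproduces the symptom: no separator between lines
joined = "\n".join(lines)  # one verse line per output line

print(joined)
```

Joining on "\n" keeps each <l> on its own output line. Whether a full corpus file needs extra handling (nested elements, stanza breaks between <lg> groups) is not covered by this sketch.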

emanjavacas commented 3 years ago

Did you manage to solve this issue?

mikkelee commented 3 years ago

Unfortunately not yet.