Open mikkelee opened 5 years ago
I am attempting to extract text from the Danish Grundtvig corpus available here: http://hdl.handle.net/20.500.12115/31
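There seems to be a problem with poetry: the linefeeds between verse lines are lost. The excerpt below from 1806_52.xml resulted in fused words, for example forkynderGrumtbølgende (the end of the line "Men jeg dig forkynder" joined to the start of "Grumtbølgende Smerte,").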
1806_52.xml
<lg> <l>Øde og Angest,</l> <l>Fængsel og Utaal</l> <l>Smertens de bittre</l> <l>Taarer forøge.</l> <l>Sid du paa Sædet!</l> <l>Men jeg dig forkynder</l> <l>Grumtbølgende Smerte,</l> <l>Og dobbelte Jammer.</l> </lg>
Did you get to solve this issue?
Unfortunately not yet.
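Until this is fixed, one possible workaround is to extract the verse lines directly. The following is a minimal sketch using only Python's standard library, not a fix for the extraction tool itself. It emits one output line per <l> element so that adjacent lines cannot fuse. The function names `local_name` and `line_group_text` are hypothetical, and whether the corpus files declare an XML namespace is an assumption; the sketch strips any namespace prefix so it works either way.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop a namespace prefix such as '{http://www.tei-c.org/ns/1.0}l'."""
    return tag.rsplit("}", 1)[-1]


def line_group_text(xml_fragment):
    """Return the text of every <l> element, one verse line per output line."""
    root = ET.fromstring(xml_fragment)
    lines = []
    for elem in root.iter():
        if local_name(elem.tag) == "l":
            # itertext() also picks up text inside nested child elements.
            text = "".join(elem.itertext())
            # Collapse internal runs of whitespace to single spaces.
            lines.append(" ".join(text.split()))
    return "\n".join(lines)


# The excerpt from 1806_52.xml quoted in this issue.
sample = (
    "<lg> <l>Øde og Angest,</l> <l>Fængsel og Utaal</l> "
    "<l>Smertens de bittre</l> <l>Taarer forøge.</l> "
    "<l>Sid du paa Sædet!</l> <l>Men jeg dig forkynder</l> "
    "<l>Grumtbølgende Smerte,</l> <l>Og dobbelte Jammer.</l> </lg>"
)

result = line_group_text(sample)
print(result)
```

For a whole file, the same function could be applied to the file's contents, assuming the poetry is marked up with <l> elements throughout as in the excerpt.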