cboulanger opened this issue 2 years ago (status: Open)
I added a try/except to work around this issue. The resulting output shows that the bug is caused by malformed annotations (see the log below). The workaround simply skips the malformed lines, which may be the only appropriate solution.
Segmentation training [###.............................] 35/320: 0:00:24 remaining...
16563.xml: problem parsing <author><surname>Weber <author><given-names>Max </surname></author></given-names></author>(</author><year>1988</year><author>c/ Orig. </author><year>1920</year><author>) <title>Gesammelte Aufsätze zur Religionssoziologie I</author>. <other>Tübingen</title>.</other>
Segmentation training [#######.........................] 71/320: 0:00:23 remaining...
20786.xml: problem parsing <author><surname>Schnell</surname>,<given-names> R.</given-names></author>, <year>1997</year>: <title>Nonresponse in Bevölkerungsumfragen. Ausmaß, Entwicklung und Ursachen</title>. <other>Opladen<other>: <publisher>Leske + Budrich.</publisher></other></other>
Segmentation training [#######.........................] 77/320: 0:00:14 remaining...
21690.xml: problem parsing <source>Working Brief</source> <volume>15</volume>: <author><given-names>Diego</given-names> <surname>Compagna / <author><given-names>Stefan</surname> <surname>Derpmann</surname></author></given-names></author> / <author><given-names>Kathrin</given-names> <surname>Mauz</surname></author> / <author><given-names>Karen</given-names> <surname>Shire</surname></author> (<year>2009</year>): <title>Förderung des Wissenstransfers für eine aktive Mitgestaltung des Pflegesektors durch Mikrosystemtechnik (WiMi-Care)</title>, <source>Working Brief</source> <volume>15</volume>: <title>Die Einstellung von Pflegekräften gegenüber technischen Neuerungen</title>. In: <url>http://www.wimi-care.de/outputs.html#Briefs</url> (letzter Abruf: <other>02.12.2009</other>).
Segmentation training [##################..............] 188/320: 0:00:13 remaining...
36684.xml: problem parsing <title>Stellungnahmen geladener Sachverständiger vor dem Bundestag zum Thema Fiskalpakt und ESM</title>, <other>7.5.</other><year>2012</year>: <url><www. bundestag.de/bundestag/ausschuesse17/a08/anhoerungen/fiskalpakt_und_esm/stellungnahmen/index.html/></url>.
Segmentation training [######################..........] 225/320: 0:00:10 remaining...
40723.xml: problem parsing <author><surname>Koskinas</surname></author>, <author><given-names>Ioannis </given-names></author>(<year>2014</year>),<title> The Only Choice Left for Afghanistan</title>, online: <url>htp://southasia.foreign-policy.com/posts/2014/09/11/the_only_choice_ left_for_afghanistan></url> (<other>27 October 2014</other>).
Segmentation training [##########################......] 260/320: 0:00:05 remaining...
45841.xml: problem parsing <editor>Folha Online</editor> (<year>2012</year>), <url><www1.folha.uol.com.br/fsp/brasil/></url> (<other>12. November 2012</other>).
45841.xml: problem parsing <author><surname>Patarra</surname>, <given-names>Ivo</given-names></author> (<year>2010</year>), <title>O chefe</title>, online: <url><www.escandalodomensalao.com.br></url> (<other>2. November 2012</other>).
45841.xml: problem parsing <editor>Veja</editor> (<year>2012</year>), <title>O Julgamento do Mensalão. A hora da Sentença</title>, online: <url><htp://veja.abril.com.br/o-jul - gamento-do-mensalao/hora-da-sentenca/></url> (<other>13. November 2012</other>).
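
For reference, the try/except workaround described at the top can be sketched roughly as follows. This is a minimal illustration and not the actual patch: the function name `parse_annotated_lines`, the `<ref>` wrapper element, and the use of `xml.etree.ElementTree` are all assumptions made for the example. It only catches lines that are not well-formed XML (mismatched tags, or URLs wrapped in angle brackets); a line that is well-formed but structurally unexpected, such as the nested `<other>` in 20786.xml, would need an additional check in the real parser.

```python
import xml.etree.ElementTree as ET


def parse_annotated_lines(lines, source_name="<unknown>"):
    """Parse annotated reference lines, skipping the ones that are malformed.

    Hypothetical helper: returns (parsed, skipped), where `parsed` holds one
    Element per well-formed line and `skipped` holds the raw rejected lines.
    """
    parsed, skipped = [], []
    for line in lines:
        try:
            # Wrap the line in a dummy root element so that a line with
            # several top-level tags plus loose text is a single document.
            parsed.append(ET.fromstring(f"<ref>{line}</ref>"))
        except ET.ParseError:
            # Report the malformed line and continue instead of aborting.
            print(f"{source_name}: problem parsing {line}")
            skipped.append(line)
    return parsed, skipped


if __name__ == "__main__":
    sample = [
        "<author><surname>Schnell</surname></author>, <year>1997</year>",
        "<author><surname>Weber <author><given-names>Max </surname></author>",
    ]
    ok, bad = parse_annotated_lines(sample, "example.xml")
    print(f"{len(ok)} parsed, {len(bad)} skipped")
```

Returning the skipped lines alongside the parsed ones keeps the malformed annotations available for correction in the source files later, rather than discarding them silently.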