PolMine / GermaParl2

GermaParl corpus of plenary protocols (v2)

Wrong attribution of party in Session #1 1949 #2

Closed: ablaette closed this issue 6 months ago

ablaette commented 1 year ago

The first speaker in the first session of the first legislative period should be a CDU speaker. However, running this kwic query returns an SPD speaker.

library(polmineR)

corpus("GERMAPARL2") %>% 
  subset(protocol_date == "1949-09-07") %>% 
  kwic(query = "Zukunft", s_attributes = c("protocol_date", "speaker_party"))

Yielding the result:

| protocol_date | speaker_party | left | node | right |
|---|---|---|---|---|
| 1949-09-07 | SPD | Gesetz unseres gesetzgeberischen Handelns in | Zukunft | sein . Geistige und politische |
| 1949-09-07 | SPD | für eine glücklichere Entwicklung der | Zukunft | schöpfen wird . Lassen Sie |

This is the XML of the protocol: https://github.com/PolMine/GermaParlTEI/blob/main/01/BT_01_001.xml

This is the pdf: https://dserver.bundestag.de/btp/01/01001.pdf

The term "Zukunft" is part of a speech by "Präsident Dr. Köhler".

This is the Wikipedia entry for Erich Köhler (CDU): https://de.wikipedia.org/wiki/Erich_Köhler#:~:text=Erich%20Köhler%20(*%2027.,erster%20Präsident%20des%20Deutschen%20Bundestages.

In the XML, Dr. Köhler is not recognized as a speaker at this line: https://github.com/PolMine/GermaParlTEI/blob/d81fdf431efec3dbc0fb007993b9c936a84d1600/01/BT_01_001.xml#L175

ChristophLeonhardt commented 6 months ago

As indicated in the issue, the problem is not that the speaker is assigned the wrong party affiliation, but that the speaker is not recognized at all. As a consequence, the speech is attributed to the previous speaker.
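To illustrate the mechanism described above, here is a minimal base R sketch of how a missed speaker line can carry the previous speaker's party forward. It is a toy model, not the GermaParl pipeline: the input lines, the names "Mustermann" and "Beispiel", the regular expression, and the helper `attribute_speakers()` are all hypothetical and chosen only for illustration.

```r
# Toy input: every name and utterance here is hypothetical.
lines <- c(
  "Abg. Mustermann (SPD): Ich eröffne die Sitzung.",
  "Präsident Dr. Köhler: Das soll in Zukunft unser Gesetz sein.",
  "Abg. Beispiel (CDU): Vielen Dank."
)

# Deliberately incomplete pattern (an assumption for this sketch): it only
# recognizes "Abg. <name> (<party>):" and misses lines of the presiding officer.
speaker_regex <- "^Abg\\. ([^(]+) \\(([^)]+)\\):"

# Hypothetical helper: assign each line to the most recently recognized speaker.
attribute_speakers <- function(lines, pattern) {
  n <- length(lines)
  name_out <- rep(NA_character_, n)
  party_out <- rep(NA_character_, n)
  name <- NA_character_
  party <- NA_character_
  for (i in seq_len(n)) {
    m <- regmatches(lines[i], regexec(pattern, lines[i]))[[1]]
    if (length(m) > 0) {
      # A new speaker was recognized: update the running attribution.
      name <- m[2]
      party <- m[3]
    }
    # If no speaker was recognized, the previous values are reused.
    name_out[i] <- name
    party_out[i] <- party
  }
  data.frame(name = name_out, party = party_out, text = lines,
             stringsAsFactors = FALSE)
}

result <- attribute_speakers(lines, speaker_regex)
print(result[, c("name", "party")])
```

In this sketch the second line, spoken by the presiding officer, is not matched by the pattern, so it inherits "Mustermann" and "SPD" from the line before it. This is the same kind of carry-over that makes a kwic hit appear under the wrong party.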

This issue has been addressed in the new version of GermaParl, GermaParl v2.0.1.