Open PolMine opened 4 years ago
I am not sure the speech goes unrecognized; I would say it is recognized. Maria Michalk gives two speeches here: the first in German, the second (with interruptions and questions in between) partly in Sorbian.
library(polmineR)
speeches <- corpus("GERMAPARL") %>%
  subset(date == "2004-06-17") %>%
  subset(speaker == "Maria Michalk") %>%
  as.speeches(s_attribute_name = "speaker")
I fully agree that there are two distinct speeches. However, if you look at the second one (in Sorbian), something is wrong with the HTML output. This is a polmineR issue rather than a GermaParl issue.
library(polmineR)
speeches <- corpus("GERMAPARL") %>%
  subset(date == "2004-06-17") %>%
  subset(speaker == "Maria Michalk") %>%
  as.speeches(s_attribute_name = "speaker")
html(speeches[[1]])
html(speeches[[2]])
I also noticed these odd tags when doing read(speeches[[1]]), yes.
There is an unrecognized speech given in Sorbian, see this snippet: