Closed BernhardAuer closed 10 months ago
name is missing ... https://parli-info.org/wortmeldung/XXVII/195/Unternehmens-Energiekostenzuschussgesetz%20%E2%80%93%20UEZG/Erwin%20Angerer
--> this is OK because name is missing in original source doc!
https://parli-info.org/wortmeldung/XXVII/156/Sozialhilfe-Grundsatzgesetz%20und%20Sozialhilfe-Statistikgesetz/Markus%20Koza check this...
nrOfSpeechByThisPerson is wrong ....
parsing of names with Mag [.] / MMag [.] / Ing (potentially all titles, diplomas, ...) is not working
parsing of embedded images does not work: https://parli-info.org/wortmeldung/XXVII/168/Gesundheits-%20und%20Krankenpflegegesetz%20(GuKG-Novelle%202022)/August%20W%C3%B6ginger
XXVII 10,12 no doc links available!!!
could not parse speechUrl, because there are name inconsistencies .....
not finished speeches: speechUrl gets parsed for wrong entity ....
🚀 --> fixed
okay, let's do it this way for now: we don't know whether there is an interruption just by looking at the official time plan, but that's where we are getting our base data from. So we could just merge all entries from one topic ("kurze debatte") together. If there is a unique name match for a speaker, everything is fine. If not, there is a possibility of duplicates/conflicts etc., so we just don't parse those URLs for now.
Can be fixed later. I think it's a rather rare problem
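A rough sketch of the "merge per topic, only accept unique name matches" idea, in case it helps the discussion. The function name `assign_speech_urls` and the dict keys `topic`, `speaker` and `url` are made up for illustration and do not reflect the actual data model; only `speechUrl` is taken from the notes above.

```python
from collections import Counter, defaultdict


def assign_speech_urls(plan_entries, parsed_speeches):
    """Attach a speechUrl to each time-plan entry, but only on a unique match.

    plan_entries:    dicts with 'topic' and 'speaker' (hypothetical shape)
    parsed_speeches: dicts with 'topic', 'speaker' and 'url' (hypothetical shape)
    """
    # Merge all parsed speeches of one topic, grouped by speaker name.
    urls_by_key = defaultdict(list)
    for speech in parsed_speeches:
        urls_by_key[(speech["topic"], speech["speaker"])].append(speech["url"])

    # Count how often each speaker appears per topic in the time plan.
    plan_counts = Counter((e["topic"], e["speaker"]) for e in plan_entries)

    result = []
    for entry in plan_entries:
        key = (entry["topic"], entry["speaker"])
        urls = urls_by_key.get(key, [])
        # Unique match: exactly one plan entry and exactly one parsed speech.
        is_unique = plan_counts[key] == 1 and len(urls) == 1
        # Ambiguous or missing matches are skipped for now (speechUrl stays None).
        result.append({**entry, "speechUrl": urls[0] if is_unique else None})
    return result
```

With this approach a speaker who appears twice under the same topic gets no URL at all, which matches the "just don't parse those URLs for now" decision rather than guessing.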
btw: these are duplicate "typetext" topics: https://parli-info.org/wortmeldungen/XXVII/158
HTML vs Text parsing considerations
[x] fix time parsing (time with seconds is not working)
[x] do not parse (Entschließungs-) anträge etc. as speeches
[x] do not parse name titles
[x] fix missing speeches if name titles are wrong
[x] fix missing speeches if topNr is missing
(see also #47)
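For reference, a minimal sketch of two of the checklist items above: ignoring name titles and accepting times with seconds. The title list, the accepted separators (`:` or `.`) and the function names `strip_titles` and `parse_time` are assumptions for illustration, not the project's actual parser, and only leading titles are handled.

```python
import re
from datetime import time

# Assumed, non-exhaustive list of leading titles (written without the trailing dot).
TITLES = {"Mag", "MMag", "Ing", "Dr", "Dipl.-Ing"}


def strip_titles(name):
    """Drop leading title tokens, with or without a trailing dot."""
    tokens = name.split()
    # Keep at least one token so a name is never reduced to an empty string.
    while len(tokens) > 1 and tokens[0].rstrip(".,") in TITLES:
        tokens.pop(0)
    return " ".join(tokens)


# Assumed formats: HH:MM or HH:MM:SS, with ':' or '.' as separator.
_TIME_RE = re.compile(r"(\d{1,2})[:.](\d{2})(?:[:.](\d{2}))?")


def parse_time(text):
    """Parse a time string where the seconds part is optional."""
    match = _TIME_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    hours, minutes = int(match.group(1)), int(match.group(2))
    seconds = int(match.group(3) or 0)  # seconds default to 0 when absent
    return time(hours, minutes, seconds)
```

Example usage with a placeholder name: `strip_titles("MMag. Dr. Erika Mustermann")` gives `"Erika Mustermann"`, and `parse_time("10:05:30")` gives `time(10, 5, 30)`.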