BernhardAuer / austrian-parliament-data-processing

Website for visualizing Austria's open government data in an appealing, interactive, and simple-to-use manner.
https://parli-info.org

fix speech scraping bugs #48

Closed BernhardAuer closed 10 months ago

BernhardAuer commented 11 months ago

(see also #47)

BernhardAuer commented 10 months ago

[image] Name is missing: https://parli-info.org/wortmeldung/XXVII/195/Unternehmens-Energiekostenzuschussgesetz%20%E2%80%93%20UEZG/Erwin%20Angerer

--> This is OK because the name is also missing in the original source document!

BernhardAuer commented 10 months ago

[image] https://parli-info.org/wortmeldung/XXVII/197/Volksbegehren%20%22Stoppt%20Lebendtier-Transportqual%22/Alois%20Kainz

... [image]

BernhardAuer commented 10 months ago

[image] [image]

BernhardAuer commented 10 months ago

Check this: https://parli-info.org/wortmeldung/XXVII/156/Sozialhilfe-Grundsatzgesetz%20und%20Sozialhilfe-Statistikgesetz/Markus%20Koza [image]

[image] nrOfSpeechByThisPerson is wrong.

BernhardAuer commented 10 months ago

Parsing of names with Mag[.] / MMag[.] / Ing (potentially all titles) is not working. mes, diplome, ...
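One possible way to make name parsing robust against such prefixes is to strip known titles before matching. The following is only a minimal sketch, not the project's actual scraper code: the function name `strip_titles` and the title list are assumptions made for illustration, and the list is certainly incomplete. It only handles titles in front of the name, not suffixes.

```python
import re

# Hypothetical, incomplete list of academic titles (assumption, extend as needed).
TITLES = ["Dipl.-Ing", "MMag", "Mag", "Ing", "DDr", "Dr"]

# One title at the start of the string, with an optional trailing dot.
# The lookahead requires whitespace or end of string after the title,
# so a first name such as "Ingrid" is not mistaken for the title "Ing".
_TITLE_RE = re.compile(
    r"^(?:" + "|".join(re.escape(t) for t in TITLES) + r")\.?(?=\s|$)\s*",
    re.IGNORECASE,
)


def strip_titles(name: str) -> str:
    """Remove any number of leading titles from a speaker name."""
    name = name.strip()
    while True:
        stripped = _TITLE_RE.sub("", name, count=1)
        if stripped == name:
            # Nothing left to remove.
            return name
        name = stripped


# Usage with made-up names:
print(strip_titles("Mag. Erika Mustermann"))
print(strip_titles("MMag Dr. Max Mustermann"))
```

Stripping in a loop rather than with a single pattern keeps the regular expression simple and handles stacked titles in any order.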

BernhardAuer commented 10 months ago

Parsing of embedded images does not work: [image] https://parli-info.org/wortmeldung/XXVII/168/Gesundheits-%20und%20Krankenpflegegesetz%20(GuKG-Novelle%202022)/August%20W%C3%B6ginger

BernhardAuer commented 10 months ago

XXVII 10, 12: no doc links available!

BernhardAuer commented 10 months ago

[image]

BernhardAuer commented 10 months ago

Could not parse speechUrl because there are name inconsistencies ... [image] [image]

BernhardAuer commented 10 months ago

Name changes ... [image] http://localhost:5173/wortmeldung/XXVII/213/%22Preisstopp%20%E2%80%93%20Steuerstopp%20%E2%80%93%20Sanktionsstopp!%20Wann%20setzt%20die%20Regierung%20endlich%20echte%20Ma%C3%9Fnahmen%20gegen%20die%20Kostenlawine%3F%22/Pia%20Philippa%20Beck

BernhardAuer commented 10 months ago

http://localhost:5173/wortmeldung/XXVII/213/T%C3%A4tigkeitsbericht%202022%20des%20Rechnungshofes%20%E2%80%93%20Reihe%20BUND%202022%2F44%3B%20Bericht%20d.%20RH%20-%20Reihe%20Bund%202022%2F22/Margit%20Kraker [image]

BernhardAuer commented 10 months ago

Unfinished speeches: speechUrl gets parsed for the wrong entity ... [image]

BernhardAuer commented 10 months ago

🚀 --> fixed

BernhardAuer commented 10 months ago

http://localhost:5173/wortmeldung/XXVII/156/Studienf%C3%B6rderungsgesetz%201992/Martina%20Kaufmann

BernhardAuer commented 10 months ago

http://localhost:5173/wortmeldung/XXVII/156/Studienf%C3%B6rderungsgesetz%201992/Martina%20Kaufmann [image]

BernhardAuer commented 10 months ago

[image] http://localhost:5173/wortmeldung/XXVII/156/Erkl%C3%A4rungen%20des%20Bundeskanzlers%20und%20des%20Vizekanzlers%20gem%C3%A4%C3%9F%20%C2%A7%2019%20Abs.%202%20der%20Gesch%C3%A4ftsordnung%20des%20Nationalrates%20anl%C3%A4sslich%20der%20Ernennung%20von%20neuen%20Mitgliedern%20der%20Bundesregierung%20sowie%20einer%20Staatssekret%C3%A4rin%20und%20eines%20Staatssekret%C3%A4rs/Beate%20Meinl-Reisinger

BernhardAuer commented 10 months ago

[image]

BernhardAuer commented 10 months ago

[image] http://localhost:5173/wortmeldungen/XXVII/213?thema=%22Preisstopp+%E2%80%93+Steuerstopp+%E2%80%93+Sanktionsstopp%21+Wann+setzt+die+Regierung+endlich+echte+Ma%C3%9Fnahmen+gegen+die+Kostenlawine%3F%22

Session interruption ...

BernhardAuer commented 10 months ago

Okay, let's do it this way for now: we can't tell whether there was an interruption just by looking at the official time plan, but that's where we get our base data from. So we could simply merge all entries from one topic ("kurze debatte") together. If a speaker's name matches uniquely, everything is fine. If not, duplicates or conflicts are possible, so we just don't parse those URLs for now.

This can be fixed later. I think it's a rather rare problem.
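The approach above could be sketched roughly as follows. This is not the project's actual code, just an illustration under assumptions: the `Entry` type, its fields, and the `(topic, speaker, url)` link tuples are hypothetical stand-ins for whatever the time plan and the source documents provide. An entry gets a URL only if its topic/speaker pair is unique on both sides; everything else is left unparsed.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Entry:
    """One speech from the official time plan (hypothetical shape)."""
    topic: str
    speaker: str
    speech_url: Optional[str] = None


def assign_speech_urls(
    entries: List[Entry], links: List[Tuple[str, str, str]]
) -> List[Entry]:
    """Attach a URL only where topic + speaker match uniquely.

    links holds (topic, speaker, url) tuples scraped from the source docs.
    """
    # Count how often each (topic, speaker) pair occurs on either side.
    entry_counts = Counter((e.topic, e.speaker) for e in entries)
    link_counts = Counter((topic, speaker) for topic, speaker, _ in links)
    url_by_key = {(topic, speaker): url for topic, speaker, url in links}

    for entry in entries:
        key = (entry.topic, entry.speaker)
        if entry_counts[key] == 1 and link_counts[key] == 1:
            entry.speech_url = url_by_key[key]
        # Otherwise: possible duplicate or conflict, so leave the URL unset.
    return entries
```

A speaker who appears twice under the same merged topic (for example, before and after an interruption) therefore keeps `speech_url = None` for both entries, which matches the "don't parse those URLs for now" decision.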

BernhardAuer commented 10 months ago

btw: these are duplicate "typetext" topics: https://parli-info.org/wortmeldungen/XXVII/158