SORMAS-Foundation / SORMAS-data-generator

GNU General Public License v3.0

[R] Download SurvStat data via web service #21

Closed stephaneghozzi closed 3 years ago

stephaneghozzi commented 3 years ago

At the moment there are two data sources: RKI's corona dashboard and RKI's SurvStat. The first is already queried automatically; the second still has to be downloaded manually, even though there is a web service (https://tools.rki.de/SurvStat/SurvStatWebService.svc), which is a bit tricky to use. See https://github.com/rgieseke/opencoviddata, and in particular https://github.com/rgieseke/opencoviddata/blob/main/scripts/fetch-state.py, for a Python implementation.

JonasCir commented 3 years ago

Doing this in Python looks quite simple and effective. Both of the tools use the same descriptor file to interact with the service. Before including the code, however, we need to reach out to the owner of the repo, as no license is specified for the code.

stephaneghozzi commented 3 years ago

Good point... It's a bit of a question of what is done in R and what in Python... But that doesn't really matter if the code is modular :)

JonasCir commented 3 years ago

I think we could run the Python script at the very beginning to store the data as CSV and then spin up R. Also, I found a way to super conveniently call R from Python: rpy2. It is really an amazing piece of software. I already used it for SORMAS-Stats, just in case we need it.
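A minimal sketch of the hand-off described above, assuming the fetched data ends up as a list of row dictionaries: Python writes the CSV, then starts R with the CSV path as an argument. The column names, the sample row, and the script name `generate.R` are made up for illustration; nothing here is taken from the repository.

```python
import csv
import io
import shutil
import subprocess
from pathlib import Path


def rows_to_csv_text(rows):
    """Serialise a list of dicts to CSV text; the header comes from the first row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def write_csv(rows, path):
    """Write the rows to `path` and return the path."""
    path = Path(path)
    # newline="" keeps the line endings produced by the csv module untouched.
    with path.open("w", newline="", encoding="utf-8") as handle:
        handle.write(rows_to_csv_text(rows))
    return path


def r_command(r_script, csv_path):
    """Build the Rscript call; the R side would read the path via commandArgs()."""
    return ["Rscript", str(r_script), str(csv_path)]


def run_pipeline(rows, csv_path, r_script):
    """Store the data as CSV first, then spin up R (skipped if Rscript is missing)."""
    write_csv(rows, csv_path)
    if shutil.which("Rscript") is None:
        return None
    return subprocess.run(r_command(r_script, csv_path), check=True)


if __name__ == "__main__":
    # Hypothetical sample row, not real SurvStat data.
    sample = [{"week": "2021-W01", "cases": 12}]
    print(rows_to_csv_text(sample))
    print(r_command("generate.R", "survstat.csv"))
```

Keeping the two steps separate like this means the R code only ever sees a CSV file, so it does not matter whether the file came from the Python script or from a manual download.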

rgieseke commented 3 years ago

Good point on the missing license ... I also have (uncommitted) scripts for Landkreise (counties) (#23) ...

rgieseke commented 3 years ago

Just added a BSD license (https://github.com/rgieseke/opencoviddata/commit/2f2824672e9c5e6128c02fc599af8a7a891c2ed9)

stephaneghozzi commented 3 years ago

😮 and I was there writing a long email... Thanks! 🙏

rgieseke commented 3 years ago

Cheers! Pushed the county script as well so you have that too. I think I remember there was someone on Twitter who said they have an (unpublished) R wrapper for SurvStat, but these Python-to-CSV scripts are probably self-contained enough that you can get them to run if you want (or redo the API calls in your favourite environment). The state version has run pretty well in the daily GitHub CI action.

JonasCir commented 3 years ago

@rgieseke That's awesome! Thank you so much :)

stephaneghozzi commented 3 years ago

Yes... I think I saw the tweet but couldn't find it afterwards... (Thing is, former colleagues at RKI did develop such solutions, but we could never determine whether it was ok to make them publicly available... a real shame...)

rgieseke commented 3 years ago

Yeah ... I mean, technically it is sort of self-documenting (https://tools.rki.de/SurvStat/SurvStatWebService.svc?wsdl), but a lot I had to figure out from looking at the field names on the website ...
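To see what a WSDL declares without reading it by hand, the operation names can be listed with the standard library. A rough sketch only: `SAMPLE_WSDL` is an invented fragment, not the SurvStat definition, and the real document would have to be downloaded from the `?wsdl` URL first. Services of this kind may also split their WSDL across imported documents, so the top-level file might not list everything.

```python
import xml.etree.ElementTree as ET

# Namespace of WSDL 1.1 documents.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}


def list_operations(wsdl_xml):
    """Map each portType name to the list of operation names it declares."""
    root = ET.fromstring(wsdl_xml)
    operations = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        names = [
            op.get("name")
            for op in port_type.findall("wsdl:operation", WSDL_NS)
        ]
        operations[port_type.get("name")] = names
    return operations


# Hypothetical fragment for illustration; names are NOT from the real service.
SAMPLE_WSDL = """<wsdl:definitions
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/" name="ExampleService">
  <wsdl:portType name="ExamplePort">
    <wsdl:operation name="GetData"/>
    <wsdl:operation name="GetHierarchy"/>
  </wsdl:portType>
</wsdl:definitions>"""

if __name__ == "__main__":
    for port, names in list_operations(SAMPLE_WSDL).items():
        print(port, names)
```

This only answers "which operations exist"; what the fields and filter values mean still has to be worked out from the website, as described above.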

stephaneghozzi commented 3 years ago

We'll try to extend it to the dimension "Falldefinitionskategorie" and possibly age group and sex, but that shouldn't be a problem given all you've done already.

JonasCir commented 3 years ago

If anything is missing, I'm happy to contribute back :D

JonasCir commented 3 years ago

I'll drop you both an email :)

JonasCir commented 3 years ago

Thank you so much :)

JonasCir commented 3 years ago

@rgieseke I sent an email to the email address linked on your GitHub profile

JonasCir commented 3 years ago

Continued in #27