Open krlmlr opened 3 years ago
@krlmlr -> see branch: https://github.com/cynkra/SwissCommunes/tree/f-20-download-with-api
We can download the data with the API, but some variables are missing, such as mAbolitionDate and mAdmissionDate.
Furthermore, the download takes a long time and returns everything as a single character string.
I added a method that is also used by swissdata: it uses the BFS_NR of the file to scrape the asset number, which allows us to download the files in the same structure as before.
I prefer this method, since we get the files in the same structure and all the variables are present. Currently the code is just scripted, not yet wrapped in a function.
What is your opinion about the download method?
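The scraping step could look roughly like this — a minimal sketch assuming the asset number appears in an "assetdetail" link on the release page. extract_asset_nr() and the link pattern are illustrative, not the actual branch code; in practice the page would first be fetched, e.g. with rvest:

```r
# Hypothetical helper: pull the asset number out of a BFS "assetdetail"
# link, e.g. one scraped from the release page for a given BFS_NR.
# The link pattern is an assumption based on the URLs seen in this thread.
extract_asset_nr <- function(href) {
  sub(".*assetdetail\\.([0-9]+)\\.html.*", "\\1", href)
}

extract_asset_nr("https://www.bfs.admin.ch/bfs/de/home/dienstleistungen/forschung/api/api-gemeinde.assetdetail.15224054.html")
#> [1] "15224054"
```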
Thanks. If we can reliably get the asset ID, I'm fine with that approach. Basically, I'm fine with anything we can stuff into a GitHub Action ;-)
data-raw/update-data.R has most of the updating code, no need to duplicate it. Can you please add detection of the asset ID to that file in a separate PR? I'll take a closer look at this PR later; the API might be producing something that resembles the format we're computing with swc_get_mutations(). This means we may be able to get rid of even more code.
You are right, the API data has a similar structure to our data after swc_get_mutations(), but only the mutation date is given. Maybe it is still useful like that; we need to check the downstream functions to see if we can work with this format.
A useful next step would be to write the result of swc_get_mutations() to a CSV file and to daff it with the results of the API (after column selection and renaming).
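That comparison could be sketched with the daff package, using made-up stand-in data frames (the column names mId/mDate are placeholders for illustration, not the real schema):

```r
library(daff)

# Stand-ins for the two sources: `mut_local` plays the role of the
# swc_get_mutations() result, `mut_api` the API result after column
# selection and renaming. The columns are made up for illustration.
mut_local <- data.frame(mId = 1:3, mDate = c("2020-01-01", "2020-06-01", "2021-01-01"))
mut_api   <- data.frame(mId = 1:3, mDate = c("2020-01-01", "2020-07-01", "2021-01-01"))

# Write the local result to CSV, then diff the two versions.
write.csv(mut_local, "mutations.csv", row.names = FALSE)
patch <- diff_data(mut_local, mut_api)
render_diff(patch, file = "mutations-diff.html", view = FALSE)
```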
We now have overwrite_data(), perhaps it does the job?
overwrite_data() calls swc_read_data(), which in the end gets the current mutation data — not from the API, but currently e.g. from https://www.bfs.admin.ch/bfsstatic/dam/assets/17884689/master (the asset number 17884689 is taken from scraping a permanent link).
The mutations are then written to the CSV files.
swc_get_mutations() reads the CSV file(s).
I don't think the current approach of overwrite_data() is so bad.
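The asset-number-to-URL step could be sketched as follows; build_asset_url() is a hypothetical helper, and the URL pattern simply follows the example link above:

```r
# Hypothetical helper: turn a scraped asset number into the direct
# download URL, following the pattern of the bfsstatic link above.
build_asset_url <- function(asset_nr) {
  paste0("https://www.bfs.admin.ch/bfsstatic/dam/assets/", asset_nr, "/master")
}

build_asset_url(17884689)
#> [1] "https://www.bfs.admin.ch/bfsstatic/dam/assets/17884689/master"
```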
If we want to use the API though, I will have a look at https://github.com/cynkra/munch/tree/f-20-download-with-api to check once more whether the API gives us more flexibility.
Hey there, I just randomly stumbled upon this repo. If you'd like to work with the API mentioned above: a while ago I wrote an R package (AGPL-3+) named swissmuni which provides access to all endpoints and offers as much documentation as I could put together (I've just updated the docs since I only now learned about the PDF available here, which wasn't around two years ago).
As you already noticed, the API is very slow, so I've added a caching mechanism (building on pkgpins). Be aware that the caching might not work as intended on Windows (the cache seems to be lost after an R session restart); I still have to investigate that. Apart from that, everything should be stable.
I never got around to publishing the package on CRAN, but you can simply install it using remotes::install_gitlab("salim_b/r/pkgs/swissmuni").
Thanks! The API seems to deliver data that looks slightly different from what we are processing internally. It would be great to take a closer look to understand the differences and perhaps consolidate.
https://www.bfs.admin.ch/bfs/de/home/dienstleistungen/forschung/api/api-gemeinde.assetdetail.15224054.html
Also check how zazuko is doing it. Old way: https://github.com/zazuko/fso-lod/commit/e8f08dee258697b77e9091c6f601be6675518639, maybe this has improved by now?