cynkra / munch

Functions for working with the historicized list of communes of Switzerland.
https://munch.cynkra.com

Get mutations via API #20

Open krlmlr opened 3 years ago

krlmlr commented 3 years ago

https://www.bfs.admin.ch/bfs/de/home/dienstleistungen/forschung/api/api-gemeinde.assetdetail.15224054.html

Also check how zazuko is doing it. Old way: https://github.com/zazuko/fso-lod/commit/e8f08dee258697b77e9091c6f601be6675518639, maybe this has improved by now?

krlmlr commented 3 years ago

Example query: https://sms.bfs.admin.ch/WcfBFSSpecificService.svc/AnonymousRest/communes/mutations?startPeriod=01-01-1960&endPeriod=01-01-2021&includeTerritoryExchange=true
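
For reference, a minimal sketch of calling that endpoint from R, assuming the response is CSV text (the exact format may differ; check the API documentation):

```r
# Query the BFS mutations endpoint directly; assumes the response is CSV text.
library(httr)
library(readr)

url <- "https://sms.bfs.admin.ch/WcfBFSSpecificService.svc/AnonymousRest/communes/mutations"
resp <- GET(url, query = list(
  startPeriod = "01-01-1960",
  endPeriod = "01-01-2021",
  includeTerritoryExchange = "true"
))
stop_for_status(resp)

mutations <- read_csv(content(resp, as = "text", encoding = "UTF-8"))
```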

ThomasKnecht commented 3 years ago

@krlmlr -> see branch: https://github.com/cynkra/SwissCommunes/tree/f-20-download-with-api

We can download the data with the API, but some variables are missing, such as mAbolitionDate and mAdmissionDate.

Furthermore, the download takes a long time, and it returns everything as a single character string.

I added a method that is also used by swissdata: it uses the BFS_NR of the file to scrape the asset number, which allows us to download the same file structure as we had before.

I prefer this method, since we get the files in the same structure and all the variables are present. Currently the code is just scripted and not yet wrapped in a function.
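
A rough, hypothetical sketch of that idea (not the actual branch code; the permalink, regex, and URL pattern are assumptions and may need adjusting):

```r
# Resolve the current asset number from a permanent BFS link, then download
# the asset in the familiar file structure.
library(httr)

permalink <- "https://www.bfs.admin.ch/permalink-to-mutation-file"  # placeholder, not a real URL

page <- content(GET(permalink), as = "text", encoding = "UTF-8")

# Look for a reference like "assets/<number>/master" and keep only the digits
hit <- regmatches(page, regexpr("assets/[0-9]+/master", page))
asset_id <- gsub("[^0-9]", "", hit)

asset_url <- paste0("https://www.bfs.admin.ch/bfsstatic/dam/assets/", asset_id, "/master")
download.file(asset_url, destfile = "mutation-data.zip", mode = "wb")
```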

What is your opinion about the download method?

krlmlr commented 3 years ago

Thanks. If we can reliably get the asset ID I'm fine with that approach. Basically, I'm fine with anything we can stuff into a GitHub Action ;-)

data-raw/update-data.R has most of the updating code, no need to duplicate it. Can you please add detection of the asset ID to that file in a separate PR? I'll take a closer look at this PR later; the API might be producing something that resembles the format we're computing with swc_get_mutations(). This means that we may be able to get rid of even more code.

ThomasKnecht commented 3 years ago

You are right, the API data has a similar structure to our data after swc_get_mutations(), but only the mutation date is given. Maybe it is useful like that, though. We need to check the downstream functions to see if we can work with this format.

krlmlr commented 3 years ago

A useful next step would be to write the result of swc_get_mutations() to a CSV file and to daff it with the results of the API (after column selection and renaming).
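
One possible shape for that comparison, assuming swc_get_mutations() can be called without arguments here and the API result has already been selected and renamed (file names are illustrative):

```r
# Compare the internally computed mutations with the API result using daff.
library(daff)
library(readr)

ref <- munch::swc_get_mutations()
write_csv(ref, "mutations-internal.csv")

api <- read_csv("mutations-api.csv")  # API result after column selection and renaming

delta <- diff_data(ref, api)
render_diff(delta)  # HTML view of the differences
```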

krlmlr commented 2 years ago

We now have overwrite_data(), perhaps it does the job?

TSchiefer commented 2 years ago

overwrite_data() calls swc_read_data(), which in the end gets the current mutation data, though not from the API: currently it downloads e.g. from https://www.bfs.admin.ch/bfsstatic/dam/assets/17884689/master (the asset number 17884689 is taken from scraping a permanent link). The mutations are then written to the CSV files.

swc_get_mutations() then reads the CSV file(s).

I don't think the current approach of overwrite_data() is that bad. If we do want to use the API, though, I will have another look at https://github.com/cynkra/munch/tree/f-20-download-with-api to check whether the API would make us more flexible.

salim-b commented 2 years ago

Hey there, I just randomly stumbled upon this repo. If you'd like to work with the API mentioned above: a while ago I wrote an R package (AGPL-3+) named swissmuni, which allows you to access all endpoints and offers as much documentation as I could put together (I've just updated the docs, since I only now learned about the PDF available here, which wasn't around two years ago).

As you already noticed, the API is very slow, so I've added a caching mechanism (building on pkgpins). Be aware that the caching might not work as intended on Windows (the cache is lost after an R session restart, I think); I still have to investigate that. Apart from that, everything should be stable.

I never got around to publishing the package on CRAN, but you can simply install it using remotes::install_gitlab("salim_b/r/pkgs/swissmuni").
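
Installation as described; the call below is only a guess at the mutations endpoint wrapper and should be checked against the swissmuni documentation:

```r
# Install from GitLab (as suggested above) and query the mutations endpoint.
remotes::install_gitlab("salim_b/r/pkgs/swissmuni")

# Hypothetical usage; function name and arguments unverified:
# mutations <- swissmuni::mutations(start_date = "1960-01-01", end_date = "2021-01-01")
```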

krlmlr commented 2 years ago

Thanks! The API seems to deliver data that looks slightly different from what we are processing internally. It would be great to have a closer look to understand the differences and perhaps consolidate.