franloza closed this 1 month ago.
Woah @franloza, this is awesome! I was actually looking for this kind of data a while back.
Thanks so much for taking the time to write such a great PR description.
I wrote the pipeline to be "friendly" with the API and divided the requests into chunks of 1 year, although it is possible to extract all the data by passing 1=1 in the where clause (not very nice to do).
Nice! I think that should be OK, although API stability varies a lot in these sorts of projects (I was just dealing with issues around the AEMET API, which seems to be down or erroring and has been like that for the last few days).
I'm not sure whether this is a misconfiguration on the server, or they really allowed these resources to be publicly accessed
I think these must be public but not 100% sure! :see_no_evil:
Dataset now on HuggingFace!
Also did a silly query in the DuckDB Shell (an average over the Parquet file from davidgasquez/spain_water_reservoirs_data on Hugging Face, grouped by the first column, ordered by it descending, limit 10) to see that everything worked.
Thanks again! Will spend more time with this dataset and perhaps add a dbt model to smooth those variable names.
Anytime! Happy to contribute 😃
NOTE: TIL about DuckDB Shell. What a nice tool to share queries on online datasets 💯
Why?
I think adding this dataset to the repository would give citizens and researchers easy access to information about water availability, usage, and management practices.
What?
Currently, water reservoir data is openly published via the Boletín Hidrológico Nacional, provided by MITECO (Ministerio para la Transición Ecológica y el Reto Demográfico).
However, up-to-date data are not available for download. These are the options for consuming the data:
How?
Looking into the second option (the ArcGIS Dashboard), I discovered an endpoint that returns data in JSON format when passed a series of parameters:
GET https://services-eu1.arcgis.com/RvnYk1PBUJ9rrAuT/arcgis/rest/services/Embalses_Mapa/FeatureServer/0/query
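For illustration, a request to this endpoint can be assembled programmatically. This is a minimal sketch, assuming the standard ArcGIS REST API query parameters (where, outFields, returnGeometry, f); build_query_url is a hypothetical helper, not the pipeline's actual code, and its dataset argument simply mirrors the service name (Embalses_Mapa) that appears in the path.

```python
from urllib.parse import urlencode

# Base of the ArcGIS Online organisation taken from the endpoint above.
BASE = "https://services-eu1.arcgis.com/RvnYk1PBUJ9rrAuT/arcgis/rest/services"


def build_query_url(dataset: str = "Embalses_Mapa", layer: int = 0,
                    where: str = "1=1", out_fields: str = "*") -> str:
    """Build a FeatureServer query URL for a given dataset (service) name.

    Hypothetical helper: the parameter names are standard ArcGIS REST API
    query parameters, but the defaults here are only illustrative.
    """
    params = {
        "where": where,             # SQL filter; "1=1" matches every row
        "outFields": out_fields,    # "*" requests all attribute columns
        "returnGeometry": "false",  # attributes only, no point geometry
        "f": "json",                # response format
    }
    # urlencode percent-encodes values, e.g. "1=1" becomes "1%3D1".
    return f"{BASE}/{dataset}/FeatureServer/{layer}/query?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_url())
```

Passing a different dataset name only changes the service segment of the path, which is what makes it easy to point the same code at other datasets on the server.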
With this endpoint, I could extract all the data from this dataset. To find out which parameters the endpoint accepts, I used the following form.
I wrote the pipeline to be "friendly" with the API and divided the requests into chunks of 1 year, although it is possible to extract all the data by passing 1=1 in the where clause (not very nice to do).
In addition, I configured the MITECOArcGisAPI resource to accept a dataset name, since other datasets on the ArcGIS server can be accessed as well. The complete list can be found here. The server lets you open any dataset, browse its data, and visualise it on a map. Here is, for example, the map of water reservoirs in Spain with the most recent data:
I'm not sure whether this is a misconfiguration on the server, or they really allowed these resources to be publicly accessed 😅
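The one-year chunking described above could be expressed as a list of where clauses, one per request. This is a minimal sketch, assuming a date column named fecha (a hypothetical name; the real field name is not given here) and ArcGIS standardized SQL DATE 'YYYY-MM-DD' literals; it is not the pipeline's actual implementation.

```python
from datetime import date


def yearly_where_clauses(start_year: int, end_year: int,
                         date_field: str = "fecha") -> list[str]:
    """Return one where clause per calendar year, start_year..end_year inclusive.

    date_field is a hypothetical column name used for illustration.
    """
    clauses = []
    for year in range(start_year, end_year + 1):
        lo = date(year, 1, 1).isoformat()
        hi = date(year + 1, 1, 1).isoformat()
        # Half-open interval [lo, hi) so no record falls into two chunks.
        clauses.append(
            f"{date_field} >= DATE '{lo}' AND {date_field} < DATE '{hi}'"
        )
    return clauses


if __name__ == "__main__":
    for clause in yearly_where_clauses(2020, 2022):
        print(clause)
```

Each clause would be sent as the where parameter of a separate request, keeping every response small instead of asking for everything at once with 1=1.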
Testing the data