esciencecenter-digital-skills / geospatial-python

Introduction to Geospatial Raster and Vector Data with Python
https://esciencecenter-digital-skills.github.io/geospatial-python/

Download speed for new shapefile datasets from pdok is too slow #61

Closed rbavery closed 1 year ago

rbavery commented 1 year ago

When downloading the datasets with the curl command suggested in the episode setup, it takes over a minute to get to 4% progress (I'm on the US west coast).

 curl -L --progress-bar \
 --output brpgewaspercelen_definitief_2020.gpkg "https://service.pdok.nl/rvo/brpgewaspercelen/atom/v1_0/downloads/brpgewaspercelen_definitief_2020.gpkg" \
 --output brogmwvolledigeset.zip "https://service.pdok.nl/bzk/brogmwvolledigeset/atom/v2_1/downloads/brogmwvolledigeset.zip" \
 --output status_vaarweg.zip "https://geo.rijkswaterstaat.nl/services/ogc/gdr/vaarweginformatie/ows?service=WFS&version=2.0.0&request=GetFeature&typeName=status_vaarweg&outputFormat=SHAPE-ZIP"

from https://esciencecenter-digital-skills.github.io/geospatial-python/setup.html

I think we need to find a solution for this. Ideally, downloading these data would take at most a minute. Some solutions:

I'm not sure what the cost implications of these different options are. Previously, when I used figshare for the small raster datasets, I think we had around 300 MB hosted in a single location, and it took about a minute to download on both the east coast and the west coast.

What do you think @fnattino @rogerkuou ?

Also, it looks like this file is the culprit; it's half a GB: brpgewaspercelen_definitief_2020.gpkg
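To put the numbers above in perspective, here is a rough extrapolation in Python. It is only a sketch: it assumes the 4% progress refers to the half-gigabyte .gpkg file (rounded here to 500 MB), that "over a minute" is about 60 seconds, and that the transfer rate stays constant. The function names are made up for illustration.

```python
def estimate_total_seconds(elapsed_seconds: float, fraction_done: float) -> float:
    """Extrapolate total download time from progress so far, assuming a constant rate."""
    if not 0 < fraction_done <= 1:
        raise ValueError("fraction_done must be in the interval (0, 1]")
    return elapsed_seconds / fraction_done


def average_rate_mb_per_s(size_mb: float, fraction_done: float, elapsed_seconds: float) -> float:
    """Average transfer rate implied by the progress so far."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return size_mb * fraction_done / elapsed_seconds


# Rounded figures from this thread (assumptions): 4% done after about 60 s, file of about 500 MB.
total_s = estimate_total_seconds(elapsed_seconds=60, fraction_done=0.04)
rate = average_rate_mb_per_s(size_mb=500, fraction_done=0.04, elapsed_seconds=60)
print(f"estimated total: {total_s / 60:.0f} min at {rate:.2f} MB/s")
```

Under these assumptions the large file alone would need roughly 25 minutes, and since the 4% mark took more than a minute, that is a lower bound.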

rbavery commented 1 year ago

I made a figshare for the vector datasets and a smaller version of the problem file.

This now downloads in about a minute for me. What about for folks in the Netherlands? @rogerkuou @fnattino ?

https://figshare.com/ndownloader/files/37729413

https://figshare.com/articles/dataset/Vector_datasets_for_workshop_Introduction_to_Geospatial_Raster_and_Vector_Data_with_Python_/21273837

Should we go ahead and use figshare instead of pdok in the setup instructions, so that learners download the dataset from figshare? I haven't checked the pdok license info yet, but I assume we can distribute these files on figshare.
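If the setup instructions do switch to figshare, the download step could also be done from Python rather than curl. The following is an untested sketch using only the standard library. The `download` helper and the destination path are hypothetical; the thread does not state the file name or format behind the figshare link, and figshare may require extra request headers that are not handled here.

```python
import shutil
import urllib.request
from pathlib import Path

# Direct download link posted in this thread.
FIGSHARE_URL = "https://figshare.com/ndownloader/files/37729413"


def download(url: str, destination: str, timeout: float = 60.0) -> Path:
    """Stream the content at `url` into `destination` and return the written path."""
    target = Path(destination)
    # Create the parent folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so a large file is never held in memory all at once.
    with urllib.request.urlopen(url, timeout=timeout) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Example call (commented out because it needs network access; the path is a placeholder):
# download(FIGSHARE_URL, "data/vector_datasets_download")
```

Streaming with `shutil.copyfileobj` keeps memory use flat regardless of file size, which matters for files in the hundreds of megabytes.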

raar1 commented 1 year ago

I had a look at the metadata for the crop dataset here and it seems to list three licenses:

http://creativecommons.org/publicdomain/mark/1.0/deed.nl

http://inspire.ec.europa.eu/metadata-codelist/ConditionsApplyingToAccessAndUse/noConditionsApply

http://inspire.ec.europa.eu/metadata-codelist/LimitationsOnPublicAccess/noLimitations

These certainly suggest that hosting it on figshare would be fine, but perhaps someone else knows better?

raar1 commented 1 year ago

For example

Van dit werk is vastgesteld dat er geen bekende auteursrechtelijke beperkingen op rusten, alle aanverwante en naburige rechten daarbij inbegrepen. Je mag het werk zonder toestemming kopiëren, wijzigen, verspreiden,en uitvoeren, zelfs voor commerciële doeleinden.

Translates to:

This work has been determined to have no known copyright restrictions, including all related and neighboring rights. You may copy, modify, distribute, and perform the work without permission, even for commercial purposes.

fnattino commented 1 year ago

I see, @rbavery. Indeed, a download that takes this long is not really acceptable. Downloading the dataset you created from figshare takes less than a minute here as well, so we can go ahead and use it as the source of the vector data (and thanks a lot for checking the licenses, @raar1!). Do you agree, @rogerkuou?

rogerkuou commented 1 year ago

We will use the updated data on Figshare from now on.