clarity-h2020 / ckan

CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers datahub.io, catalog.data.gov and europeandataportal.eu/data/en/dataset among many other sites.
https://ckan.myclimateservice.eu/

Publish Hazard Local Effects Input Layers as OpenData #31

Open p-a-s-c-a-l opened 4 years ago

p-a-s-c-a-l commented 4 years ago

Publish Hazard Local Effects Input Layers as OpenData on Zenodo and update the CKAN Catalogue datasets.

p-a-s-c-a-l commented 4 years ago

At the moment, only 20 of the 544 cities have been calculated. Can you estimate how long it will take to calculate the remaining cities? @negroscuro

For the DMP, perhaps we should upload what we have now to Zenodo and make an update when all cities are available. I've downloaded the built_open_spaces and medium_urban_fabric shapefiles from here, but the zip files are only a few KB in size; is this realistic?

BTW, can you please check if the list and description of local effects datasets in CKAN is still complete?

negroscuro commented 4 years ago

There is a discussion regarding data generation for cities at: https://github.com/clarity-h2020/data-package/issues/59

I am afraid I cannot give a feasible / reasonable estimate for that. Before adding the new ESM20 data to the vegetation layer it could take around 5 weeks; now I really do not know, but in theory much more than that. I have been warning everyone about this for more than a month. Note that the current 20 cities are not updated with the new vegetation (ESM20) data.

p-a-s-c-a-l commented 4 years ago

O.K. And what about my 2nd question:

I've downloaded the built_open_spaces and medium_urban_fabric shapefiles from here, but the zip files are only a few KB in size; is this realistic?

negroscuro commented 4 years ago

Sure, sorry.

Regarding data size realism: it is not realistic. I just downloaded the agricultural areas for the whole of Europe: http://services.clarity-h2020.eu:8080/geoserver/clarity/ows?service=WFS&version=1.0.0&request=GetFeature&typeName=clarity%3Aagricultural_areas&outputFormat=SHAPE-ZIP It is 235 MB as a compressed file. Note that maxFeatures=50 limits what you get from the server; that parameter has to be removed in order to get all the data for a specific layer...
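To illustrate the point about maxFeatures, here is a minimal sketch of the two request URLs, truncated versus full; the endpoint and layer name are the ones from the download link above, and the script only prints the URLs rather than fetching anything:

```shell
# WFS 1.0.0 GetFeature request for the clarity:agricultural_areas layer
# (endpoint taken from the comment above).
BASE="http://services.clarity-h2020.eu:8080/geoserver/clarity/ows"
PARAMS="service=WFS&version=1.0.0&request=GetFeature&typeName=clarity:agricultural_areas&outputFormat=SHAPE-ZIP"

# Truncated download: only the first 50 features are returned.
echo "${BASE}?${PARAMS}&maxFeatures=50"

# Full download: simply omit the maxFeatures parameter.
echo "${BASE}?${PARAMS}"
```

Either URL can be passed to curl or a browser; only the second returns the complete layer.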

I am trying to find out how to download a compressed GeoJSON file, which can be lighter than a shapefile, in order to upload it to Zenodo more easily...

Regarding whether the CKAN data is still complete: I updated it a couple of months ago to add the latest layer, which is sports.

negroscuro commented 4 years ago

I managed to download JSON by using curl from an Ubuntu console:

curl -u admin:XXXXXXXX -XGET "http://services.clarity-h2020.eu:8080/geoserver/clarity/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=clarity:agricultural_areas&maxFeatures=50&outputFormat=application/json" > agricultural_areas.json

It is only 176 KB. Do you want me to export every local effects layer in the GeoServer to this format, so we have the current layer contents and something to upload to Zenodo?
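A per-layer export could be scripted as a loop over the layer names. The sketch below is a dry run that only prints the commands; the layer names are just the ones mentioned in this thread (the full list would come from GeoServer), and the credentials are elided as in the curl command above:

```shell
# Dry-run sketch: export each local-effects layer to GeoJSON, then compress it
# for upload to Zenodo. Layer names here are examples from this thread only.
WFS="http://services.clarity-h2020.eu:8080/geoserver/clarity/ows"
LAYERS="agricultural_areas roads sports"

for layer in $LAYERS; do
  # Credentials (-u admin:...) elided, as in the comment above. Without
  # maxFeatures the full layer is exported, which can be very large.
  echo curl -XGET "${WFS}?service=WFS&version=1.1.0&request=GetFeature&typeName=clarity:${layer}&outputFormat=application/json" -o "${layer}.json"
  echo gzip "${layer}.json"
done
```

Dropping the two `echo`s would actually run the export.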

negroscuro commented 4 years ago

I checked a couple of local effects datasets to see if the WFS links are set to download a shapefile zip, and they are, even without the maxFeatures=50 limitation, so I would say the data is correctly referenced from CKAN. Of course the URLs have to be updated once the GeoServer migration is done.

The bad point is trying to make such a link a JSON download... I do not know if that would work, since in my case the web browser tried to open the response and that produced a memory leak in the browser; that is why I tested with the curl command instead.

negroscuro commented 4 years ago

In my last test, Reggio_di_Calabria, a city with just 2061 cells, usually takes around 25 minutes, but with ESM20, even after the parallel CPU query enhancement, it was taking around 32 hours, so I stopped it...

p-a-s-c-a-l commented 4 years ago

TODO @p-a-s-c-a-l : Deposit data in Zenodo

I noticed there is a dataset missing in CKAN (I just created it), but I need the data reference to be set in Zenodo; could you please handle that? https://ckan.myclimateservice.eu/dataset/land-use-grid

Mortality is also missing in Zenodo: https://ckan.myclimateservice.eu/dataset/mortality

Cities as well: https://ckan.myclimateservice.eu/dataset/cities
Basins: https://ckan.myclimateservice.eu/dataset/basins
Streams: https://ckan.myclimateservice.eu/dataset/streams

p-a-s-c-a-l commented 4 years ago

@DanielRodera

When trying to download e.g. roads.shp.zip, GeoServer responds with

502 Bad Gateway nginx/1.17.6

DanielRodera commented 4 years ago

Hi @p-a-s-c-a-l, as far as I can see you are trying to download the entire layer, which contains all the roads in Europe. GeoServer is not able to compress all the data for this layer in a reasonable time; that is why it is throwing the error. If you try adding "&maxFeatures=" to the request, it will work.
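One possible middle ground between maxFeatures (truncated data) and a single huge request (timeouts) is WFS response paging, assuming this GeoServer supports WFS 2.0.0, where startIndex/count replace maxFeatures. A dry-run sketch that only prints the paged requests:

```shell
# Dry-run sketch: fetch a large layer in pages of 10000 features using
# WFS 2.0.0 paging (startIndex/count), assuming the server supports it.
# Note WFS 2.0.0 uses typeNames and count instead of typeName and maxFeatures.
WFS="http://services.clarity-h2020.eu:8080/geoserver/clarity/ows"
LAYER="clarity:roads"
COUNT=10000

for START in 0 10000 20000; do  # in practice, loop until a page comes back empty
  echo curl -XGET "${WFS}?service=WFS&version=2.0.0&request=GetFeature&typeNames=${LAYER}&count=${COUNT}&startIndex=${START}&outputFormat=application/json" -o "roads_${START}.json"
done
```

The per-page downloads are small enough to compress on the fly, and the pages can be merged afterwards.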

p-a-s-c-a-l commented 4 years ago

Hi @p-a-s-c-a-l, as far as I can see you are trying to download the entire layer, which contains all the roads in Europe. GeoServer is not able to compress all the data for this layer in a reasonable time; that is why it is throwing the error. If you try adding "&maxFeatures=" to the request, it will work.

Yes, but that's what I need, the complete data.

maesbri commented 4 years ago

Hi @p-a-s-c-a-l, as far as I can see you are trying to download the entire layer, which contains all the roads in Europe. GeoServer is not able to compress all the data for this layer in a reasonable time; that is why it is throwing the error. If you try adding "&maxFeatures=" to the request, it will work.

Yes, but that's what I need, the complete data.

I would then propose exporting that layer (and probably others) directly from the database into a compressed zip, publishing it on a web server (or maybe Zenodo), and using that as the link for the catalogue. GeoServer (and more specifically the WFS and WCS services) was never meant for downloading large amounts of data, as it is not an efficient means for that purpose; it is better to serve the data via FTP or HTTP as a zip.
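A direct database export could be done with GDAL's ogr2ogr against the PostGIS database behind GeoServer. The sketch below is a dry run that only prints the commands; the connection parameters and the table name ("roads") are placeholders, not the project's actual credentials:

```shell
# Dry-run sketch: export a layer straight from the PostGIS database with
# ogr2ogr, then zip it for publication (e.g. on Zenodo or a web server).
CONN='PG:host=localhost dbname=clarity user=geoserver'  # placeholder connection
LAYER="roads"

EXPORT_CMD="ogr2ogr -f GeoJSON ${LAYER}.json \"${CONN}\" ${LAYER}"
ZIP_CMD="zip ${LAYER}.json.zip ${LAYER}.json"

# Print instead of executing, since this needs a live database and GDAL.
echo "$EXPORT_CMD"
echo "$ZIP_CMD"
```

An export like this bypasses the WFS timeout entirely, and the resulting zip can be linked from CKAN as a static download.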

p-a-s-c-a-l commented 4 years ago

Thanks @maesbri. We should do that as soon as all cities have been calculated.

According to the H2020 open data obligations we have to assure long-term preservation of the data, so the best option is to update the existing local effects datasets available in Zenodo.

p-a-s-c-a-l commented 4 years ago

@DanielRodera Is the HC-LE calculation complete? Is the data available somewhere for downloading?