ausecocloud / ecocloud

Make snippets content/file type aware #84

Open sarahrichmond opened 5 years ago

sarahrichmond commented 5 years ago

The snippets are a bit hit and miss. We need to find a way to make these more dynamic so they are aware of what file type they are downloading.

For example, the snippets fall over when downloading a .zip file, which is unfortunately how a lot of files, such as shapefiles, are distributed.

This issue is for keeping track of conversations, and for sharing information from the KN team on what info we can get, and therefore how we might create a catalogue of snippets.

Our current implementation of snippets is copied below as a discussion starter.

Python:

# Publisher: Department of Sustainability and Environment
# Contact point: data.gov@finance.gov.au
# License: Creative Commons Attribution 3.0 Australia
# Full page: https://data.gov.au/dataset/755f2f61-b9fc-46e8-84d0-2e32ac448e8a 

import urllib.request
url = 'http://data.gov.au/storage/f/2013-05-12T210557/tmpgme16Yrecreational-fishing-spots.csv'
filename = 'tmpgme16Yrecreational-fishing-spots.csv'
urllib.request.urlretrieve(url, filename)

R:

# Publisher: Department of Sustainability and Environment
# Contact point: data.gov@finance.gov.au
# License: Creative Commons Attribution 3.0 Australia
# Full page: https://data.gov.au/dataset/755f2f61-b9fc-46e8-84d0-2e32ac448e8a 

url <- "http://data.gov.au/storage/f/2013-05-12T210557/tmpgme16Yrecreational-fishing-spots.csv"
filename <- "tmpgme16Yrecreational-fishing-spots.csv"
download.file(url, destfile=filename)
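As a starting point for discussion, here is a minimal sketch of how the Python snippet could cope with a zip download. It assumes we only care about zip archives for now and checks the downloaded content rather than the file extension. The helper names `unpack_if_zip` and `download` are made up for illustration and are not part of the current snippet implementation.

```python
import os
import urllib.request
import zipfile


def unpack_if_zip(filename, extract_dir=None):
    """Return the usable file paths for a downloaded file.

    If the file is a zip archive (checked by content, not by extension),
    extract it and return the extracted member paths. Otherwise return
    the file itself in a one-element list.
    """
    if not zipfile.is_zipfile(filename):
        return [filename]
    if extract_dir is None:
        # Hypothetical convention: "data.zip" is unpacked into "data_extracted".
        extract_dir = os.path.splitext(filename)[0] + "_extracted"
    with zipfile.ZipFile(filename) as archive:
        archive.extractall(extract_dir)
        return [
            os.path.join(extract_dir, name)
            for name in archive.namelist()
            if not name.endswith("/")  # skip directory entries
        ]


def download(url, filename):
    """Download url to filename, then unpack it if it turns out to be a zip."""
    urllib.request.urlretrieve(url, filename)
    return unpack_if_zip(filename)
```

A generated snippet could then call `download(url, filename)` with the same `url` and `filename` values as today and work with whatever paths come back.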
jyucsiro commented 5 years ago

Hi @sarahrichmond - @jevy-wangfei and I have been looking into this. In KN v2.0, which is used by the current ecocloud on prod, we don't have a field to check what format the resource listed in a dataset actually is. So the data provider can claim that the file type is "shapefile" when it is actually a zipfile.

Let's use the "2016 SoE Biodiversity NUmber of ALA records in 2012" dataset as an example. Here's what it looks like in prod ecocloud explorer:

https://user-images.githubusercontent.com/4723726/55767468-6799e080-5abc-11e9-9a8e-caa46993d01f.png

The format from the data source metadata says it's an "esri shapefile..." when it is actually a zip file.
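To illustrate the kind of mismatch described here, a rough sketch of comparing a declared format against the file's leading bytes might look like the following. This is not how KN or the format minion does it. The signature table only covers three formats, and `sniff_format` and `format_mismatch` are hypothetical names.

```python
# Leading "magic" bytes for a few common formats (deliberately incomplete).
SIGNATURES = {
    "ZIP": b"PK\x03\x04",
    "GZIP": b"\x1f\x8b",
    "PDF": b"%PDF",
}


def sniff_format(path):
    """Return a format label based on the first bytes of the file, or None."""
    with open(path, "rb") as f:
        head = f.read(8)
    for label, magic in SIGNATURES.items():
        if head.startswith(magic):
            return label
    return None


def format_mismatch(declared, path):
    """True if a signature was recognised and it disagrees with `declared`.

    The comparison is a case-insensitive substring check, so a declared
    value such as "application/zip" still counts as agreeing with "ZIP".
    """
    actual = sniff_format(path)
    return actual is not None and actual.lower() not in declared.lower()
```

With a resource declared as a shapefile but delivered as a zip, `format_mismatch` would return `True`, and that could be the trigger for choosing a different snippet.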

In the upcoming KN v2.1, we've implemented the "MAGDA format minion" which goes and checks the file format with some level of confidence. In the same example above, but in the dev/test ecocloud explorer (which points to our staging-dev KN instance running v2.1), it looks like this:

https://user-images.githubusercontent.com/4723726/55767562-bba4c500-5abc-11e9-9a1f-b2950f3fdc5d.png

The format in that entry is "ZIP", which comes from the field enriched in KN by the format minion (the source metadata still says it's "esri shapefile..."). So this should be available when we upgrade KN prod to the v2.1 release.

Jevy and I wondered whether it is worth displaying both and letting the user have that info?
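If we did want to use both values, one possible approach is sketched below: pick the detected format when the confidence is high enough, and build a label that shows both when they differ. The dict keys `source_format`, `detected_format` and `confidence`, and the 0.5 threshold, are placeholders invented for this sketch and not the actual KN v2.1 field names.

```python
def effective_format(resource, min_confidence=0.5):
    """Prefer the detected format when it is confident enough, else the source one.

    `resource` is a plain dict with hypothetical keys (not the real KN schema):
    source_format, detected_format, confidence (0.0 to 1.0).
    """
    detected = resource.get("detected_format")
    confidence = resource.get("confidence") or 0.0
    if detected and confidence >= min_confidence:
        return detected
    return resource.get("source_format")


def format_label(resource):
    """Build a display string that shows both formats when they differ."""
    source = resource.get("source_format")
    detected = resource.get("detected_format")
    if detected and source and detected.lower() != source.lower():
        return f"{detected} (source metadata: {source})"
    return detected or source or "unknown"
```

The snippet generator could key off `effective_format`, while the explorer UI shows `format_label` so the user still sees what the provider claimed.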

gweis commented 5 years ago

As a side note:

Shapefiles should always be zip files as well.
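Following on from that, a snippet could check whether a downloaded zip actually holds a shapefile by looking for the three mandatory parts (.shp, .shx, .dbf) that share a base name. A minimal sketch, with `shapefile_layers` as a made-up helper name and base names compared case-sensitively:

```python
import os
import zipfile

# A shapefile needs at least these three files with the same base name.
REQUIRED_EXTENSIONS = {".shp", ".shx", ".dbf"}


def shapefile_layers(zip_path):
    """Return base names of layers in the zip that have all mandatory parts."""
    parts = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            base, ext = os.path.splitext(name)
            parts.setdefault(base, set()).add(ext.lower())
    return sorted(
        base for base, exts in parts.items() if REQUIRED_EXTENSIONS <= exts
    )
```

An empty result would suggest the zip is something other than a complete shapefile, so the generic "download and unzip" snippet would be the safer choice.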
