datopian / ckanext-blob-storage

CKAN extension to offload blob storage to cloud storage providers (S3, GCS, Azure etc).
http://tech.datopian.com/blob-storage/
MIT License

Discussion re extending download to support multiple blob files per resource #39

Open rufuspollock opened 3 years ago

rufuspollock commented 3 years ago
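# bit more complex, as upload also has to handle verify and multipart
def get_upload_url_and_headers(file_id: str, file_size: int, dataset: str, identity: str) -> dict:
    """file_id is the file's sha256; dataset is "org/dataset".
    Returns {"upload_url": ..., "headers": ...}."""
    ...


def get_download_url_and_headers(file_id: str, file_size: int, dataset: str, identity: str) -> dict:
    """file_id is the file's sha256. Returns {"download_url": ...}."""
    ...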

Storage bucket:

{bucket}/{configured prefix}/{lfs_prefix := org/dataset}/{file-path usually sha256}
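A minimal sketch of how a blob's location can be composed from the bucket layout above. The function name `blob_path` and the example values are hypothetical; the extension's actual key-building code may differ.

```python
def blob_path(bucket: str, prefix: str, lfs_prefix: str, sha256: str) -> str:
    """Return {bucket}/{configured prefix}/{lfs_prefix}/{sha256}.

    lfs_prefix is "org/dataset"; the file path is usually the sha256 digest.
    (Hypothetical helper, for illustration only.)
    """
    parts = (bucket, prefix, lfs_prefix, sha256)
    # Strip stray slashes and skip empty parts (e.g. no configured prefix).
    return "/".join(p.strip("/") for p in parts if p.strip("/"))
```

For example, `blob_path("my-bucket", "ckan", "myorg/mydataset", digest)` yields `my-bucket/ckan/myorg/mydataset/<digest>`; with an empty prefix the extra slash is simply dropped.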

Uploading files:

Downloading files ...

How CKAN (ckanext-blob-storage) works (?):

Questions:

Proposal

Proposal B:

Modify download handler to be a bit more generic:

{ckan-instance}/dataset/65050ec0-5abd-48ce-989d-defc08ed837e/resource/26f3d260-9b90-40c8-90de-c540704f59ac/download/sha256:{sha256::size}

  => returns 404, 401, or a 302 redirect to the download location
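A rough sketch of the Proposal B handler logic, written as plain Python rather than a CKAN view. It assumes the last path segment has the form `sha256:<64 hex chars>:<size in bytes>` (the exact separator in the proposal is not settled), and `find_blob_url` is a hypothetical lookup standing in for whatever produces the storage URL.

```python
import re
from typing import Callable, Optional, Tuple

# Assumed segment format (hypothetical): "sha256:<64 hex chars>:<size in bytes>"
SEGMENT_RE = re.compile(r"sha256:(?P<sha256>[0-9a-f]{64}):(?P<size>\d+)")


def handle_download(
    segment: str,
    user_can_read: bool,
    find_blob_url: Callable[[str, int], Optional[str]],
) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, redirect location) for a download request.

    find_blob_url is a hypothetical lookup returning a storage URL for the
    blob, or None if no such blob is stored for this resource.
    """
    match = SEGMENT_RE.fullmatch(segment)
    if match is None:
        return 404, None  # not a blob reference we understand
    if not user_can_read:
        return 401, None  # caller may not read this dataset
    url = find_blob_url(match["sha256"], int(match["size"]))
    if url is None:
        return 404, None  # no blob with this sha256/size
    return 302, url  # redirect the client to the storage location
```

In a real handler, a segment that does not match would presumably fall through to the existing filename-based download rather than returning 404.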

Proposal A: download_url API - rejected because we don't want an API ...

resource:

{
  // current
  url: {ckan-instance}/dataset/65050ec0-5abd-48ce-989d-defc08ed837e/resource/26f3d260-9b90-40c8-90de-c540704f59ac/download/{file-name}
  // new
  url: {ckan-instance}/api/3/action/download_url?sha256=...&size=...&dataset=...&resource=...
  zip_url: {ckan-instance}/api/3/action/download_url?...
}
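For reference, a sketch of how the (rejected) Proposal A URL could be built with a properly encoded query string, where parameters are joined with `&`. `download_url` is only the action name proposed here, not an existing CKAN action, and the helper name is hypothetical.

```python
from urllib.parse import urlencode


def download_action_url(ckan_url: str, sha256: str, size: int, dataset: str, resource: str) -> str:
    """Build the proposed download_url action URL (hypothetical helper)."""
    # urlencode escapes values and joins key=value pairs with "&".
    query = urlencode({"sha256": sha256, "size": size, "dataset": dataset, "resource": resource})
    return f"{ckan_url.rstrip('/')}/api/3/action/download_url?{query}"
```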

What I want is the ability to store more than one piece of blob data for a resource and to get download URLs for each of them.
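One way to picture this: a resource carrying a list of blobs, each addressed by sha256 and size, with one Proposal B style URL per blob. The `blobs` field is hypothetical (it is not a standard CKAN resource field), and the `sha256:<hex>:<size>` segment format is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Blob:
    sha256: str
    size: int
    filename: str


@dataclass
class Resource:
    dataset_id: str
    resource_id: str
    blobs: List[Blob] = field(default_factory=list)  # hypothetical field

    def download_urls(self, ckan_url: str) -> Dict[str, str]:
        """Map each blob's filename to an assumed Proposal B download URL."""
        base = f"{ckan_url.rstrip('/')}/dataset/{self.dataset_id}/resource/{self.resource_id}/download"
        return {b.filename: f"{base}/sha256:{b.sha256}:{b.size}" for b in self.blobs}
```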

shevron commented 3 years ago

I think the "multiple files per resource" thing is, to the best of my knowledge, not something that is cleanly supported by CKAN. I realize we have some customisations in other projects that allow this, but I'm wondering if it is something that should be generically supported by this extension.

I think option B is easy to implement and, if we want to support this generically, could be the way to go.

Another option I can suggest, which I consider a cleaner variation of option B, is this:

This will allow ckanext-blob-storage to remain clean and not have code for these kinds of special cases.

rufuspollock commented 3 years ago

@shevron I agree with you re Option B being the way to go. I also agree that it can be changed pretty easily.

I do think it would be worth generifying the default endpoint right now, but that's something we can discuss.

How much work do you think it is to do this?

shevron commented 3 years ago

Probably ~1 day, but it's hard to estimate fully without spending some time on analysis. I don't think there is much complexity here; it's mostly a matter of figuring out the right API.

shevron commented 3 years ago

To update: in the latest merge I refactored the action that provides the download URL / headers so that it wraps a factored-out Python function (https://github.com/datopian/ckanext-blob-storage/blob/master/ckanext/external_storage/actions.py#L28). That function now allows specifying a different sha256 / size / filename for the given resource. This change will allow this extension, or other extensions, to easily implement the approach I suggested.
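To illustrate the kind of override this enables, here is a hypothetical stand-in, not the real function in `actions.py` (whose signature is not shown here): the resource's own values are used unless the caller supplies a different sha256 / size / filename. The resource dict keys are also assumptions.

```python
from typing import Optional


def blob_download_info(resource: dict, sha256: Optional[str] = None,
                       size: Optional[int] = None, filename: Optional[str] = None) -> dict:
    """Hypothetical helper: resource defaults, overridable per call.

    Assumes the resource dict has "sha256", "size" and "filename" keys.
    """
    return {
        "sha256": sha256 if sha256 is not None else resource["sha256"],
        "size": size if size is not None else resource["size"],
        "filename": filename if filename is not None else resource["filename"],
    }
```

Another extension could then call it once per additional blob, passing that blob's sha256 / size / filename, while the default download path keeps using the resource's own values.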