davidread / ckanext-downloadall

CKAN extension that adds a "Download all" button to datasets
GNU Affero General Public License v3.0
5 stars 15 forks source link

Auth When Adding Packages To Zip #23

Open jonathansberry opened 4 years ago

jonathansberry commented 4 years ago

This looks like the perfect extension for us! We're stuggling to get it working though.

Symptoms: We get a lovely zip file that can be downloaded but it only includes the metadata json file. There are no resource files in there.

Diagnosis: I've dived into the code and it looks like the problem occurs in tasks.py download_resource_into_zip function. This function trys to download the resource file and to add to the zip. All our resource files have auth restrictions on them though, causing the download to fail with a 404 Not Found error because the request in line 237 isn't properly authorized.

Thoughts: I'm suprised no one else has struggled with this. Would appriciate hearing your thoughts @davidread on whether I'm am diagnosiing this issue correctly and whether there is an easy fix? The task is run internally by a ckan worker so I'd imagine it's fine just to give the request authorized access to all resources. Is there a way to do this?

davidread commented 4 years ago

I've simply not used it with any datasets which are private and have uploaded resources, hence why the API key is not supplied on requests.

Yes I think this is good to add, so feel free to create a PR. I guess you'll use the site user, so authorized to do anything. I guess you'll need to only supply the api key in a http header to GET the internal CKAN uploaded resources, rather than a resource.url which is open on the internet. Otherwise you expose the key. I think there's a resource.type=='upload' or some other flag you should check for, and perhaps you can also ensure the resource.url.startswith(config.site_url).

Thinking aloud about security: when a user downloads the zip, we need to check they don't get access something they wouldn't otherwise have access to. And when a user tries to download the zip, they will have to pass the same authz checks as downloading any other uploaded resource. Since we don't have per-resource security, just per-package (private), this should be fine (at least unless someone adds per-resource security).

jonathansberry commented 4 years ago

Thanks for this breakdown David. It's really helpful.

I'll have a think about what the best route forwards is here. We are running per resource security in our setup so there are some real further complications to consider. If we come up with a solution in the time available, we'll definitely open a PR.