TkTech / ckanext-cloudstorage

Implements support for resource storage against multiple popular providers via apache-libcloud (S3, Azure Storage, etc...)
MIT License
35 stars 55 forks source link

Add generic uploader #25

Open gerbyzation opened 6 years ago

gerbyzation commented 6 years ago

As promised in #23 (and #2), here is my work so far on adding support for generic file uploads. It's a little messy, but getting there

I still want to do some more work on the tests. Still figuring out how to properly do testing in CKAN. Initially I borrowed the testing approach from s3filestore, but they have a fake backend so need to work out if those tests are actually of any use when mocked out. I also want to write some more tests to make sure certain parts behave as expected. Everything (but especially the tests) need an overall clean as well.

And sorry for the somewhat polluted commit history, was trying to get a docker setup going to make things easy, ended up with https://github.com/okfn/docker-ckan which is quick to setup.

If there's any feedback please let me know!

TkTech commented 6 years ago

Looking good, I'll give it a test this weekend. Do you know if there are any open-source implementations of Google's API that can be used for testing, such as Minio for S3?

gerbyzation commented 6 years ago

Not that I'm aware, but the google storage libcloud driver is using the S3 compatible API (see https://libcloud.readthedocs.io/en/latest/storage/drivers/google_storage.html). Not exactly sure what version of the S3 api that would be, but I would guess this might work with minio as long as you disable secure urls.

Bit of a shame I'm only finding out about minio now, looks amazing and could probably have saved me some work!

gerbyzation commented 6 years ago

I forgot to mention how to authenticate with GCP, which I need to write up for the docs as well. For the driver options, the key is the service account ID with access to the bucket, secret is the path to the JSON key of this service account. It also needs a third option, project, which is the google project ID.

gerbyzation commented 6 years ago

@TkTech did you get a chance to test this?

gerbyzation commented 6 years ago

I've pushed some updates. Most noteworthy is the fact that the use_secure_urls is being ignored for generic files. I don't think there's much point in making these files private, especially as it is bad for performance as the browser won't be able to cache any assets.

Also I've fixed some tests that weren't working.

Would be great if you get a chance to have a look and let me know what you think.

TkTech commented 6 years ago

Currently everywhere a generic resource can be uploaded, it can be retrieved without being authenticated. From a security perspective I'm okay with this. However, many buckets that are using secure URLs entirely disable non-secure access to the bucket (versus a specific path using the advanced ACL rules). It would be better to have use_secure_urls_for_generics (?) as an option instead.

Please remove the cover report, we don't want that in git.

Otherwise, definitely looking good! 👍

gerbyzation commented 6 years ago

@TkTech Thanks. I've added the option. So now use_secure_urls only applies to resource uploads, and use_secure_urls_for_generics to the generic files. The first setting could do with renaming for clarification, but that would of course break backwards compatibility.

In the last few commits I've implemented a custom 301 redirect that is cacheable for public generic files, to prevent asset reloading on every new page.

Another thing worth mentioning is google's ACL. Google Cloud Storage doesn't do folder level permissions, so when the resource files are private and generics public, I'm setting the ACL on generics on upload.

I'm going to deploy this on some of my CKAN instances to test the plugin. Given I don't find any bugs I'm pretty much finished with this PR. Only other thing left to do is update documentation.

jqnatividad commented 5 years ago

Hi @gerbyzation , @TkTech , cc @akariv Was just wondering where this PR stands. Thanks!

TkTech commented 5 years ago

It's usable with a small conflict, if you need it right away @gerbyzation's work is 👍

v2 of cloudstorage is coming in the next few days as part of work funded by TBS open-data, and is a complete rewrite.

gerbyzation commented 5 years ago

@jqnatividad Been running my version with the generic uploader for static files for a year on a bunch of instances now, haven't had any issues.

amercader commented 5 years ago

@TkTech any update on the new cloudstorage version? We can definitely help test or work on missing pieces.