edwardspec / mediawiki-aws-s3

Extension:AWS allows MediaWiki to use Amazon S3 (instead of the local directory) to store images.
https://www.mediawiki.org/wiki/Extension:AWS
GNU General Public License v2.0

cloudfront CDN feature #5

Closed: cariaso closed this issue 6 years ago

cariaso commented 6 years ago

I've previously used (and even maintained) forks of https://github.com/francisli/LocalS3Repo2, which has become abandonware. It had one nice feature that doesn't seem to be in Extension:AWS: support for CloudFront in front of the S3 bucket. I recently switched to this AWS extension and happened to reuse the same S3 bucket name. After a fairly trivial reupload, I was delighted to find that not only is my previous version history preserved, but the previous CloudFront distribution is still in effect, and just started working with no effort or changes.

So this is definitely not a bug report. Mostly it's just kudos. But it looks like you might be able to cannibalize a few extra features from that module, if there is any interest.

You might also want to leave a comment at https://www.mediawiki.org/wiki/Extension:LocalS3Repo or its talk page, so that people from that module's community become aware of your work.

However, none of the above needs to be done, so you can feel free to close this 'issue' immediately.

edwardspec commented 6 years ago

I have a feeling that just setting 'url' for public/thumb zones in $wgLocalFileRepo would be enough.

You'd need something like [tests/travis/OldStyleAWSSettings.php] in your LocalSettings.php (instead of $wgAWSBucketPrefix, which, if selected, redefines $wgLocalFileRepo).
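
For illustration, a rough sketch of that idea (the domain names below are placeholders, not the extension's documented behavior; the rest of $wgLocalFileRepo would still need to be defined, as in that file):

// Serve the "public" and "thumb" zones from a CDN domain
// instead of s3.amazonaws.com. Hypothetical domains:
$wgLocalFileRepo['zones']['public']['url'] = 'https://media.example.com';
$wgLocalFileRepo['zones']['thumb']['url'] = 'https://media-thumb.example.com';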

edwardspec commented 6 years ago

CloudFront cache would also need to be invalidated when the image is deleted/reuploaded. I'll investigate how to integrate this with Extension:AWS.

cariaso commented 6 years ago

I think 'need' is rather strong here. It'd naturally happen after 24h, and given the general stability of a MediaWiki install, I don't see any need for anything beyond that. If users want to ensure realtime updates, they can use the S3 URLs; if they want CDN caching, they accept the CDN's 24h lag.

edwardspec commented 6 years ago

The need to invalidate the cache becomes very pressing when an image is deleted for privacy reasons (someone mistakenly uploaded a secret document) or by a legal request. Besides, I'll get a ton of bug reports like "I uploaded an image and it didn't show up" if we don't invalidate it.

edwardspec commented 6 years ago

Configuration variable $wgAWSBucketDomain (default: s3.amazonaws.com) was added:

// To use domains like <bucket-name>.cloudfront.net for public URLs
$wgAWSBucketDomain = "cloudfront.net";

edwardspec commented 6 years ago

As for cache invalidations, that's a CDN-specific thing and shouldn't be in this extension. (All CDNs do this differently.)

If someone decides to write such an extension for CloudFront, they can use UploadComplete and ArticleDeleteComplete hooks to be informed about image deletions/reuploads.
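
A hedged sketch of what such a CloudFront-specific extension could do (the distribution ID and the path derivation are assumptions, and error handling is omitted):

use Aws\CloudFront\CloudFrontClient;

// Purge the uploaded file's path from CloudFront after each upload.
$wgHooks['UploadComplete'][] = function ( $upload ) {
	$file = $upload->getLocalFile();
	$client = new CloudFrontClient( [
		'version' => 'latest',
		'region' => 'us-east-1'
	] );
	$client->createInvalidation( [
		'DistributionId' => 'E2EXAMPLE123', // hypothetical distribution ID
		'InvalidationBatch' => [
			'CallerReference' => uniqid( 'mw-purge-', true ),
			'Paths' => [
				'Quantity' => 1,
				// Assumes the CloudFront path mirrors the repo-relative path.
				'Items' => [ '/' . $file->getRel() ]
			]
		]
	] );
};
// An ArticleDeleteComplete handler would do the same for deletions.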

cariaso commented 6 years ago

Due to my use of Route 53, my CDN URLs are actually behind a DNS name, even though my actual bucket name is snpedia-media. Here is one such example:
http://media.snpedia.com/images/6/66/Screen_shot_2011-06-18_at_1.16.19_AM.png
So perhaps the interface you've chosen is too restrictive?

edwardspec commented 6 years ago

Agreed, I'll make it more customizable.

gboyers commented 6 years ago

This is because of the way AWS S3 handles custom domains directly. However, it looks like you have overridden this with your CDN in front of S3.

When accessing S3 directly, via a DNS CNAME subdomain, S3 interprets the entire domain name as the bucket name.

So for snpedia.com: if your CNAME media points to the bucket snpedia-media.s3.amazonaws.com, then S3 will look for a bucket called media.snpedia.com. For this to work, you need to use media.snpedia.com as the bucket name, and your CNAME would point to media.snpedia.com.s3.amazonaws.com.

This is in the S3 documentation: https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html

To make this work for me, I created a new set of buckets (at files.example.com) and moved everything over using the AWS command line:

aws s3 cp s3://old-bucket/ s3://new-bucket.example.com --recursive

However for you, it looks like you're rewriting those URLs via CloudFront, so this will be happening behind your CDN. In which case you probably do want to be able to use whatever name you want for the URLs here.

edwardspec commented 6 years ago

Ok, how about the following syntax:


// This will use <bucket-name>.cloudfront.net
$wgAWSBucketDomain = '$1.cloudfront.net';

// Default
$wgAWSBucketDomain = '$1.s3.amazonaws.com';

// This will use media.mysite.com for "public" zone
// and media-thumb.mysite.com for "thumb" zone.
$wgAWSBucketDomain = 'media$2.mysite.com';

// Alternatively, zone URLs can be specified directly:
$wgAWSBucketDomain = [
  'public' => 'media.mysite.com',
  'thumb' => 'thumb.mysite.com'
];

cariaso commented 6 years ago

Seems to cover it all.


edwardspec commented 6 years ago

Implemented in 9c806c095b4f8a2b47e58483df472c0d452795b5.

I also added a test case to verify that $wgAWSBucketDomain and $wgAWSBucketPrefix are correctly translated into $wgLocalFileRepo and $wgFileBackends.

edwardspec commented 6 years ago

The README from 9e0edbe384fd98f6a482e9f928115511b386d2f3 has a problem though.

1) When you create your four S3 buckets, you must include your full domain in their names, e.g. files.example.com, files-thumb.example.com, files-temp.example.com, files-deleted.example.com

This won't work: the bucket names are always "${wgAWSBucketPrefix}", "${wgAWSBucketPrefix}-thumb", "${wgAWSBucketPrefix}-deleted" and "${wgAWSBucketPrefix}-temp"

(except when you don't define $wgAWSBucketPrefix at all)
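
For example (the prefix here is chosen just for illustration):

// LocalSettings.php
$wgAWSBucketPrefix = 'mysite';
// The extension will then look for these S3 buckets:
// mysite (public), mysite-thumb, mysite-deleted, mysite-temp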

@gboyers Could this be the cause of issue #7? If you set $wgAWSBucketPrefix = "files.example.com", then it's trying to save thumbnails into the S3 bucket "files.example.com-thumb".

Do we need the "customized bucket names" feature too?

edwardspec commented 6 years ago

Closing #5 as "implemented". Additional feature request "make $wgAWSBucketPrefix more customizable" will be handled in issue #9.