andrusha / middleman-cloudfront

An Amazon CloudFront extension for middleman which allows you to invalidate CloudFront CDN cache
MIT License

Only invalidate updated files #10

Closed tmaier closed 10 years ago

tmaier commented 10 years ago

middleman build logs the following

       identical  build/opensearch.xml
       update  build/blog/2013/04/01/index.html
       update  build/blog/2013/03/index.html
       update  build/impressum/index.html

It would be great if middleman-cloudfront would only invalidate the files marked with update.
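As a rough illustration of the request, the sketch below pulls the `update` entries out of build output shaped like the lines quoted above. `updated_paths` is a hypothetical helper, not part of Middleman or middleman-cloudfront, and it assumes plain `<status>  <path>` lines without color codes.

```ruby
# Hypothetical helper: collect the paths that the build reported as "update".
# Assumes each log line has the form "<status>  <path>".
def updated_paths(build_log)
  build_log.each_line.map do |line|
    status, path = line.strip.split(/\s+/, 2)
    path if status == "update"
  end.compact
end

log = <<~LOG
  identical  build/opensearch.xml
  update  build/blog/2013/04/01/index.html
  update  build/blog/2013/03/index.html
  update  build/impressum/index.html
LOG

puts updated_paths(log)
```

Scraping log output is fragile; a list handed over by Middleman itself would be the more robust source if one becomes available.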

manuelmeurer commented 10 years ago

Yeah, good idea. I had a look but couldn't figure out how to get the list of new and updated files in the hook that calls middleman-cloudfront. I posted the question on the middleman forums here: http://forum.middlemanapp.com/t/get-a-list-of-created-and-updated-files-in-extension/1368 Let's see if somebody comes up with an idea. :smile:

AndrewKvalheim commented 10 years ago

I often build several times before deploying, so a simple build hook wouldn't be sufficient for my workflow. Adding invalidation support to Middleman::S3Sync has been on my mind, but there's also enough information in S3+CloudFront's HEAD responses for this extension to work independently:

$ md5sum build/index.html*
fe7000c853a03cfb971cd4726e1993d3  build/index.html
da84acde4134f21a898294256a0eff5e  build/index.html.gz
$ curl --head https://andrew.kvalhe.im/ | egrep '(md5|ETag|Cache):'       
x-amz-meta-content-md5: fe7000c853a03cfb971cd4726e1993d3
ETag: "da84acde4134f21a898294256a0eff5e"
X-Cache: Hit from cloudfront

manuelmeurer commented 10 years ago

> I often build several times before deploying, so a simple build hook wouldn't be sufficient for my workflow.

Right, but then you don't use the after_build hook in your middleman-cloudfront config, correct? That would be the only way to get the info from Middleman directly after building about which files have been updated etc.

> Adding invalidation support to Middleman::S3Sync has been on my mind, but there's also enough information in S3+CloudFront's HEAD responses for this extension to work independently

I don't know if I understand you correctly, but surely you don't want to curl all pages from your website to determine if they need to be invalidated? There might be thousands...

AndrewKvalheim commented 10 years ago

> …you don't use the after_build hook in your middleman-cloudfront config, correct?

Correct. I don't couple deployment with build (to allow testing of the build), and I don't maintain state in the build directory (since it's not synchronized across development machines).

> …you don't want to curl all pages from your website to determine if they need to be invalidated?

Yes, I'd rather use the information I already have from the S3 sync extension's GET Bucket request(s). Making a request to CloudFront per resource might be your only option if you wanted to meet the constraints that:

Not to suggest that this extension should meet all of those—you could decide to keep production state in the build directory (or a dedicated file under version control?), to retrieve information from the S3 API, to communicate with an S3 sync extension, to not optimize for invalidations, or that this is all a moot point since you don't want to presume an S3 origin in the first place.
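One way to picture the stateless approach is to compare local digests with digests the origin already reports. The sketch below assumes a hash named `remote_md5s` that maps keys to hex MD5 strings; how it gets filled (a bucket listing, HEAD requests, a state file) is left open. `changed_keys` is a hypothetical name. Which remote value is comparable depends on how files were uploaded: in the output quoted earlier, `x-amz-meta-content-md5` matched the uncompressed file while the ETag matched the gzipped one.

```ruby
require "digest"

# Hypothetical helper: list files under build_dir whose MD5 differs from the
# digest the origin reports. remote_md5s maps a key such as "index.html" to a
# hex MD5 string; obtaining that map is outside the scope of this sketch.
def changed_keys(build_dir, remote_md5s)
  Dir.glob("**/*", base: build_dir).sort.select do |key|
    path = File.join(build_dir, key)
    next false unless File.file?(path)

    remote = remote_md5s[key]
    # Keys missing from remote_md5s are treated as new uploads and left out.
    !remote.nil? && remote != Digest::MD5.file(path).hexdigest
  end
end
```

Calling `changed_keys("build", remote_md5s)` would return only the keys that exist on both sides with different content, which is the set a targeted invalidation needs.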

manuelmeurer commented 10 years ago

I don't really understand what you mean by "production state" and why you wouldn't want to store it on development machines. Also, I don't see the connection to the S3 sync extension, which doesn't have anything to do with CloudFront AFAICS (don't get me wrong, it's a great extension, I use it myself in all of my Middleman projects).

Anyway, I think we're venturing too far off the point here. @tmaier's request to only invalidate updated (and new) files when using the after_build hook is completely legit and would be trivial to implement if Middleman could pass the list of updated files to the extension. Let's see if anything comes from the post in the Middleman forum, and if not, I will take a stab at implementing the necessary functionality in Middleman myself. :smile_cat:

tmaier commented 10 years ago

Thank you @manuelmeurer for your response so far. :)

Is it really necessary to invalidate new files as well? CloudFront should not have any data about them and should simply fetch them from the origin server as soon as a user requests them.

manuelmeurer commented 10 years ago

Ah yes, of course... hadn't thought of that. Only changed files it is then. :hamster:
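A minimal sketch of "only changed files", assuming the file list arrives as a hash of status to build paths (a hypothetical shape, as is the `invalidation_paths` name). It also assumes directory indexes are served at the bare directory URL, so both forms are emitted.

```ruby
# Hypothetical helper: keep only :updated files and turn their build paths
# into URL paths suitable for an invalidation request.
def invalidation_paths(files_by_status, build_dir: "build")
  files_by_status.fetch(:updated, []).flat_map do |file|
    path = "/" + file.delete_prefix("#{build_dir}/")
    # Assumption: "/dir/index.html" is also reachable as "/dir/".
    path.end_with?("/index.html") ? [path, path.delete_suffix("index.html")] : [path]
  end.uniq
end

files = {
  created: ["build/new-page/index.html"],
  updated: ["build/blog/2013/03/index.html", "build/impressum/index.html"]
}
puts invalidation_paths(files)
```

Files under `:created` are ignored on purpose, following the reasoning above that CloudFront has nothing cached for them yet.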

tmaier commented 10 years ago

They're working on a Pull Request: https://github.com/middleman/middleman/pull/1319

manuelmeurer commented 10 years ago

Sweet! I'll track the progress and implement this when possible.

manuelmeurer commented 10 years ago

The change has been merged to master now, and I tried to get it running with middleman-cloudfront, but it turns out master is going to become v4 and there are a lot of other changes (e.g. the way to write extensions seems to change fundamentally in v4). Once v4 is out, I will continue working on this.

AndrewKvalheim commented 10 years ago

How were you planning to pass the list of updated files into middleman-cloudfront? I'll be happy to just drop something like this into config.rb, interface permitting:

after_s3_sync do |files_by_status|
  files = files_by_status[:updated]

  ::Middleman::Cli::CloudFront.new.invalidate(cloudfront_options, files)
end

manuelmeurer commented 10 years ago

@AndrewKvalheim Check out the PR referenced above. I hope the builder object that is passed to the after_build callback will include the updated files.

AndrewKvalheim commented 10 years ago

Right, and once you have the list of updated files from builder, do you have any preference for how to pass them into the Middleman::Cli::CloudFront instance—e.g. via another argument to invalidate (as above), via a new option in cloudfront_options, or something else?

As this is otherwise ready to go, I'm inclined to add the necessary interface myself but don't want to conflict with any plans you've already made.

manuelmeurer commented 10 years ago

Oh, nice! :smile_cat: I'd say we use a second parameter that contains the list of updated/synced files. If that parameter is present (defaults to []), the filter option is ignored (or should the updated files be filtered?) and those files are invalidated. Thoughts?
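The precedence proposed here could look roughly like the sketch below. `files_to_invalidate` is a hypothetical stand-in, not the extension's actual method, and it assumes the filter option is a regular expression. It follows the first reading, where an explicit list bypasses the filter; whether the list should be filtered too is the open question raised above.

```ruby
# Hypothetical sketch of the proposed precedence, not the extension's code.
# all_files: every candidate path; filter: a Regexp standing in for the filter
# option; files: the explicit list of updated files (defaults to []).
def files_to_invalidate(all_files, filter: /.*/, files: [])
  # A non-empty explicit list wins and the filter is ignored.
  return files unless files.empty?

  all_files.grep(filter)
end

all_files = ["/index.html", "/css/site.css"]
puts files_to_invalidate(all_files, filter: /\.html\z/)
puts files_to_invalidate(all_files, filter: /\.html\z/, files: ["/css/site.css"])
```

Defaulting to `[]` keeps existing callers working unchanged, since an empty list falls through to the filter-based behavior.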