Galooshi / happo

Visual diffing in CI for user interfaces

Upload images to global location, with filenames that are hashes of file contents #178

Open lencioni opened 7 years ago

lencioni commented 7 years ago

I was chatting with @lelandrichardson about some of our plans for supporting multiple rendering targets (#73) and generating a review HTML page (#176), and we thought it might not be a bad idea to change the way we upload images.

Currently, we upload an HTML file and a set of images as a sort of bundle to a single "directory". Over many runs, this is likely to produce duplicate images. What if, instead, we put the images in a more general location where the filenames are hashes of the file contents? That way, identical images only need to be uploaded once.

This would have the advantage of allowing us to make all examples browsable on the review page with minimal overhead.

If we do this, the files should be stored in sub-directories so that the directory doesn't end up growing to contain millions of files over time.

We could probably also apply the same technique to the JavaScript files that we need for the review page, to cut down on duplication, but that seems less valuable than doing it for the images.

trotzig commented 7 years ago

This is a good idea. Do you have an idea for how it could be implemented as well? It could be as simple as using the uploader configuration option that I'm adding/thinking of adding as part of supporting #176.

// .happo.js
const S3Uploader = require('happo/server/S3Uploader');

module.exports = {
  uploader: new S3Uploader({
    imageDirectory: 'some-global-dir',
  }),
};
lencioni commented 7 years ago

It would be nice to avoid re-uploading assets that already exist, so we might want to build that check into the upload process. The imageDirectory option sounds okay to me, but I think we could probably just set it to something fixed and not make it configurable, right?

trotzig commented 7 years ago

Ah, you might be right, since people mostly have to specify an S3 bucket name anyway. We can choose a dedicated folder for images, and check before uploading.
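The check-before-upload step could look roughly like this. This is a sketch only: `existingKeys` stands in for a real existence check against the bucket (e.g. an S3 HeadObject call), and `uploadIfMissing` is a hypothetical helper, not an existing happo function:

```javascript
// Upload `data` under `key` only if the key is not already stored.
// `existingKeys` is an in-memory stand-in for querying the bucket;
// `upload` is the actual uploader callback.
async function uploadIfMissing(existingKeys, key, data, upload) {
  if (existingKeys.has(key)) {
    return false; // identical content already stored; skip the upload
  }
  await upload(key, data);
  existingKeys.add(key);
  return true;
}
```

Because the keys are content hashes, a key that already exists is guaranteed to hold identical bytes, so skipping the upload is always safe.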

lencioni commented 7 years ago

We might also try a potential optimization where we upload a JSON file for every SHA that maps examples to their images. ~~Then we can potentially download all of the images for the base commit if they already exist. Not sure if that will be faster but it might depending on how long the build takes.~~ Then we can compare the hash of the current snapshot to the hash in the JSON file, which should be a lot faster.
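The comparison step could be a simple diff of two example→hash maps. The manifest shape and the `changedExamples` name below are assumptions for illustration, not an existing format:

```javascript
// Given the base commit's manifest and the current run's manifest
// (both mapping example name -> content hash), return the examples
// whose screenshots changed or are new.
function changedExamples(baseManifest, currentManifest) {
  return Object.keys(currentManifest).filter(
    (example) => baseManifest[example] !== currentManifest[example]
  );
}
```

Comparing hashes this way avoids downloading any base-commit images at all: unchanged examples are identified purely from the two JSON files.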