Image pipeline - GitHub issues

thisisdano commented 7 years ago

The jamstack version of digital.gov is a site built in Hugo and hosted on Federalist. Digital.gov currently has over 1500 posts, about 170 other docs pages, plus events. Many of these pages have unique images, and the site is only going to get bigger. We currently have over 1GB worth of image assets hosted through AWS/Sites, including all the image files and their associated responsive variants. We will probably continue to have additional responsive variants in the future: more sizes, file formats, colorspaces, and blur levels.

This is a considerable amount of overhead, both to process and to host, and we tend to think it doesn't make sense to host all this imagery in GitHub. Not only does it take up a lot of space, but transferring all this data over to Federalist with each build would be slow and excessive.

Creating the proper responsive picture or srcset element in the HTML requires knowing a little bit about the original file. Its original size is the most important thing to know, but whether the file has a grayscale or blurred variant would also be useful. If the original files are only available to Hugo via an AWS API or some kind of ImageMagick ping to a URL, it could lead to long, painful build times. I hypothesize that creating a YAML manifest file for each processed image could be a good way to capture this kind of image data and make it available to Hugo templates without ImageMagick, APIs, JavaScript, or some kind of server-side solution. (Any of these solutions could be acceptable, however. I'm willing to explore them or others.)

So, we need a way to take new (or existing) image assets, test for uniqueness, create responsive variants based on the image size/filetype (with reasonable compression and possible colorspace and blur variants), create a record of each image containing some basic information about the file, upload the image and its variants to an AWS location, and be resilient to failure up and down the chain.

Take a look at this first: https://www.npmjs.com/package/hugulp

For file [imgBase].[imgExt]

[ ] Pull new asset from an inbox
[ ] Check /data/img for asset uniqueness
[ ] Optimize the image
[ ] Create optimized responsive versions
- [ ] Breakpoint sizes TBD, at sizes below that of the original image
[ ] Create blur-up version
[ ] Create grayscale variants?
[ ] Create webp variants (or other filetypes)?
[ ] Write file info/manifest to a unique data file in /data/img
[ ] Upload files to a unique folder in S3 named for [imgBase]
[ ] Check for success
[ ] Delete asset from inbox
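
As a rough sketch of the "responsive versions" step in the checklist above: given the original dimensions, pick only the breakpoints narrower than the original and compute each variant's height from the aspect ratio. The breakpoint list here is purely hypothetical (the real sizes are still TBD), and `variantWidths`, `scaledHeight`, and `planVariants` are illustrative names, not part of any existing tool.

```javascript
// Hypothetical breakpoint widths in pixels; the real list is still TBD.
const BREAKPOINTS = [200, 400, 600, 800, 1200, 2400];

// Keep only breakpoints strictly narrower than the original,
// so no variant is ever upscaled.
function variantWidths(originalWidth, breakpoints = BREAKPOINTS) {
  return breakpoints.filter((w) => w < originalWidth);
}

// Height of a variant at targetWidth, preserving the aspect ratio.
function scaledHeight(originalWidth, originalHeight, targetWidth) {
  return Math.round((originalHeight * targetWidth) / originalWidth);
}

// List the { width, height } of every variant to generate for one image.
function planVariants(width, height) {
  return variantWidths(width).map((w) => ({
    width: w,
    height: scaledHeight(width, height, w),
  }));
}
```

With these assumed breakpoints, an 800x600 original would get variants at 200x150, 400x300, and 600x450, and an image narrower than the smallest breakpoint would get none.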

Current asset data.img.yml. This is subject to change, but it needs to include width and height; the others would be nice to have:

date    : 2010-10-22 11:33:15 -0400
width   : 800
height  : 600
format  : jpg
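
A minimal sketch of how a manifest in this shape could be produced without a YAML library, assuming the four fields shown above and that the date arrives already formatted as a string. `manifestYaml` is a hypothetical helper name, and the fixed key padding simply mirrors the layout of the example.

```javascript
// Build the manifest text for one image.
// Assumes `date` is already a formatted string and `format` is the file extension.
function manifestYaml({ date, width, height, format }) {
  if (!Number.isInteger(width) || !Number.isInteger(height)) {
    throw new TypeError("width and height must be integers");
  }
  const fields = { date, width, height, format };
  // Pad keys to 8 characters so the colons line up as in the example above.
  return (
    Object.entries(fields)
      .map(([key, value]) => `${key.padEnd(8)}: ${value}`)
      .join("\n") + "\n"
  );
}
```

The returned string could then be written to a per-image file under /data/img, for example with Node's `fs.writeFileSync`.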

thisisdano commented 7 years ago

Something like this for writing the manifest? https://www.npmjs.com/package/write-data

Now, how to pass it the proper data...

jeremyzilar commented 7 years ago

date    : 2010-10-22 11:33:15 -0400
uid     : my-cat
width   : 1200
height  : 800
format  : jpg

thisisdano commented 7 years ago

maybe uid instead of slug for consistency with our other data

jeremyzilar commented 7 years ago

Checking to see why sometimes an image is firing and uploading just fine, when other times it stalls without warning.

I think it might have something to do with the fact that Gulp runs tasks in parallel (at the same time) when we want certain actions to run in series. Reading this: https://fettblog.eu/gulp-4-parallel-and-series/
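
To illustrate the difference in plain Node (not our actual gulpfile): the sketch below runs three stand-in steps first in series, then in parallel, and records the order in which they finish. The step names and delays are made up for the demo. Gulp 4 exposes the same two ideas as `series()` and `parallel()`, which is what the linked article covers.

```javascript
// Stand-in for a pipeline step: waits `ms` milliseconds, then logs its name.
function step(name, ms, log) {
  return () =>
    new Promise((resolve) =>
      setTimeout(() => {
        log.push(name);
        resolve();
      }, ms)
    );
}

// Series: each task starts only after the previous one has finished.
async function runSeries(tasks) {
  for (const task of tasks) {
    await task();
  }
}

// Parallel: all tasks start at once, so finish order depends on duration.
async function runParallel(tasks) {
  await Promise.all(tasks.map((task) => task()));
}

// Build the same three steps (slowest first) writing into the given log.
function makeSteps(log) {
  return [
    step("img-variants", 80, log),
    step("upload", 40, log),
    step("upload-cleanup", 5, log),
  ];
}

async function demo() {
  const seriesLog = [];
  await runSeries(makeSteps(seriesLog));
  const parallelLog = [];
  await runParallel(makeSteps(parallelLog));
  return { seriesLog, parallelLog };
}
```

In series the log comes out in the declared order. In parallel the fastest step finishes first, so "upload-cleanup" completes before anything has been uploaded, which is the kind of ordering problem that could explain the intermittent stalls.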

jeremyzilar commented 7 years ago

If I am not mistaken, this is how things are currently firing:

Start watch task to see if there are imgs in __inbox
If there are imgs in __inbox:

run gulp.task("img-variants", function...
- Cleans up filenames/paths
- moves file(s) to be processed in _tmp
- makes img variants
- moves processed file(s) to _processed
run gulp.task("upload", function...
- uploads processed file(s) to S3
- moves processed file(s) to _uploaded
run gulp.task("upload-cleanup", function...
- deletes the files in _uploaded
run gulp.task("proxy", function... (not sure if I am reading this one right...)
- creates the lo-res img for Hugo to parse

jeremyzilar commented 7 years ago

For the filenames, can I suggest that we:

- remove the double underscores?
- add two-letter modifiers after the size to indicate color variants, e.g. _w1200bw? This way, I know that I can always get my image at my-cat_w1200.jpg, and if I would like the black-and-white version, I can add bw to get my-cat_w1200bw.jpg

thisisdano commented 7 years ago

If there's no size indicator (i.e. the original-size image), should we use _bw as the suffix?
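
One possible reading of the naming convention discussed above, as a small sketch. It assumes a single underscore before the suffix, a `w` plus width size marker, a two-letter color modifier appended directly after the size, and `_bw` alone for the original-size image. None of this is settled, and `variantName` is a hypothetical helper.

```javascript
// Build a variant filename such as my-cat_w1200bw.jpg.
// width: variant width in pixels, or null for the original-size image.
// color: two-letter color modifier such as "bw", or "" for full color.
function variantName(base, ext, { width = null, color = "" } = {}) {
  const size = width === null ? "" : `w${width}`;
  const suffix = size + color;
  // No suffix at all means the untouched original: my-cat.jpg
  return suffix ? `${base}_${suffix}.${ext}` : `${base}.${ext}`;
}
```

Under these assumptions, `variantName("my-cat", "jpg", { width: 1200 })` gives `my-cat_w1200.jpg`, adding `color: "bw"` gives `my-cat_w1200bw.jpg`, and `variantName("my-cat", "jpg", { color: "bw" })` gives `my-cat_bw.jpg`.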

aalikaram commented 2 years ago

🛑DEPRECATED🛑 - Repo no longer being maintained #79

GSA / digital.gov

Image pipeline #79