GSA / digital.gov

DEPRECATED 🛑-The future site of Digital.gov
https://www.digital.gov

Image pipeline #79

Closed thisisdano closed 2 years ago

thisisdano commented 7 years ago

The jamstack version of digital.gov is a site built in Hugo and hosted on Federalist. Digital.gov currently has over 1500 posts, about 170 other docs pages, plus events. Many of these pages have unique images, and the site is only going to get bigger. We currently have over 1GB worth of image assets hosted through AWS/Sites, including all the image files and their associated responsive variants. We will probably continue to have additional responsive variants in the future: more sizes, file formats, colorspaces, and blur levels.

This is a considerable amount of overhead, both to process and to host, and we tend to think it doesn't make sense to host all this imagery in GitHub. Not only does it take up a lot of space, but transferring all this data over to Federalist with each build would be slow and excessive.

Creating the proper responsive picture or srcset element in the HTML requires knowing a little bit about the original file: its original size is the most important thing to know, but whether the file has a grayscale or blurred variant would also be useful. If the original files are only available to Hugo via an AWS API or some kind of ImageMagick ping to a URL, it could lead to long, painful build times. I hypothesize that creating a YAML manifest file for each processed image could be a good way to capture this kind of image data and make it available to Hugo templates without ImageMagick, APIs, JavaScript, or some kind of server-side solution. (Any of these solutions could be acceptable, however. I'm willing to explore them or others.)
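To illustrate why the original width matters, here is a minimal sketch in Node of turning manifest data into a srcset string. In practice this logic would live in a Hugo template; the variant widths, the `_w<width>` filename suffix, the `uid` field, and the base URL are all assumptions made for illustration, not settled conventions.

```javascript
// Hypothetical variant widths; the real set has not been decided.
const VARIANT_WIDTHS = [200, 400, 800, 1200, 2400];

// Build a srcset string from manifest data, skipping any variant that
// would be as wide as or wider than the original (no upscaling).
// `baseUrl` and the `_w<width>` suffix are illustrative assumptions.
function buildSrcset(manifest, baseUrl) {
  const widths = VARIANT_WIDTHS.filter((w) => w < manifest.width);
  const entries = widths.map(
    (w) => `${baseUrl}/${manifest.uid}_w${w}.${manifest.format} ${w}w`
  );
  // The original file is always the largest candidate.
  entries.push(
    `${baseUrl}/${manifest.uid}.${manifest.format} ${manifest.width}w`
  );
  return entries.join(", ");
}
```

Without the width in the manifest, the template would have no way to know which variants exist for a given image short of probing the files themselves.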

So, we need a way to take new (or existing) image assets, test for uniqueness, create responsive variants based on the image size/filetype (with reasonable compression and possible colorspace and blur variants), create a record of each image containing some basic information about the file, upload the image and its variants to an AWS location, and be resilient to failure up and down the chain.
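One possible way to handle the "test for uniqueness" step is to compare content hashes rather than filenames, so that a renamed copy of an existing image is still caught. This is only a sketch: the in-memory `seen` index and the function names are hypothetical, and a real pipeline would need to persist the index somewhere.

```javascript
const crypto = require("crypto");

// Hash the file contents so that renamed copies of the same image collide.
function contentHash(buffer) {
  return crypto.createHash("sha256").update(buffer).digest("hex");
}

// `seen` is a hypothetical in-memory index (hash -> uid). Where this index
// would actually be stored is an open question.
function registerIfUnique(seen, uid, buffer) {
  const hash = contentHash(buffer);
  if (seen.has(hash)) {
    return { unique: false, duplicateOf: seen.get(hash) };
  }
  seen.set(hash, uid);
  return { unique: true, hash };
}
```

Note that this only detects byte-identical files; two different exports of the same photo would still count as unique.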

Take a look at this first: https://www.npmjs.com/package/hugulp

For file [imgBase].[imgExt]

Current asset data.img.yml. This is subject to change, but it needs to include width and height; the other fields would be nice to have:

date    : 2010-10-22 11:33:15 -0400
width   : 800
height  : 600
format  : jpg

thisisdano commented 7 years ago

Something like this for writing the manifest? https://www.npmjs.com/package/write-data

Now, how to pass it the proper data...
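As a rough sketch of the manifest-writing step, the following uses only Node's built-in `fs` and `path` modules rather than the write-data package, whose API is not assumed here. The function names and the `<uid>.yml` filename are hypothetical; the fields mirror the example manifest.

```javascript
const fs = require("fs");
const path = require("path");

// Render a flat object of scalars as `key : value` YAML lines.
// This only handles simple strings and numbers; values that need quoting
// or nesting would require a real YAML library.
function toManifestYaml(data) {
  return (
    Object.entries(data)
      .map(([key, value]) => `${key} : ${value}`)
      .join("\n") + "\n"
  );
}

// Write the manifest next to the other data files. The `<uid>.yml`
// naming is an assumption, not an agreed convention.
function writeManifest(dir, uid, data) {
  const file = path.join(dir, `${uid}.yml`);
  fs.writeFileSync(file, toManifestYaml(data), "utf8");
  return file;
}
```

The width, height, and format values would have to come from whatever tool reads the image during processing, so the manifest is best written in the same task that makes the variants.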

jeremyzilar commented 7 years ago

date    : 2010-10-22 11:33:15 -0400
uid     : my-cat
width   : 1200
height  : 800
format  : jpg

thisisdano commented 7 years ago

maybe uid instead of slug for consistency with our other data

jeremyzilar commented 7 years ago

Checking to see why sometimes an image is firing and uploading just fine, when other times it stalls without warning.

I think it might have something to do with the fact that Gulp runs tasks in parallel (at the same time) when we want certain actions to be run as a series. Reading this: https://fettblog.eu/gulp-4-parallel-and-series/
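To make the suspected problem concrete, here is a plain-Node sketch (no Gulp involved) of the difference between starting two async steps in parallel and running them as a series. The step durations are made up; the step names are borrowed from our tasks. Gulp 4 provides `gulp.series` and `gulp.parallel` for expressing the same distinction between tasks.

```javascript
// Simulate an async pipeline step that takes `ms` milliseconds and
// records when it starts and ends.
function step(name, ms, log) {
  return new Promise((resolve) => {
    log.push(`start ${name}`);
    setTimeout(() => {
      log.push(`end ${name}`);
      resolve();
    }, ms);
  });
}

// Parallel: both steps start immediately, so "upload" begins (and here
// even finishes) before "img-variants" has produced its files.
async function runParallel(log) {
  await Promise.all([
    step("img-variants", 30, log),
    step("upload", 10, log),
  ]);
  return log;
}

// Series: "upload" starts only after "img-variants" has resolved.
async function runSeries(log) {
  await step("img-variants", 30, log);
  await step("upload", 10, log);
  return log;
}
```

If the upload sometimes starts before the variants exist, that would be consistent with an image uploading fine on one run and stalling on another.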

jeremyzilar commented 7 years ago

If I am not mistaken, this is how things are currently firing:

Start watch task to see if there are imgs in __inbox.

If there are imgs in __inbox:

  1. run gulp.task("img-variants", function...
    • Cleans up filenames/paths
    • moves file(s) to be processed in _tmp
    • makes img variants
    • moves processed file(s) to _processed
  2. run gulp.task("upload", function...
    • uploads processed file(s) to S3
    • moves processed file(s) to _uploaded
  3. run gulp.task("upload-cleanup", function...
    • deletes the files in _uploaded
  4. run gulp.task("proxy", function... (not sure if I am reading this one right...)
    • creates the lo-res img for Hugo to parse
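One property of the folder-per-stage flow above is that a stalled image can be located by the folder it is sitting in. The sketch below shows that idea with Node's built-in `fs` module. The helper names are hypothetical, and the actual processing and upload work is left out.

```javascript
const fs = require("fs");
const path = require("path");

// Stage folders, in the order a file moves through them.
const STAGES = ["__inbox", "_tmp", "_processed", "_uploaded"];

// Create every stage folder under `root` if it does not exist yet.
function ensureStages(root) {
  for (const stage of STAGES) {
    fs.mkdirSync(path.join(root, stage), { recursive: true });
  }
}

// Move a file from one stage folder to another; returns the new path.
function advance(root, fileName, from, to) {
  const dest = path.join(root, to, fileName);
  fs.renameSync(path.join(root, from, fileName), dest);
  return dest;
}

// Report which stage currently holds the file, or null if it is gone
// (for example, after upload-cleanup has deleted it).
function whereIs(root, fileName) {
  const stage = STAGES.find((s) => fs.existsSync(path.join(root, s, fileName)));
  return stage || null;
}
```

A file that is still in _processed after a run, for example, would point at the upload step rather than at variant generation.
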
jeremyzilar commented 7 years ago

For the filenames, can I suggest that we:

thisisdano commented 7 years ago

If there's no size indicator (i.e. the original-size image), should we use _bw as the suffix?

aalikaram commented 2 years ago

🛑DEPRECATED🛑 - Repo no longer being maintained #79