GSA / digitalgov.gov

Digital.gov: Better websites. Better government.
https://digital.gov
Other
218 stars 299 forks source link

Rs hotfix/fix corrupt image uploads #7804

Closed RileySeaburg closed 1 month ago

RileySeaburg commented 1 month ago

Summary

This PR refactors the file upload process to address issues with image uploads and file processing. It replaces the gulp-awspublish plugin with a custom TypeScript-based upload module using AWS SDK v3, while maintaining Gulp task integration for workflow consistency.

Solution

This PR implements the following changes:

  1. Removes the gulp-awspublish plugin and replaces it with a custom TypeScript module (config/typescript/upload.ts) using AWS SDK v3 for file uploads.
  2. Maintains separate functions for uploading images and files, preserving the existing structure.
  3. Implements a cleanup function to remove temporary working directories after successful uploads.
  4. Updates the Gulpfile to integrate the new TypeScript-based upload module within the existing Gulp task structure.

This approach was chosen to:

  1. Improve type safety and catch potential errors at compile-time by using TypeScript.
  2. Modernize the upload process by utilizing the latest AWS SDK v3, removing the dependency on the outdated gulp-awspublish plugin.
  3. Maintain the existing Gulp-based workflow for consistency and ease of integration.
  4. Ensure proper cleanup after the upload process.

The change was implemented by creating a new TypeScript file for uploads, compiling it to JavaScript, and integrating it into the existing Gulp tasks. This allows us to have more control over the upload process while keeping the familiar Gulp task structure.

How To Test

  1. Install dependencies: npm install
  2. Place test images in content/uploads/_inbox/__add image or static files to this folder__
  3. Run the Gulp upload task: npx gulp upload
  4. Verify that files are uploaded to the S3 bucket and local directories are cleaned up

Dependency updates

Dependency

Dev Dependency name Previous version New version
@aws-sdk/client-s3 -- ^3.614.0
dotenv -- ^16.3.1
mime-types -- ^2.1.35

Dev Dependency

Dev Dependency name Previous version New version
@typescript-eslint/eslint-plugin -- ^5.x.x
@typescript-eslint/parser -- ^5.x.x
@types/mime-types -- ^2.1.4
typescript -- ^4.x.x
gulp-awspublish ^8.0.0 Removed

Script updates

Notes

github-actions[bot] commented 1 month ago

:mag: Preview in Federalist

RileySeaburg commented 1 month ago

What I Tested

Tested all file types and all steps to get a successful upload.

| name | type | upload | cleanup | valid public file |

| -------------------------------------- | ---- | ------ | ------- | ----------------- |

| test-j-image-upload-7-25-2024.jpg | jpg | pass | pass | pass |

| test-p-image-upload-7-25-2024.png | png | pass | pass | pass |

| test-document-upload-7-25-2024.pdf | pdf | pass | pass | pass |

| test-powerpoint-upload-7-25-2024.pptx | pptx | pass | pass | pass |

| test-spreadsheet-upload-7-25-2024.xlsx | xlsx | pass | pass | pass |

[!NOTE]

I did notice in my test file names that when I used test for the pdf it would rename from test-document-upload-7-25-2024.pdf to test.document-upload-7-25-2024.pdf. Just something to keep note of.

Suggestions

I've added some jsdocs for the functions in the upload.ts (f4e50c97ad757293045e104fef157ab346608528)

Questions

Can we add a linting and formatting step to the package.json for the typescript? Or is that handled when running tsc?

By adding another dependency with typescript, what are the benefits and some of the risks with having to support it for the uploading process over gulp?

If an image error occurs in the future, what would we need to know about the typescript code to address?

@nick-mon1

This is not robust code to handle every edge case, merely a search for a peer review to ensure there are no obvious errors.

The key thing to remember about TypeScript is, JavaScript is valid TypeScript.

JsDocs and ESLint can assist in ensuring that formatting and documentation are handled.

I can add a commit that further integrates typescript into the tooling tomorrow.

Overall this does not complicate the gulp upload process.

They only thing it does is solve the data race issue by using Node.js to upload the files and images.

This could just as easily be written with vanilla.js.

RileySeaburg commented 1 month ago

@nick-mon1 Idk why it said I dismissed you request.

I addressed them all here. JSDoc and ESLint support enabled.

I added a settings.json for workspace ESLint Highlighting as well.

RileySeaburg commented 1 month ago

I simplified the testing instructions since compiling typescript is not necessary.

RileySeaburg commented 1 month ago

Thanks for your hard work on this. I was able to upload files successfully without any corruption. I like the improvements to linting (JSDoc) and documentation too.

Added minor comments and questions. The biggest one being the addition of typescript.

Tested

  • [x] Uploaded powerpoint without corruption
  • [x] Uploaded PNG successfully 1
  • [x] Uploaded JPEG successfully 2

Build log

[14:24:06] Using gulpfile ~/web/digitalgov.gov/gulpfile.js
[14:24:06] Starting 'upload'...
[14:24:06] Starting 'fileTidy'...
Moving file from content/uploads/_inbox/TEST-USWDS Monthly Call June 2024.pptx to content/uploads/_working-files/to-process/test-uswds-monthly-call-june-2024.pptx
Moving file from content/uploads/_inbox/TEST-dg-7804.png to content/uploads/_working-images/to-process/test-dg-7804.png
Moving file from content/uploads/_inbox/TEST-ooh.png to content/uploads/_working-images/to-process/test-ooh.png
[14:24:06] Finished 'fileTidy' after 2.24 ms
[14:24:06] Starting 'cleanInbox'...
[14:24:06] Finished 'cleanInbox' after 4.16 ms
[14:24:06] Starting 'writeDataFile'...
file is written
[14:24:06] Finished 'writeDataFile' after 64 ms
[14:24:06] Starting 'processImages'...
[14:24:06] Finished 'processImages' after 2.77 ms
[14:24:06] Starting 'removeProcessedImage'...
file is written
file is written
[14:24:06] Finished 'removeProcessedImage' after 37 ms
[14:24:06] Starting 'uploadTask'...
Error processing variant /Users/jmejia-a/web/digitalgov.gov/content/uploads/_working-images/to-process/test-ooh.png: Error: /Users/jmejia-a/web/digitalgov.gov/content/uploads/_working-images/to-process/test-ooh.png: unable to open for read
unix error: No such file or directory
/Users/jmejia-a/web/digitalgov.gov/content/uploads/_working-images/to-process/test-ooh.png: unable to open for read
unix error: No such file or directory
pngload: stream error
    at Sharp.toFile (/Users/jmejia-a/web/digitalgov.gov/node_modules/sharp/lib/output.js:89:19)
    at createResponsiveVariant (/Users/jmejia-a/web/digitalgov.gov/config/gulp/file-process.js:90:6)
    at /Users/jmejia-a/web/digitalgov.gov/config/gulp/file-process.js:110:5
    at Array.forEach (<anonymous>)
    at processImageVariants (/Users/jmejia-a/web/digitalgov.gov/config/gulp/file-process.js:109:34)
    at /Users/jmejia-a/web/digitalgov.gov/config/gulp/file-process.js:139:11
    at Array.map (<anonymous>)
    at /Users/jmejia-a/web/digitalgov.gov/config/gulp/file-process.js:135:41
    at FSReqCallback.oncomplete (node:fs:191:23)
    at FSReqCallback.callbackTrampoline (node:internal/async_hooks:130:17)
Image uploaded successfully. Public URL: https://digitalgov.s3.amazonaws.com/test-dg-7804.png
Image uploaded successfully. Public URL: https://digitalgov.s3.amazonaws.com/test-dg-7804_w1200.png
Image uploaded successfully. Public URL: https://digitalgov.s3.amazonaws.com/test-dg-7804_w200.png
Image uploaded successfully. Public URL: https://digitalgov.s3.amazonaws.com/test-dg-7804_w400.png
Image uploaded successfully. Public URL: https://digitalgov.s3.amazonaws.com/test-dg-7804_w800.png
Image uploaded successfully. Public URL: https://digitalgov.s3.amazonaws.com/test-ooh.png
Image uploaded successfully. Public URL: https://digitalgov.s3.amazonaws.com/test-ooh_w1200.png
Image uploaded successfully. Public URL: https://digitalgov.s3.amazonaws.com/test-ooh_w200.png
Image uploaded successfully. Public URL: https://digitalgov.s3.amazonaws.com/test-ooh_w400.png
Image uploaded successfully. Public URL: https://digitalgov.s3.amazonaws.com/test-ooh_w800.png
File uploaded successfully. Public URL: https://digitalgov.s3.amazonaws.com/static/test-uswds-monthly-call-june-2024.pptx
All uploads completed successfully.
Cleanup completed
[14:24:10] Finished 'uploadTask' after 4.41 s
[14:24:10] Finished 'upload' after 4.52 s

Files tested

Additional chores

We'll need to delete all test files on S3 bucket.

Footnotes

  1. Uploaded a small PNG that triggered a build error, but uploaded successfully.
  2. Image was converted to PNG. Is this intentional? PNG files are larger and can be worse quality than JPEG.

Addressing your footnote, this is probably an artifact from the jpg -> png. issue #7686. I will address this as well.

@mejiaj

RileySeaburg commented 1 month ago

I just found out Gulp has typescript support.

Let's merge this for now since it's confirmed working - To fix production upload and will reconfigure in another PR and trello card to clean up the workflow.