cds-snc / platform-forms-client

NextJS application that serves the public-facing website for Forms
https://forms-staging.cdssandbox.xyz/
MIT License

Form response retrieval API file attachment download #934

Open patheard opened 1 year ago

patheard commented 1 year ago

Summary

Add a new Form response retrieval API endpoint that allows a form submission's file attachments to be downloaded.

Authorization will be performed using the temporary token passed in the Authorization header, to confirm that:

  1. The submission ID belongs to the given form ID; and
  2. The requesting user has permission to view the given form’s submissions.
// request
/api/id/$FORM_ID/$SUBMISSION_ID/files/download

// response
Zip file of the form submission's file attachments
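A minimal TypeScript sketch of how the request path above could be parsed and the two authorization checks applied. The `Submission` and `TokenOwner` shapes, the `Bearer <token>` header format, the HTTP status codes, and the `resolveToken`/`findSubmission` lookups are all hypothetical stand-ins; the issue does not say how tokens or form permissions are stored.

```typescript
// Hypothetical shapes; the real Forms data model is not described in this issue.
interface Submission {
  submissionId: string;
  formId: string;
}

interface TokenOwner {
  // Form IDs whose submissions the token's user may view (assumed representation).
  viewableFormIds: Set<string>;
}

type AuthResult =
  | { ok: true; formId: string; submissionId: string }
  | { ok: false; status: 401 | 403 | 404; reason: string };

// Matches /api/id/$FORM_ID/$SUBMISSION_ID/files/download (pathname only, no query string).
const DOWNLOAD_PATH = /^\/api\/id\/([^/]+)\/([^/]+)\/files\/download$/;

function authorizeDownload(
  path: string,
  authorizationHeader: string | undefined,
  resolveToken: (token: string) => TokenOwner | undefined, // hypothetical token store lookup
  findSubmission: (submissionId: string) => Submission | undefined, // hypothetical submission lookup
): AuthResult {
  const match = DOWNLOAD_PATH.exec(path);
  if (!match) return { ok: false, status: 404, reason: "unknown route" };
  const [, formId, submissionId] = match;

  // Assumes the temporary token is sent as "Authorization: Bearer <token>".
  const token = authorizationHeader?.match(/^Bearer (.+)$/)?.[1];
  const owner = token ? resolveToken(token) : undefined;
  if (!owner) return { ok: false, status: 401, reason: "missing or invalid token" };

  // Check 2 from the list: the requesting user may view this form's submissions.
  if (!owner.viewableFormIds.has(formId)) {
    return { ok: false, status: 403, reason: "not permitted to view this form's submissions" };
  }

  // Check 1 from the list: the submission ID belongs to the given form ID.
  const submission = findSubmission(submissionId);
  if (!submission || submission.formId !== formId) {
    return { ok: false, status: 404, reason: "submission not found for this form" };
  }

  return { ok: true, formId, submissionId };
}
```

In a NextJS API route, `path` would be the request pathname and `authorizationHeader` would be `req.headers.authorization`. The sketch runs the permission check before the submission lookup so that a caller without access to the form gets the same 403 whether or not the submission exists; that ordering is a suggestion, not something the issue specifies.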

Note

If any file attachment does not have a scan status of clean, the following steps will be taken:

Instead of being included directly in the zip, the file will first be added to a password-protected zip, which is then included in the parent zip. An INSTRUCTIONS.txt file containing the protected zip's password, I_MIGHT_BE_A_VIRUS, will also be added to the parent zip.

Example zip file structure

- file_attachment_one.png
- file_attachment_two.png

Example zip file structure with infected file

- file_attachment_one.png
- file_attachment_two.png
- infected_file_attachment_name.zip
- INSTRUCTIONS.txt
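The layout rules above can be sketched as a small planning function. This TypeScript sketch only decides what goes where in the parent zip; actually writing the archives and password-protecting the inner zips would need a zip library, which is not shown. The `Attachment`/`ZipEntry` shapes, the `<name>.zip` naming (appending rather than replacing the extension, to avoid collisions such as `a.png` and `a.jpg`), and the INSTRUCTIONS.txt wording are assumptions; only the `clean` status, the INSTRUCTIONS.txt file name and the I_MIGHT_BE_A_VIRUS password come from the issue.

```typescript
interface Attachment {
  name: string;
  scanStatus: string; // "clean", or any other scan result (other values are assumed)
}

// One entry at the top level of the parent zip.
type ZipEntry =
  | { kind: "file"; name: string; source: string }
  | { kind: "protectedZip"; name: string; password: string; contains: string }
  | { kind: "text"; name: string; content: string };

const PROTECTED_ZIP_PASSWORD = "I_MIGHT_BE_A_VIRUS";

function planAttachmentZip(attachments: Attachment[]): ZipEntry[] {
  const entries: ZipEntry[] = [];
  const wrapped: string[] = [];

  for (const attachment of attachments) {
    if (attachment.scanStatus === "clean") {
      // Clean files go straight into the parent zip.
      entries.push({ kind: "file", name: attachment.name, source: attachment.name });
    } else {
      // Anything not marked clean is placed inside a password-protected zip.
      // Appending ".zip" to the original name is an assumption.
      const zipName = `${attachment.name}.zip`;
      entries.push({
        kind: "protectedZip",
        name: zipName,
        password: PROTECTED_ZIP_PASSWORD,
        contains: attachment.name,
      });
      wrapped.push(zipName);
    }
  }

  // INSTRUCTIONS.txt is only needed when at least one file was wrapped.
  if (wrapped.length > 0) {
    // Hypothetical wording; the issue only requires that the file holds the password.
    const content = [
      "The following files do not have a clean virus scan status and were placed in password-protected zip files:",
      ...wrapped.map((name) => `- ${name}`),
      "",
      `Password: ${PROTECTED_ZIP_PASSWORD}`,
    ].join("\n");
    entries.push({ kind: "text", name: "INSTRUCTIONS.txt", content });
  }

  return entries;
}
```

For example, passing `file_attachment_one.png` and `file_attachment_two.png` with status `clean` plus `infected_file_attachment_name.png` with any other status yields the top-level names `file_attachment_one.png`, `file_attachment_two.png`, `infected_file_attachment_name.png.zip` and `INSTRUCTIONS.txt`, mirroring the second example structure.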


patheard commented 1 year ago

A basic working example of a Zipper Lambda function has been created here: https://github.com/patheard/aws-lambda-zipper

The function has also been incorporated into the forms-terraform repo on the feat/lambda-zipper branch, but only limited testing has been performed: https://github.com/cds-snc/forms-terraform/compare/feat/lambda-zipper

patheard commented 1 year ago

To test how streaming S3 object downloads affects NextJS memory use, the following test endpoint was created: https://gist.github.com/patheard/f75cdc16b0c45ce7c4863967b2fd7797

# Create large file
fallocate -l 250M some-big-file.img

# Upload to forms-terraform localstack Vault S3 bucket
aws s3api put-object \
  --bucket forms-local-vault-file-storage \
  --key some-big-file.img \
  --body some-big-file.img \
  --region ca-central-1 \
  --endpoint-url $LOCAL_AWS_ENDPOINT

# Download large file from S3 bucket
curl "http://localhost:3000/api/test/download?key=some-big-file.img" --output some-big-file.img

...lots of similar output
[1661456595783] INFO: some-big-file.img: streaming memory 166.696 MB (delta 0.003 MB)
[1661456595784] INFO: some-big-file.img: streaming memory 166.699 MB (delta 0.003 MB)
[1661456595784] INFO: some-big-file.img: streaming memory 166.702 MB (delta 0.003 MB)
[1661456595784] INFO: some-big-file.img: streaming memory 166.708 MB (delta 0.006 MB)
[1661456595962] INFO: some-big-file.img: streaming memory 167.395 MB (delta 0.687 MB)
[1661456595962] INFO: some-big-file.img: streaming memory 167.398 MB (delta 0.003 MB)
[1661456595963] INFO: some-big-file.img: streaming memory 167.401 MB (delta 0.003 MB)
[1661456595963] INFO: some-big-file.img: streaming memory 167.404 MB (delta 0.003 MB)
[1661456595963] INFO: some-big-file.img: streaming memory 167.407 MB (delta 0.003 MB)
[1661456595964] INFO: some-big-file.img: max memory 173.514 MB

The test shows only a small increase in the memory used by the Node process as the 250 MB S3 object is streamed back through the NextJS response: most deltas in the log excerpt are 0.003 MB, and peak memory was 173.514 MB. As a result, using the ECS task to stream file downloads would be a viable option.
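A rough, self-contained approximation of the streaming memory test above, for reproducing the measurement without S3 or NextJS. It is not the code from the linked gist: it assumes the logged figure is resident set size (`process.memoryUsage().rss`, with 1 MB taken as 2^20 bytes), replaces the S3 object with a generated stream of zero-filled chunks, and replaces the NextJS response with a Writable that discards each chunk. `fakeObjectBody` and `streamWithMemoryLog` are hypothetical names. It needs Node 15+ for `node:stream/promises`.

```typescript
import { Readable, Writable } from "node:stream";
import { pipeline } from "node:stream/promises";

const MB = 1024 * 1024; // 1 MB = 2^20 bytes here

// Stand-in for an S3 object body: emits `totalBytes` of zeros in fixed-size chunks.
function fakeObjectBody(totalBytes: number, chunkBytes = 64 * 1024): Readable {
  let remaining = totalBytes;
  return new Readable({
    read() {
      if (remaining <= 0) {
        this.push(null); // signal end of stream
        return;
      }
      const size = Math.min(chunkBytes, remaining);
      remaining -= size;
      this.push(Buffer.alloc(size));
    },
  });
}

interface MemoryReport {
  bytesStreamed: number;
  startMb: number;
  maxMb: number;
}

// Pipes `body` into a sink that discards each chunk (standing in for the HTTP
// response), logging resident set size after every chunk in the same shape as the log above.
async function streamWithMemoryLog(
  label: string,
  body: Readable,
  log: (line: string) => void = console.log,
): Promise<MemoryReport> {
  const rssMb = () => process.memoryUsage().rss / MB;
  const startMb = rssMb();
  let lastMb = startMb;
  let maxMb = startMb;
  let bytesStreamed = 0;

  const sink = new Writable({
    write(chunk: Buffer, _encoding: BufferEncoding, callback: (error?: Error | null) => void) {
      bytesStreamed += chunk.length;
      const nowMb = rssMb();
      maxMb = Math.max(maxMb, nowMb);
      log(`${label}: streaming memory ${nowMb.toFixed(3)} MB (delta ${(nowMb - lastMb).toFixed(3)} MB)`);
      lastMb = nowMb;
      callback(); // the chunk is not retained, so it can be garbage collected
    },
  });

  // pipeline() applies backpressure, so only a few chunks are buffered at any time.
  await pipeline(body, sink);
  log(`${label}: max memory ${maxMb.toFixed(3)} MB`);
  return { bytesStreamed, startMb, maxMb };
}
```

Calling `streamWithMemoryLog("some-big-file.img", fakeObjectBody(250 * MB))` streams the same amount of data as the `fallocate` file above. Because each chunk is released once written and backpressure limits buffering, RSS should stay roughly flat rather than growing by the object's size, which is the property the conclusion above relies on.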