CDLUC3 / dmphub

Simple metadata repository for networked DMPs
MIT License

Lambda - PDF downloader #84

Closed briri closed 1 year ago

briri commented 2 years ago

Build a Lambda function that downloads the PDF and stores it in S3

NOTE: this should be invoked after a new DMP is persisted, and it should include some sort of validation/security check to ensure we're only storing legitimate PDFs.
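The issue does not say what the validation check should look like. As a rough sketch only, assuming the check is a cheap inspection of the downloaded bytes (the function name `looks_like_pdf` and the size limit are hypothetical, not taken from the dmphub code), one option is to confirm the PDF header and end-of-file marker before storing anything:

```python
MAX_PDF_BYTES = 50 * 1024 * 1024  # hypothetical upper bound, not from the issue


def looks_like_pdf(data: bytes, max_bytes: int = MAX_PDF_BYTES) -> bool:
    """Cheap sanity check: size limit, '%PDF-' header, '%%EOF' trailer."""
    if not data or len(data) > max_bytes:
        return False
    # A well-formed PDF file begins with the '%PDF-' header.
    if not data.startswith(b"%PDF-"):
        return False
    # The end-of-file marker should appear near the end of the file.
    return b"%%EOF" in data[-1024:]
```

A check like this only screens out obviously wrong content, such as an HTML error page returned in place of the file. It is not a malware scan.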

briri commented 1 year ago

Downloads are triggered via an SNS invocation of the new PdfDownloader Lambda function whenever a DMP is created or updated and the call contains a dmproadmap_related_identifier with both "descriptor": "is_metadata_for" and "work_type": "output_management_plan".
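The matching rule described above could be sketched as follows. This is an illustration rather than the actual PdfDownloader code: the function name `pdf_identifiers` is hypothetical, and the sketch assumes the DMP has already been parsed into a dict whose `dmproadmap_related_identifiers` value may be either a single object or a list of objects.

```python
def pdf_identifiers(dmp: dict) -> list:
    """Return the related identifiers that point at the DMP's PDF document."""
    related = dmp.get("dmproadmap_related_identifiers") or []
    # Assumption: accept either a single object or a list of objects.
    if isinstance(related, dict):
        related = [related]
    return [
        rid for rid in related
        if isinstance(rid, dict)
        and rid.get("descriptor") == "is_metadata_for"
        and rid.get("work_type") == "output_management_plan"
    ]
```

An empty result would mean there is nothing to download for that DMP.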

The PDF is downloaded and stored in the S3 bucket under the key prefix dmps/ with a random file name, for example dmps/1c51f70b8e759aed.pdf.
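How the random name is generated is not stated in the issue. A minimal sketch, assuming a 16-character hexadecimal name like the one in the stored URL (the helper `random_pdf_key` is hypothetical), might be:

```python
import secrets

KEY_PREFIX = "dmps/"  # key prefix named in the issue


def random_pdf_key(prefix: str = KEY_PREFIX) -> str:
    """Build an S3 object key of the form 'dmps/<16 hex chars>.pdf'."""
    # token_hex(8) yields 8 random bytes rendered as 16 hex characters.
    return f"{prefix}{secrets.token_hex(8)}.pdf"
```

The resulting key would then be handed to whatever S3 client call uploads the object.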

The original DMP record is then updated in the database so that the S3 location becomes the related identifier and the original location is stored for reference.

{
  "dmphub_provenance_download_url": "https://dmptool.org/plans/12345/export.pdf",
  "dmproadmap_related_identifiers": {
    "descriptor": "is_metadata_for",
    "identifier": "http://uc3-dmp-hub-dev-data-s3bucket-1maeiw6w3tts0.s3-website-us-west-2.amazonaws.com/dmps/1c51f70b8e759aed.pdf",
    "type": "url",
    "work_type": "output_management_plan"
  }
}
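The record update could be sketched roughly like this. The function name `point_to_s3_copy` is hypothetical, and the sketch assumes that `dmphub_provenance_download_url` sits at the top level of the DMP record, which the issue does not confirm:

```python
import copy


def point_to_s3_copy(dmp: dict, original_url: str, s3_url: str) -> dict:
    """Return a copy of the DMP whose PDF identifier points at the S3 copy."""
    updated = copy.deepcopy(dmp)  # leave the caller's record untouched
    related = updated.get("dmproadmap_related_identifiers") or []
    # Assumption: the value may be a single object or a list of objects.
    entries = [related] if isinstance(related, dict) else related
    for rid in entries:
        if isinstance(rid, dict) and rid.get("identifier") == original_url:
            rid["identifier"] = s3_url
            # Assumption: the original location is kept at the top level.
            updated["dmphub_provenance_download_url"] = original_url
    return updated
```

Returning a modified copy instead of mutating in place keeps the pre-update record available if the database write fails.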