coforma / swift-tech-challenge


[Draft] Data Ingestion Infra #61

Closed TheDanMiller closed 7 months ago

TheDanMiller commented 8 months ago

Description

We need to provision the following infrastructure with Terraform to support the data ingestion flow, so that the app stays maintainable in the long term and can be fed updated data as needed.

Acceptance Criteria

  1. An S3 bucket provisioned with two directories (prefixes): application_questions/ and institution_data/
  2. A Lambda function created from the ingest-instutitons.py file in the utilities folder of the codebase
  3. A trigger on the Lambda function from item 2 that fires whenever a new file with the suffix .csv is added to institution_data/
  4. An SQS message queue called process-institutions that the Lambda function from item 2 can write messages to
  5. A Lambda function created from the generate-descriptions-and-images.py file
  6. A trigger on the Lambda function from item 5 that fires on new messages in the process-institutions queue
  7. A DynamoDB table named institutions
  8. Amazon Bedrock with models Titan Image Generator G1, Titan Text G1 - Lite, and Jurassic-2 Mid enabled
  9. A Lambda function created from the ingest-applications.py file in the utilities folder of the codebase
  10. A trigger on the Lambda function from item 9 that fires whenever a file with the suffix .csv is added to application_questions/ in the S3 bucket from item 1
  11. An S3 bucket called swift-institution-images that the Lambda function from item 5 can write to
  12. An IAM role granting the permissions the resources above need to work together
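
The event flow in items 2 through 6 can be sketched in Python as follows. This is a minimal sketch, not the contents of the actual utilities scripts: the function names, the one-message-per-CSV-row layout, and any column names are assumptions made for illustration. It uses only the standard library; a real handler would fetch the object and send the messages through an AWS SDK such as boto3.

```python
import csv
import io
import json
from urllib.parse import unquote_plus

PREFIX = "institution_data/"  # prefix from item 1
SUFFIX = ".csv"               # suffix filter from item 3


def csv_keys_from_s3_event(event):
    """Return (bucket, key) pairs for new .csv objects under PREFIX."""
    matches = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(PREFIX) and key.endswith(SUFFIX):
            matches.append((bucket, key))
    return matches


def rows_to_messages(csv_text):
    """Turn each CSV row into one JSON message body (assumed layout)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row) for row in reader]


def rows_from_sqs_event(event):
    """Decode the message bodies that a queue-triggered Lambda receives."""
    return [json.loads(record["body"]) for record in event.get("Records", [])]
```

Filtering on prefix and suffix inside the handler duplicates the filter on the S3 trigger itself, but it keeps the function safe if the trigger is ever configured more broadly than intended.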

Additional Notes

  1. All Lambda functions use JSON logging
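
One way to get JSON logging from a Python Lambda is a small formatter for the standard logging module. The sketch below is an assumption about how this could look, not the project's actual logging setup; `JsonFormatter`, `get_json_logger`, and the chosen field names are hypothetical.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def get_json_logger(name):
    """Return a logger whose handler emits JSON lines (hypothetical helper)."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on warm invocations
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the root handler from logging it twice
    return logger
```

A handler would then call something like `get_json_logger("ingest").info("loaded %d rows", 3)`. Lambda's own JSON log format setting may be an alternative to a custom formatter.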

Definition of Done
