MSD-LIVE / issues

add cloud cost saving limits/crons/checks for non-prod deployments #215

Open zguillen opened 3 months ago

zguillen commented 3 months ago

We should add some checks/automations to help with costs so that our dev/stage cloud instances don't eat up too much of our cloud budget. Ideas:

  1. Look into s3 quotas that can be added to all dev/stage buckets?
  2. put dev/stage on a weekly complete wipe/recreate schedule?
  3. shut down all non-prod ec2 resources nightly or for weekends?

We also have quotas for aws resources, especially datasync-related ones, that we will certainly reach when they are shared across dev/stage/prod if we don't do something. Resources that will continue to grow that I can think of (multiply what we use by 3 [per deployment]):

  1. locations (200; aws tech guy said we can request an increase, but it's not listed anywhere in the quota UI so that might have been BS): using 1 per project, 1 per efs drive (?? PLUS 1 per draft dataset with files in MSD-LIVE), PLUS 2 per edu notebook [currently at 78]
  2. tasks (100, can request increase): 1 per draft with files in MSD-LIVE [currently at 54]
  3. s3 access points (1000, can request increase): 1 per dataset [currently at 449 (dev/stage automated tests create datasets each time they run now so this will grow faster than before)]
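
To make the headroom on these quotas easier to eyeball, here is a minimal sketch that ranks them by utilization. The limits and current counts are the ones quoted in the list above; the `QUOTAS` name, the `quota_report` helper, and the 50% warning threshold are illustrative assumptions, not anything that exists in the repo.

```python
# Counts and limits are the figures quoted in this issue.
# The 50% warning threshold is an arbitrary assumption; tune as needed.
QUOTAS = {
    "datasync locations": (78, 200),
    "datasync tasks": (54, 100),
    "s3 access points": (449, 1000),
}


def quota_report(quotas, warn_at=0.5):
    """Return (name, used, limit, fraction, over_threshold) rows,
    sorted so the quota closest to its limit comes first."""
    rows = []
    for name, (used, limit) in quotas.items():
        fraction = used / limit
        rows.append((name, used, limit, fraction, fraction >= warn_at))
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__":
    for name, used, limit, fraction, warn in quota_report(QUOTAS):
        flag = "WARN" if warn else "ok"
        print(f"{name}: {used}/{limit} ({fraction:.0%}) {flag}")
```

With the numbers above, datasync tasks are the closest to their limit, so that is probably the first quota increase to request.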

ghost commented 2 months ago

Adding to this ticket:

We need to set up limits or alarms on the EC2 instances we use for notebooks. Specifically, when developing locally, developers need to manually shut down the instances. It's easy to forget, and the costs can add up quickly.

zguillen commented 2 months ago

Low-hanging fruit: automate shutting down all ec2 instances every night (esp. notebooks).

Can we set the env for the dev and stage service stacks so that worker and web counts are 0?
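
For the nightly shutdown idea, a rough sketch of what a scheduled job could run. It assumes non-prod instances carry an `env` tag with the value `dev` or `stage`; that tag key and both helper names are hypothetical, so adjust them to however our instances are actually tagged. The `ec2` argument is expected to be a boto3 EC2 client; `describe_instances` and `stop_instances` are the only calls used.

```python
def running_instance_ids(ec2, env_values=("dev", "stage")):
    """Collect IDs of running instances whose `env` tag matches.

    The `env` tag key is an assumption about our tagging scheme.
    """
    filters = [
        {"Name": "instance-state-name", "Values": ["running"]},
        {"Name": "tag:env", "Values": list(env_values)},
    ]
    ids = []
    token = None
    while True:
        kwargs = {"Filters": filters}
        if token:
            kwargs["NextToken"] = token
        page = ec2.describe_instances(**kwargs)
        for reservation in page.get("Reservations", []):
            for instance in reservation.get("Instances", []):
                ids.append(instance["InstanceId"])
        # Keep paging until the API stops returning a token.
        token = page.get("NextToken")
        if not token:
            return ids


def stop_non_prod(ec2, dry_run=True):
    """Stop matching instances; with dry_run=True only report them."""
    ids = running_instance_ids(ec2)
    if ids and not dry_run:
        ec2.stop_instances(InstanceIds=ids)
    return ids


# Usage (needs boto3 and AWS credentials):
#   import boto3
#   stop_non_prod(boto3.client("ec2"), dry_run=False)
```

This stops instances rather than terminating them, so notebook volumes survive and someone can start an instance again the next morning. Running it from a nightly cron or scheduled Lambda is left open here.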

zguillen commented 2 months ago

Another item to address if we want to save costs is to add a lifecycle rule to our project buckets that eventually permanently deletes files with delete markers. Because we have versioning turned on we can recover deleted/edited files when necessary (this has never happened though), but it means that we keep paying indefinitely for files that have been deleted.

We could def add lifecycle rules to non-prod buckets to delete files permanently, but what about the other items associated with drafts/records that will continue to grow (datasync tasks, s3 access points, dataset files copied to efs)?
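
A possible shape for the bucket lifecycle rule, as a sketch only. The rule ID, the 30-day window, and the helper names are assumptions. `NoncurrentVersionExpiration` permanently removes old versions (including the data behind a delete marker) after the given number of days, and `ExpiredObjectDeleteMarker` cleans up delete markers that have no versions left behind them. `s3` is expected to be a boto3 S3 client.

```python
def version_cleanup_config(noncurrent_days=30):
    """Build a lifecycle configuration for a versioned bucket.

    The rule ID and the 30-day default are illustrative assumptions.
    """
    return {
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: whole bucket
                # Permanently delete versions N days after they stop
                # being the current version (deleted or overwritten).
                "NoncurrentVersionExpiration": {
                    "NoncurrentDays": noncurrent_days
                },
                # Remove delete markers once no older versions remain.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    }


def apply_version_cleanup(s3, bucket, noncurrent_days=30):
    """Apply the rule to one bucket.

    Note: this call replaces any lifecycle configuration the bucket
    already has, so merge with existing rules first if there are any.
    """
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=version_cleanup_config(noncurrent_days),
    )
```

This only covers the S3 side. The datasync tasks, s3 access points, and efs copies mentioned above would still need their own cleanup.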