ITISFoundation / osparc-simcore

🐼 osparc-simcore simulation framework
https://osparc.io
MIT License

Use initialized EBS storage instead of buffer machines #5864

Closed: sanderegg closed this issue 3 weeks ago

sanderegg commented 4 months ago

Concept

Instead of keeping EC2 instances running as buffer machines, we would keep only their respective EBS volumes.
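One rough way to compare the two setups is to model the monthly cost of N buffers, either running or kept as EBS volumes only. This is a minimal sketch: the prices are inputs you must fill in for your region (the numbers in the usage line are purely illustrative, not actual AWS prices), the 8 GB root + 500 GB docker sizes come from the discussion in this issue, and extra gp3 IOPS/throughput charges and boot time are ignored. The class and function names are hypothetical.

```python
from dataclasses import dataclass

# Common approximation for the number of hours in a month used in cloud pricing.
HOURS_PER_MONTH = 730


@dataclass(frozen=True)
class BufferCostInputs:
    """Hypothetical price inputs: fill in current AWS prices for your region."""

    instance_usd_per_hour: float  # on-demand price of the buffer instance type
    ebs_usd_per_gib_month: float  # gp3 storage price per GiB-month
    root_gib: int = 8  # root disk size mentioned in this issue
    docker_gib: int = 500  # docker disk size mentioned in this issue

    @property
    def storage_usd_per_month(self) -> float:
        # Storage is billed whether the instance runs, is stopped, or is gone.
        return (self.root_gib + self.docker_gib) * self.ebs_usd_per_gib_month


def monthly_cost_running(n_buffers: int, c: BufferCostInputs) -> float:
    """Buffers kept as running EC2 instances: pay compute plus storage."""
    compute = c.instance_usd_per_hour * HOURS_PER_MONTH
    return n_buffers * (compute + c.storage_usd_per_month)


def monthly_cost_ebs_only(n_buffers: int, c: BufferCostInputs) -> float:
    """Buffers kept only as initialized EBS volumes: pay storage alone."""
    return n_buffers * c.storage_usd_per_month


# Placeholder prices, NOT real AWS prices.
inputs = BufferCostInputs(instance_usd_per_hour=0.05, ebs_usd_per_gib_month=0.08)
print(monthly_cost_running(2, inputs), monthly_cost_ebs_only(2, inputs))
```

In this model the storage term does not shrink when the instances go away, so the saving is exactly the compute term; the larger the docker disk, the smaller the relative gain.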

Needed changes

AMI:

### Eisbock
- [ ] https://github.com/ITISFoundation/osparc-simcore/issues/6227
- [ ] https://github.com/ITISFoundation/osparc-simcore/issues/6230
- [ ] https://github.com/ITISFoundation/osparc-simcore/issues/6238
- [ ] https://github.com/ITISFoundation/osparc-simcore/issues/6242
- [ ] https://github.com/ITISFoundation/osparc-simcore/issues/6250
- [ ] https://github.com/ITISFoundation/osparc-simcore/issues/6251
- [ ] https://github.com/ITISFoundation/osparc-simcore/issues/6252
- [ ] https://github.com/ITISFoundation/osparc-simcore/pull/6299
- [ ] https://github.com/ITISFoundation/osparc-simcore/issues/6254
- [ ] https://github.com/ITISFoundation/osparc-simcore/pull/6314
### Tasks
- [x] Modify AMI boot script to optionally skip instance storages
- [x] Have only 1 AMI, with an additional 500 GB EBS disk?
- [ ] https://github.com/ITISFoundation/osparc-simcore/pull/6032
- [ ] https://github.com/ITISFoundation/osparc-simcore/pull/5923
- [ ] https://github.com/ITISFoundation/osparc-simcore/pull/6097
- [ ] https://github.com/ITISFoundation/osparc-simcore/issues/6045
- [x] Tune disks (throughput + IOPS) on the root drive and the docker drive: use maxed-out gp3 volumes
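For the "maxed-out gp3" item, what a gp3 volume can actually get depends on its size: AWS caps gp3 at 500 IOPS per GiB and 0.25 MiB/s per provisioned IOPS, on top of the free 3,000 IOPS / 125 MiB/s baseline. The sketch below derives the maximal settings per volume and builds a list shaped like the EC2 `RunInstances` `BlockDeviceMappings` parameter. Assumptions: the 16,000 IOPS / 1,000 MiB/s ceilings are the gp3 limits as documented in 2024 (check the current EBS documentation), the function names are hypothetical, and the device names `/dev/xvda` and `/dev/sdb` are placeholders that depend on the AMI.

```python
# gp3 limits as documented by AWS in 2024; verify against the current EBS docs.
GP3_BASELINE_IOPS = 3_000
GP3_BASELINE_THROUGHPUT_MIBPS = 125
GP3_MAX_IOPS = 16_000
GP3_MAX_THROUGHPUT_MIBPS = 1_000
GP3_MAX_IOPS_PER_GIB = 500
GP3_MAX_MIBPS_PER_IOPS = 0.25


def maxed_out_gp3(size_gib: int, delete_on_termination: bool = True) -> dict:
    """Return an EBS spec with the highest IOPS/throughput allowed for this size."""
    # IOPS are capped by the per-GiB ratio, but never drop below the free baseline.
    iops = min(GP3_MAX_IOPS, max(GP3_BASELINE_IOPS, GP3_MAX_IOPS_PER_GIB * size_gib))
    # Throughput is capped by the per-IOPS ratio, with the same baseline rule.
    throughput = min(
        GP3_MAX_THROUGHPUT_MIBPS,
        max(GP3_BASELINE_THROUGHPUT_MIBPS, int(iops * GP3_MAX_MIBPS_PER_IOPS)),
    )
    return {
        "VolumeSize": size_gib,
        "VolumeType": "gp3",
        "Iops": iops,
        "Throughput": throughput,
        "DeleteOnTermination": delete_on_termination,
    }


def block_device_mappings(
    root_gib: int = 8, docker_gib: int = 500, keep_docker_volume: bool = False
) -> list:
    """Root + docker disks; device names are hypothetical and AMI-dependent."""
    return [
        {"DeviceName": "/dev/xvda", "Ebs": maxed_out_gp3(root_gib)},
        {
            "DeviceName": "/dev/sdb",
            # Keeping the docker disk after termination is what the
            # "only keep initialized EBS volumes" option would need.
            "Ebs": maxed_out_gp3(docker_gib, delete_on_termination=not keep_docker_volume),
        },
    ]
```

With these rules the 8 GB root disk tops out at 4,000 IOPS (it would need at least 32 GiB to reach 16,000), while the 500 GB docker disk can use the full limits.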
matusdrobuliak66 commented 4 months ago

https://depot.dev/blog/faster-ec2-boot-time

sanderegg commented 4 months ago

Regarding autoscaling, I currently see 2 options:

pre-create complete EC2 instances:

  1. start a buffer machine of a cheap type (e.g. t3.medium)
  2. ensure startup is complete (e.g. pre-pulling images), using SSH, AWS SSM, or a hard-coded wait time
  3. stop the machine (only the EBS disks remain billed: 8 GB root + 500 GB docker)
  4. when a new machine is needed, first check whether a stopped buffer machine is available; if so, set the correct instance type and start it
  5. when the machine is no longer needed, instead of shutting it down, pass it back to the buffer handler
  6. we might need to ensure the disk is cleaned between runs
  7. we need to monitor the EBS volumes/stopped machines and possibly remove them
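The first option boils down to a pool of stopped instances. Below is a minimal, in-memory sketch of steps 4, 5 and 7; the class and method names are hypothetical, and the actual EC2 calls are only indicated in comments (in boto3, the type of a stopped instance can be changed with `modify_instance_attribute` before calling `start_instances`). The new type must be compatible with the AMI, e.g. the same CPU architecture.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class StoppedBuffer:
    instance_id: str
    instance_type: str
    stopped_since: float  # epoch seconds when the machine was stopped


@dataclass
class StoppedBufferPool:
    """Hypothetical in-memory view of the stopped buffer machines."""

    max_idle_seconds: float
    _stopped: list[StoppedBuffer] = field(default_factory=list)

    def release(self, instance_id: str, instance_type: str, now: float) -> None:
        # Step 5: instead of shutting the machine down, hand it back to the pool.
        # Step 6 (disk cleanup) and the actual ec2.stop_instances call would go here.
        self._stopped.append(StoppedBuffer(instance_id, instance_type, now))

    def acquire(self, wanted_type: str) -> StoppedBuffer | None:
        # Step 4: reuse the oldest stopped machine if there is one.
        if not self._stopped:
            return None  # caller falls back to launching a fresh instance
        buffer = self._stopped.pop(0)
        # Real implementation: while still stopped, call
        # ec2.modify_instance_attribute(InstanceId=..., InstanceType={"Value": wanted_type})
        # and then ec2.start_instances(InstanceIds=[...]).
        buffer.instance_type = wanted_type
        return buffer

    def prune(self, now: float) -> list[str]:
        # Step 7: drop machines stopped for too long; the caller terminates them.
        keep = [b for b in self._stopped if now - b.stopped_since <= self.max_idle_seconds]
        expired = [b for b in self._stopped if now - b.stopped_since > self.max_idle_seconds]
        self._stopped = keep
        return [b.instance_id for b in expired]
```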

keep only initialized EBS volumes:

  1. start a buffer machine of a cheap type
  2. ensure startup is complete
  3. shut down the machine but keep its EBS volume
  4. when a new machine is needed, first check whether free EBS volumes are available; if so, use them
  5. monitor EBS volumes
  6. handle cleanup of volumes
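The second option is a pool of free, already initialized volumes. One constraint to keep in mind: an EBS volume can only be attached to an instance in the same availability zone, so the lookup has to be done per AZ. A minimal in-memory sketch of steps 4 and 6 follows; the names and the "keep at most N free volumes per AZ" cleanup policy are assumptions, and the EC2 calls (boto3 `attach_volume` / `delete_volume`) are only indicated in comments.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class InitializedVolume:
    volume_id: str
    availability_zone: str
    created_at: float  # epoch seconds


@dataclass
class VolumePool:
    """Hypothetical in-memory view of free, initialized EBS volumes."""

    max_per_az: int
    _free: list[InitializedVolume] = field(default_factory=list)

    def add(self, volume: InitializedVolume) -> None:
        # Step 3: the buffer machine is gone, its initialized volume stays.
        self._free.append(volume)

    def acquire(self, availability_zone: str) -> InitializedVolume | None:
        # Step 4: only volumes in the instance's AZ can be attached.
        for i, vol in enumerate(self._free):
            if vol.availability_zone == availability_zone:
                # Real implementation: ec2.attach_volume(Device=..., InstanceId=..., VolumeId=vol.volume_id)
                return self._free.pop(i)
        return None  # caller falls back to a fresh, uninitialized volume

    def prune(self) -> list[str]:
        # Step 6: keep the newest max_per_az volumes per AZ; the rest are
        # returned so the caller can delete them (ec2.delete_volume(VolumeId=...)).
        by_az: dict[str, list[InitializedVolume]] = {}
        for vol in self._free:
            by_az.setdefault(vol.availability_zone, []).append(vol)
        surplus: set[str] = set()
        for vols in by_az.values():
            vols.sort(key=lambda v: v.created_at, reverse=True)
            surplus.update(v.volume_id for v in vols[self.max_per_az:])
        self._free = [v for v in self._free if v.volume_id not in surplus]
        return sorted(surplus)
```

An age-based limit would be an equally valid cleanup policy.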
sanderegg commented 1 month ago

User story

sanderegg commented 1 month ago

Create a graph of responsiveness vs costs for: