Expand GPU instances or pick new ones for GPU CI

choderalab / perses

Experiments with expanded ensembles to explore chemical space

http://perses.readthedocs.io

MIT License


Expand GPU instances or pick new ones for GPU CI #1152

Closed mikemhenry closed 1 year ago

mikemhenry commented 1 year ago

Right now we are getting a lot of these errors:

InsufficientInstanceCapacity: We currently do not have sufficient p2.xlarge capacity in the Availability Zone you requested (***d). Our system will be working on provisioning additional capacity. You can currently get p2.xlarge capacity by not specifying an Availability Zone in your request or choosing ***a, ***b, ***c, ***e.

I will first see if I can expand the set of availability zones we request from, and if that fails, switch to a slightly more expensive GPU instance type.
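The fallback order described here (more availability zones first, then a different instance type) can be sketched roughly as below. This is only an illustration, not the workflow's actual code: `launch_with_fallback`, `CapacityError`, `fake_launch`, and the zone names are all hypothetical stand-ins, and a real launcher would call EC2 and translate its `InsufficientInstanceCapacity` error into `CapacityError`.

```python
from typing import Callable, Iterable, Optional, Tuple


class CapacityError(Exception):
    """Hypothetical stand-in for EC2's InsufficientInstanceCapacity error."""


def launch_with_fallback(
    launch: Callable[[str, str], str],
    candidates: Iterable[Tuple[str, str]],
) -> Optional[str]:
    """Try each (instance_type, zone) pair in order.

    Returns the first instance ID that launches, or None if every
    candidate fails with a capacity error.
    """
    for instance_type, zone in candidates:
        try:
            return launch(instance_type, zone)
        except CapacityError:
            # No capacity for this pair; move on to the next candidate.
            continue
    return None


def fake_launch(instance_type: str, zone: str) -> str:
    """Hypothetical launcher: only g4dn.xlarge in "zone-b" has capacity."""
    if (instance_type, zone) == ("g4dn.xlarge", "zone-b"):
        return "i-demo"
    raise CapacityError(f"no {instance_type} capacity in {zone}")


# Preferred instance type across all zones first, then the alternative type.
CANDIDATES = [
    ("p2.xlarge", "zone-a"),
    ("p2.xlarge", "zone-b"),
    ("g4dn.xlarge", "zone-a"),
    ("g4dn.xlarge", "zone-b"),
]

if __name__ == "__main__":
    print(launch_with_fallback(fake_launch, CANDIDATES))
```

Ordering the candidates by instance type first keeps the preferred type as long as any zone has capacity for it, and only changes type once every zone has been tried.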

jchodera commented 1 year ago

g4dn.xlarge instances are about half the price of p2.xlarge instances and feature an NVIDIA T4, which is more modern than (but relatively comparable to) the K80 from the p2.xlarge instances. Either probably works well for us!

ijpulidos commented 1 year ago

I tried changing the instance type to g4dn.xlarge. We no longer get the capacity error, but we now hit the runner registration error that we discussed before, as in https://github.com/choderalab/perses/actions/runs/4168318343/jobs/7214977022#step:3:56

I also tried changing how we use the runner, based on the discussion in https://github.com/machulav/ec2-github-runner/pull/127, without success.

ijpulidos commented 1 year ago

For the consistency tests in #1065 we are already using g4dn.xlarge. @mikemhenry can you confirm this is no longer an issue and that we can close it? Thanks!

mikemhenry commented 1 year ago

Yes, once we merge #1065, this issue will be resolved.