Shopify / yjit-metrics

"Tasks for benchmarking, building and collecting stats for YJIT"
MIT License

Use a better AWS instance for continuous benchmarking #45

Closed by noahgibbs 2 years ago

noahgibbs commented 3 years ago

For now I'm setting things up on a non-dedicated c5.2xlarge AWS instance. That's fine for basic usage, though it's not going to produce publishable benchmark results. It's prone to noisy neighbours, among other problems.

Getting a dedicated host (probably c5.metal) is around $30k/year at the on-demand price ($29,506.36/year in us-east-1). That amount of money is worth optimising with a reserved instance, which means talking to Finance. Note to self: #google-cloud-finances in internal Slack is probably the place to start asking. (employee) will vouch to them that we can't get this instance type in Google Cloud.
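As a rough sanity check on that figure, here is a minimal Python sketch that backs out the hourly rate implied by the annual on-demand price. It assumes the annual number is simply an hourly rate multiplied by 8,760 hours (a non-leap year), which the comment does not state; the function and variable names are illustrative only.

```python
# Assumption: annual cost = hourly rate * 8,760 hours (non-leap year, 24/7 usage).
HOURS_PER_YEAR = 24 * 365


def implied_hourly_rate(annual_cost: float) -> float:
    """Return the hourly rate implied by a 24/7 annual cost."""
    return annual_cost / HOURS_PER_YEAR


# Annual on-demand figure quoted above for us-east-1.
annual_on_demand = 29_506.36
hourly = implied_hourly_rate(annual_on_demand)

print(f"Implied on-demand rate: ${hourly:.4f}/hour")
```

The same helper could be pointed at a reserved-instance quote to compare the two on an hourly basis once Finance provides one.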

In the meantime I can continue to use spendy dedicated instances on-demand for temporary jobs like VMIL results while keeping a much cheaper shared-tenancy c5.2xlarge running CI-type tasks 24/7.

noahgibbs commented 2 years ago

Okay. Time to fix this.

noahgibbs commented 2 years ago

We definitely need dedicated tenancy. You can't just switch an existing instance over to it; you have to recreate the instance from an image. Image: ami-01b7506ad3b93ffc2
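For illustration, recreating from the image with dedicated tenancy might look roughly like the sketch below. It only builds the request parameters, using the `ImageId`, `InstanceType`, `MinCount`, `MaxCount` and `Placement` fields of the EC2 `RunInstances` call as exposed by boto3. The helper name is hypothetical, and the `c5.2xlarge` instance type is an assumption, since this comment doesn't say which type the new instance should use.

```python
def dedicated_launch_params(image_id: str, instance_type: str) -> dict:
    """Build RunInstances parameters for one dedicated-tenancy instance.

    Hypothetical helper: it only assembles the request and calls no AWS API.
    """
    return {
        "ImageId": image_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        # "dedicated" requests single-tenant hardware for this AWS account.
        "Placement": {"Tenancy": "dedicated"},
    }


# The AMI ID comes from the comment above; the instance type is an assumption.
params = dedicated_launch_params("ami-01b7506ad3b93ffc2", "c5.2xlarge")
print(params)

# With boto3 installed and credentials configured, this could be passed as:
#   boto3.client("ec2").run_instances(**params)
```

Keeping the parameters in a plain dict makes it easy to review the tenancy setting before anything is actually launched.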

noahgibbs commented 2 years ago

For now I'm not going to upgrade to a c5.metal. Staying on a 2xlarge will provide better continuity of results at a lower price, as long as we're not going over the 360-minute job limit. For now we have some slack there, and we could make more if needed by dropping some of the MJIT re-testing.

noahgibbs commented 2 years ago

Ah, right. But if we want a dedicated host, we have to go up to c5.metal. We can get plain "dedicated" tenancy, but that allows sharing the hardware with other Shopify instances, so it doesn't necessarily guarantee the level of isolation we want. We had this same trouble with the VMIL benchmarking, and solved it the same way.

noahgibbs commented 2 years ago

Nope, some things have changed, though part of the VMIL solution is still needed: a c5.metal instance, which is actually larger than anything available on a c5 dedicated host. EC2 is weird. Nonetheless, using a c5.metal instance looks like it should keep anybody else from running on the same hardware.

noahgibbs commented 2 years ago

This is changed over now. It's all over but the debugging!