envoyproxy / nighthawk

L7 (HTTP/HTTPS/HTTP2/HTTP3) performance characterization tool
Apache License 2.0

AWS/azp VM pools are going to be removed imminently #1227

Open · phlax opened this issue 1 month ago

phlax commented 1 month ago

Envoy has almost completely stopped using these pools, which are very expensive to run.

The nighthawk repo (and any other related repos using these pools) needs to migrate its CI to GitHub ASAP.

phlax commented 1 month ago

cc @mum4k

alyssawilk commented 1 month ago

cc @krajshiva @jmarantz

mum4k commented 1 month ago

Thank you for letting us know, I have notified our team and we will schedule this work at the first opportunity.

What is a realistic time frame in which we need to have this done?

phlax commented 1 month ago

Pretty much ASAP. We have just about got the release branches migrated to GitHub, so I'm ready to start removing all the old architecture.

I had a quick look at the CI in this repo and it looks pretty trivial to migrate. Ping me offline if you want to discuss which VMs are available in GitHub.

mum4k commented 1 month ago

Thank you for offering your advice as always @phlax.

I agree that this shouldn't be complicated. Our challenge is that all of our team's resources are currently tied up in committed projects, and, perhaps because I haven't been following other (?) channels, this comes as a surprise. What is the longest you can reasonably wait, so that we can do this without firefighting? Alternatively, do you have any cycles on your end to assist us if this needs to be done in O(days)?

phlax commented 1 month ago

Well, sort of: if it doesn't happen soon, I'll need to address it myself, which I'm really trying to avoid. I have pretty heavy contention on my cycles at the moment as it is.

mum4k commented 1 month ago

Understandable, it seems like we are on the same page. Let me scope this with the team so we know how expensive it is. If it turns out to be sizable, I'll loop back here to discuss options. Otherwise, I'll let you know when we can have this done.

mum4k commented 1 month ago

The initial estimate indicates this needs O(days) of work. The team doesn't have any spare capacity right now; however, I will try to slot this into my next on-call week (end of October), hoping for it to be quieter. If that doesn't work out, we can circle back and come up with a plan B.

mum4k commented 4 days ago

Some progress has been made: https://github.com/envoyproxy/nighthawk/pull/1250 confirms that the actual migration to GitHub workflows is trivial.

This is blocked on needing some form of caching, ideally RBE (remote build execution) builds or at least an RBE cache. Attempts to run with RBE fail because the Bazel server dies, which needs some investigation.

I will have to task-switch away from this for a few weeks, but can get back to it during my next on-call shift at the beginning of December. @phlax, please update here if you do end up finding some time to figure out the Bazel shutdowns. If the caching gets sorted out and all that remains is the workflow migration, I might be able to get to this earlier, given how trivial the actual migration is.