gojuno / engineering

Juno Engineering Blog
Apache License 2.0

Discussion & Comments: "CI Pipeline and Custom Tools of Android projects at Juno" #3

Open artem-zinnatullin opened 7 years ago

thevery commented 7 years ago

How many build agents do you have: 10 (according to TeamCity pricing)? What is your average active LF?

artem-zinnatullin commented 7 years ago

We don't have that many machines, but we do run builds concurrently on some of them. In Jenkins these are called executors, and each executor would require a separate build agent license in TeamCity :(

What is LF?

thevery commented 7 years ago

LF=load factor

artem-zinnatullin commented 7 years ago

You mean Load Average of Linux machines or LF of the build queue on CI?

[Image: htop_aws]

c4.4xlarge (I think), 3 parallel builds, LA usually between 6 and 10.

[Image: htop]

Core i7 7700, 3 parallel builds with 4 emulators in each build, LA usually somewhere between 10 and 16.


As said in the article, I'd be happy to merge all our CI clusters into one cluster with autoscaling.

However, scaling machines for UI tests will be problematic because we need both GPU and KVM access as well as a fast CPU, an SSD and 32+ GB of RAM, so for now these are custom-built.

thevery commented 7 years ago

> You mean Load Average of Linux machines or LF of the build queue on CI?

The latter; it is important for TeamCity pricing. 10 agents with LF=50% can probably be replaced with just 5-6 without a drastic increase in the build queue. So how many licenses do you need?
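
The arithmetic behind that estimate can be sketched roughly as follows. This is only a sketch under a loud assumption: builds arrive randomly and independently and agents are interchangeable (a textbook M/M/c queue, evaluated with the Erlang C formula). Nothing here was measured on the cluster discussed in this thread, and the function names are made up for illustration.

```python
import math


def load_factor(agents: int, offered_load: float) -> float:
    """Average fraction of agents that are busy."""
    return offered_load / agents


def erlang_c_wait_probability(agents: int, offered_load: float) -> float:
    """Probability that a new build has to wait in the queue (M/M/c model).

    offered_load is the average number of busy agents,
    i.e. build arrival rate multiplied by mean build duration.
    ASSUMPTION: Poisson arrivals and exponential build times.
    """
    if offered_load >= agents:
        return 1.0  # not enough capacity: the queue grows without bound
    # Sum of a^k / k! for k = 0 .. c-1 (states where an agent is free).
    head = sum(offered_load ** k / math.factorial(k) for k in range(agents))
    # Weight of the states where all c agents are busy.
    tail = (offered_load ** agents / math.factorial(agents)) * (
        agents / (agents - offered_load)
    )
    return tail / (head + tail)


# 10 agents at LF = 50% means about 5 agents' worth of work on average.
offered = 10 * 0.5
for agents in (10, 7, 6):
    lf = load_factor(agents, offered)
    p_wait = erlang_c_wait_probability(agents, offered)
    print(f"{agents} agents: LF={lf:.0%}, P(build waits)={p_wait:.3f}")
```

The model only describes steady random load. Bursty load, where many builds are triggered at the same moment, breaks the independence assumption and makes queueing worse than these numbers suggest.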

artem-zinnatullin commented 7 years ago

Crosslinking with #5: "Discussion & Comments: UI Testing with Espresso"

artem-zinnatullin commented 7 years ago

@thevery

> Latter - it is important for teamcity pricing.

[Image: jenkins_load_stat]

> 10 agents with LF=50% can be probably replaced with just 5-6 without drastic build queue increase.

The problem with our flow of rebuilding PRs whenever the target branch changes is that it creates spikes in CI load, and those spikes must be resolved ASAP so they don't slow down the development process. That means we need to be able to process lots of parallel builds.

As you can see from the graph above, the current cluster is close to the limit of its capacity, and there are times when builds hang in the queue.

So decreasing the number of executors will increase the number of builds in the queue and slow down development.

At the same time, simply increasing the number of executors and nodes would be an inefficient use of resources, because we clearly have time gaps when developers are not at work.

> So how many licenses do you need?

Currently we have 7 executors in the Jenkins cluster, which is equivalent to 7 build agents in TeamCity, which automatically puts us at 10 build agent licenses.

As said above, decreasing this number will definitely slow down development.

Ideally we would have a CI cluster with autoscaling (it could be based on Kubernetes and AWS, for instance).
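
To make the autoscaling idea concrete, here is a minimal sketch of the sizing rule such a cluster might apply. It is purely illustrative: the function name, its parameters and the default limits are assumptions, and it does not call any real Jenkins, Kubernetes or AWS API.

```python
def desired_executors(
    busy: int,
    queued: int,
    min_executors: int = 1,   # hypothetical floor kept warm during quiet hours
    max_executors: int = 20,  # hypothetical budget cap
) -> int:
    """Executors needed to start every queued build right away, within limits."""
    wanted = busy + queued
    return max(min_executors, min(max_executors, wanted))


# A spike: all 7 executors are busy and 5 more builds are waiting.
print(desired_executors(busy=7, queued=5))
# Night time: nothing is running and nothing is queued.
print(desired_executors(busy=0, queued=0))
```

A rule like this grows the cluster during spikes and shrinks it when developers are not at work, which is the trade-off described above.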

artem-zinnatullin commented 7 years ago

Here is a closer look at the spikes I was talking about (happening right now):

[Image: spike]

thevery commented 7 years ago

Almost empty build queue, nice! Licenses: you can get exactly 5+2=7 licenses if you need to. Autoscaling: TeamCity supports it out of the box, though I haven't tried it yet. Either way, you need extra licenses for it. The way around expensive licenses is to share them between teams: you don't usually need all the resources at the same moment.

artem-zinnatullin commented 7 years ago

> Licenses: you can get exactly 5+2=7 licenses if you need.

Didn't know that!

> The way to overcome expensive licenses is actually sharing them between teams - you don't usually need all the resources at the one moment.

Yup, I mentioned this above and in the article. Totally agree.