gitlabform / gitlabform

🏗 Specialized configuration as code tool for GitLab
https://gitlabform.github.io/gitlabform/
MIT License

Performance considerations #486

Open rossbeehler opened 1 year ago

rossbeehler commented 1 year ago

We're on GitLab SaaS with over 3000 repos, and climbing, and our GitLabForm nightly process takes many, many hours to run. Are there any tips, tricks, etc. to make this faster? I know I could use the group structure and just run concurrent CI/CD jobs per 2nd-level group, but I wondered if there were any other ideas, configurations, etc. that I might be missing. For example, it would be nice if there were a concurrency setting, so that all groups in the hierarchy are processed separately/concurrently based on that setting.

gdubicki commented 1 year ago

Hey @rossbeehler!

I have thought about adding concurrency a few times in the past, but: a) I always ended up concluding there was no need, and b) it's not trivial.

I felt it wasn't needed because at Egnyte, where we use a self-hosted GitLab instance, applying the config for a bit over 1000 projects and over 30 groups currently takes about 16 minutes. I assumed that this is not much at a pretty large scale.

And it's not trivial because of the output: you'd have to implement some buffering solution to prevent the output for all groups and projects from getting mixed up.
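The buffering idea could be sketched roughly like this, where `process_group` is a hypothetical stand-in for gitlabform's actual per-group work (this is not the project's code, just an illustration):

```python
import concurrent.futures
import io


def process_group(group, out):
    # Hypothetical placeholder for the real per-group processing;
    # here it only writes log lines to the supplied buffer.
    out.write(f"Processing group {group}\n")
    out.write(f"Done with {group}\n")


def run_concurrently(groups, max_workers=4):
    """Process groups in parallel, buffering each group's output so
    log lines from different groups never interleave on the console."""
    results = []

    def worker(group):
        buf = io.StringIO()
        process_group(group, buf)
        return buf.getvalue()

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so output stays deterministic
        for chunk in pool.map(worker, groups):
            print(chunk, end="")  # each group's log is emitted whole
            results.append(chunk)
    return results
```

Each worker writes into its own `StringIO` and the main thread prints the finished buffers one at a time, which is one simple way to keep concurrent output readable.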

Anyway, because I don't have much time for the project these days (see #343), I am open to PRs adding it.

As for the other things that you might do:

Perhaps to be continued...

Let me know what you think!

rossbeehler commented 1 year ago

Thanks for the detailed response, @gdubicki. May take me a while, but I'll see if I can test in GCP us-east1 and report back on how much it improved performance. We are in Azure East US 2 at the moment, so only a couple states away, but I'm sure co-locating on the same cloud/region would make a significant difference.

I will say we see a fairly consistent amount of time spent processing each project, but we'll also turn on verbose output at some point to see if anything stands out.

nejch commented 1 year ago

Keep in mind that with a lot of API calls you'll probably start to hit GitLab's rate limiting rather than just network performance issues:

https://docs.gitlab.com/ee/user/gitlab_com/index.html#gitlabcom-specific-rate-limits

At least that would be my assumption, as I also work with a large self-hosted instance. So even with concurrency, since urllib3's Retry respects Retry-After headers by default, I think it would slow down after 429 responses. I may be wrong though; I'd have to benchmark that.
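For reference, here's a minimal sketch of that default behavior using requests with urllib3's `Retry` (assuming urllib3 >= 1.26; this is illustrative, not gitlabform's actual HTTP setup):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# urllib3's Retry honors the Retry-After header by default
# (respect_retry_after_header=True), so a client hammered with 429s
# will pause for as long as GitLab tells it to, even when concurrent.
retry = Retry(
    total=5,
    status_forcelist=[429, 500, 502, 503, 504],
    backoff_factor=1,          # exponential backoff between attempts
    allowed_methods=["GET"],   # only retry idempotent calls
)

session = requests.Session()
session.mount("https://", HTTPAdapter(max_retries=retry))
# e.g. session.get("https://gitlab.com/api/v4/projects", params={"per_page": 100})
```

So adding concurrency alone may just trade wall-clock time for waiting on Retry-After, as nejch suggests above.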

So one aspect of optimization would be for gitlabform to make as few requests as possible (e.g. ensure the maximum of 100 items per page for pagination, avoid making the same GET calls if the data is already fetched, etc.). Just an idea though!
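The pagination point can be sketched as follows, with `fetch_page` as a hypothetical stand-in for the real HTTP call (GitLab caps `per_page` at 100):

```python
def paginate(fetch_page, per_page=100):
    """Iterate over all items from a paginated API, requesting the
    maximum page size to minimize the total number of requests.

    `fetch_page(page, per_page)` is a stand-in for the real HTTP call;
    it should return a list of items (shorter than per_page, or empty,
    on the last page).
    """
    page = 1
    while True:
        items = fetch_page(page, per_page)
        if not items:
            break
        yield from items
        if len(items) < per_page:  # short page means we're done
            break
        page += 1


# Example against a fake backend of 250 items: per_page=100 needs only
# 3 requests, versus 13 at GitLab's default per_page=20.
data = list(range(250))
calls = []

def fake_fetch(page, per_page):
    calls.append(page)
    start = (page - 1) * per_page
    return data[start:start + per_page]

items = list(paginate(fake_fetch))
```

With thousands of projects, cutting the request count per project this way compounds quickly against the rate limits mentioned above.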

adam-moss commented 1 year ago

We have this challenge with 10k+ repos, with this, danger, triage-bot, and renovate. Basically anything we want to run across the estate. What we found was running any of them as one continuous run took hours.

What we did was take the position that the tool is not the issue: at a repo level it is fast enough. Ergo, our execution approach was suboptimal at scale. So what we do now is:

1. hit the GitLab API for a list of all groups & projects
2. use that to generate child pipelines, running 1 instance of the job for each repo
3. batch them into blocks of n size, whatever you're comfortable with within the rate limits
4. use `resource_group` in the gitlab-ci.yml to ensure only X child pipelines run in parallel
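The batching in step 3 above can be sketched in a few lines, assuming a plain list of project paths (names here are illustrative, not the actual script):

```python
def batch(projects, size):
    """Split the full project list into blocks of `size`; each block
    would become one generated child pipeline."""
    return [projects[i:i + size] for i in range(0, len(projects), size)]


# e.g. 10 projects in blocks of 4 -> 3 child pipelines (4, 4, 2)
blocks = batch([f"group/project-{n}" for n in range(10)], 4)
```

Each block would then be written out as a child pipeline YAML, with `resource_group` limiting how many run at once.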

Doing this took our renovate run-time from > 24hrs to < 3hrs, which was certainly a win for us.

We also use the Audit Stream to run on triggers.

This isn't the code we use now, but you can see an earlier iteration I shared over at the reno repo https://github.com/renovatebot/renovate/discussions/13172#discussioncomment-1863108

nejch commented 1 year ago

I think that makes sense; if I remember correctly we had a similar approach, but just with `parallel:` and use of the CI_NODE_* variables. Btw, you might want to be careful with renovating 10k repos from a single Renovate instance :P https://docs.renovatebot.com/gitlab-bot-security/. It should get better after 15.9 with the job token scope.
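For anyone unfamiliar with that variant: with `parallel:`, GitLab runs N copies of a job and sets `CI_NODE_INDEX` (1-based) and `CI_NODE_TOTAL`, so each copy can take its own slice of the project list. A minimal sketch of such sharding (illustrative only):

```python
import os


def shard(projects, node_index, node_total):
    """Return this parallel job's slice of the project list.
    GitLab sets CI_NODE_INDEX starting at 1 when `parallel:` is used."""
    return projects[node_index - 1::node_total]


# Hypothetical usage inside one of N parallel jobs:
projects = [f"project-{n}" for n in range(10)]
index = int(os.environ.get("CI_NODE_INDEX", "1"))
total = int(os.environ.get("CI_NODE_TOTAL", "1"))
mine = shard(projects, index, total)
```

The stride-based slicing guarantees every project lands in exactly one shard, with shard sizes differing by at most one.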

adam-moss commented 1 year ago

Yeah, it's a risk, but the exploit opportunity is minimised as much as is currently possible, and you need an active SSO session on our IdP. Token scopes will definitely be better, but it can only be truly secured once support lands for adding the email address and signing key, and for regenerating rather than recreating PRaTs.

gdubicki commented 1 year ago

What target are you using to run the app, @rossbeehler? ALL_DEFINED? ALL? Something else?

rossbeehler commented 1 year ago

We're running it against our top-level group for our organization.