argoproj / applicationset

The ApplicationSet controller manages multiple Argo CD Applications as a single ApplicationSet unit, supporting deployments to large numbers of clusters, deployments of large monorepos, and enabling secure Application self-service.
https://argocd-applicationset.readthedocs.io/
Apache License 2.0

Github rate limit hit, even using `scmProvider.filters` #604

Closed dllegru closed 2 years ago

dllegru commented 2 years ago

The setup we want to achieve with scmProvider for GitHub is the following:

Our GitHub org has about 250 repositories. When we use scmProvider.github with certain scmProvider.filters combinations to restrict tracking to a single repository, Argo CD constantly sends GitHub API requests and we hit the 5k rate limit in ~40 minutes. That should not happen when tracking just a single repository.
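For scale, those figures work out to roughly 125 requests per minute, about 1.5 times what an evenly spent hourly budget would allow. Below is a back-of-the-envelope check in Python. The helper name is illustrative, the 40 minutes is the estimate quoted above, and the 60-minute window assumes the 5k limit is GitHub's per-hour limit for authenticated requests.

```python
# Rough arithmetic only: how fast is the rate-limit budget being consumed?

def requests_per_minute(total_requests: int, minutes: float) -> float:
    """Average request rate needed to spend `total_requests` in `minutes`."""
    return total_requests / minutes

RATE_LIMIT = 5000        # requests per window (the "5k" limit from the report)
OBSERVED_MINUTES = 40    # approximate time to exhaust the limit (estimate)
WINDOW_MINUTES = 60      # assumption: the limit resets hourly

observed = requests_per_minute(RATE_LIMIT, OBSERVED_MINUTES)
sustainable = requests_per_minute(RATE_LIMIT, WINDOW_MINUTES)
overshoot = observed / sustainable

print(f"observed:    {observed:.1f} req/min")
print(f"sustainable: {sustainable:.1f} req/min")
print(f"overshoot:   {overshoot:.2f}x")
```

Any fix would need to bring the steady-state rate for a single tracked repository well below the sustainable figure.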

I've seen that this was initially reported in issue #464 and a fix was made in PR #472. I think that fix only partially works, and some parameter combinations still misbehave.


We've tested different scmProvider scenarios for our use case; the outcomes are below:

Scenario 1: allBranches set to false.

    - scmProvider:
        github:
          organization: my-org
          # If true, scan every branch of every repository. If false, scan only the default branch. Defaults to false.
          allBranches: false
        filters:
          - repositoryMatch: platform-daniel-tests
            branchMatch: ^dev

Outcomes:

Comments: When we deploy this configuration, an initial scan seems to run, as it uses ~500 API calls to GitHub. After that initial scan, the API calls stop. It is not clear why the scan is needed: repositoryMatch pins the generator to a single repository and allBranches is also false, so in my opinion it should not take all those calls. No Applications are generated by the ApplicationSet.


Scenario 2: allBranches set to true.

    - scmProvider:
        github:
          organization: my-org
          # If true, scan every branch of every repository. If false, scan only the default branch. Defaults to false.
          allBranches: true
        filters:
          - repositoryMatch: platform-daniel-tests
            branchMatch: ^dev

Outcomes:

Comments: With this configuration, the API requests to GitHub never stop: they are sent continuously until we hit the 5k limit after ~40 minutes. Two Application resources are created and deployed, from branches dev-test1 and dev-test2. This is the scenario we want, but we can't use it because it hits the rate limit.


Scenario 3: allBranches set to false, no branchMatch used.

    - scmProvider:
        github:
          organization: my-org
          # If true, scan every branch of every repository. If false, scan only the default branch. Defaults to false.
          allBranches: false
        filters:
          - repositoryMatch: platform-daniel-tests

Outcomes:

Comments: Only 10 API calls are made and then they stop, apart from a few minor calls from time to time.
Only one Application is generated, from main, which is the default branch of the platform-daniel-tests repo. The outcome is good, but it does not fit our use case.


Scenario 4: allBranches set to true, no branchMatch used.

    - scmProvider:
        github:
          organization: my-org
          # If true, scan every branch of every repository. If false, scan only the default branch. Defaults to false.
          allBranches: true
        filters:
          - repositoryMatch: platform-daniel-tests

Outcomes:

Comments: Only 10 API calls are made and then they stop, apart from a few minor calls from time to time. Applications are generated from all branches [main, dev-test1, dev-test2]. The outcome is good, but it does not fit our use case.
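To make the expected selection explicit, here is a minimal Python sketch of what the filter in Scenario 2 should pick out of the branch list above. It is only an illustration of the intended result, not the controller's code: `matching_branches` is a hypothetical helper, and it assumes both patterns are applied as unanchored regular-expression searches.

```python
import re
from typing import List, Optional

def matching_branches(
    repo: str,
    branches: List[str],
    repository_match: str,
    branch_match: Optional[str] = None,
) -> List[str]:
    """Return the branches of `repo` that pass both filter patterns.

    Hypothetical helper: patterns are treated as unanchored regex searches.
    """
    # A repository that fails repositoryMatch contributes nothing.
    if not re.search(repository_match, repo):
        return []
    # Without branchMatch, every scanned branch is kept.
    if branch_match is None:
        return list(branches)
    return [b for b in branches if re.search(branch_match, b)]

branches = ["main", "dev-test1", "dev-test2"]

selected = matching_branches(
    "platform-daniel-tests", branches, "platform-daniel-tests", "^dev"
)
print(selected)  # ['dev-test1', 'dev-test2']
```

The selection itself only needs the branch list of the one matching repository, which is why the continuous request volume seen in Scenario 2 looks disproportionate.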

dllegru commented 2 years ago

Closing in favor of https://github.com/argoproj/argo-cd/issues/10788