argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Improve support for monorepo by supporting git options such as depth, and sparse checkout #11198

Open todaywasawesome opened 2 years ago

todaywasawesome commented 2 years ago

Summary

There are many situations where having more control over how git operates is very valuable.

Set checkout directory

This would check out only the specified subfolder.

apiVersion: argoproj.io/v1alpha1
kind: Application
spec:
  syncPolicy:
    syncOptions:
    - git_checkoutdir=yamls/

This could be the equivalent of git checkout <remote>/<branch> -- relative/path/to/file/or/dir, or a true sparse checkout such as git config core.sparseCheckout true. If doing the latter, it would need to be a setting on the repository.
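
To make the two alternatives concrete, here is a minimal Python sketch that only builds the git command lines for each approach and does not run them. The function names and the origin/main values are hypothetical. It also swaps in the git sparse-checkout subcommand (Git 2.25+) for setting core.sparseCheckout by hand, which is a substitution for illustration and not something proposed above.

```python
from typing import List


def path_checkout_cmds(remote: str, branch: str, path: str) -> List[List[str]]:
    """Option 1: restore a single path from a remote-tracking ref."""
    return [
        ["git", "fetch", remote, branch],
        ["git", "checkout", f"{remote}/{branch}", "--", path],
    ]


def sparse_checkout_cmds(path: str) -> List[List[str]]:
    """Option 2: clone-wide sparse checkout (the setting lives in the clone itself)."""
    return [
        ["git", "sparse-checkout", "init", "--cone"],
        # Cone mode takes directory names, so drop any trailing slash.
        ["git", "sparse-checkout", "set", path.rstrip("/")],
    ]


if __name__ == "__main__":
    cmds = path_checkout_cmds("origin", "main", "yamls/") + sparse_checkout_cmds("yamls/")
    for cmd in cmds:
        print(" ".join(cmd))
```

The second option changes state that is shared by everything using the clone, which is why it would have to be a repository-level setting rather than a per-application one.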

Set checkout depth

Rather than pulling the entire history (the default), this would allow specifying the depth to fetch.

apiVersion: argoproj.io/v1alpha1
kind: Application
spec:
  syncPolicy:
    syncOptions:
    - git_depth=1

Setting git options on repos

For something like depth, a repository-level maxdepth would make more sense, as a way to restrict the applications using that repository.

apiVersion: v1
kind: Secret
metadata:
  name: private-repo
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: repository
stringData:
  type: git
  git_maxdepth: "1"  # stringData values must be strings
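
One plausible reading of how a repository-level maxdepth would restrict an application's requested depth is sketched below in Python. The function name and the exact semantics (None meaning full history, and a repo limit also capping applications that request no depth) are assumptions made for illustration; the proposal does not spell them out.

```python
from typing import Optional


def effective_depth(requested: Optional[int], max_depth: Optional[int]) -> Optional[int]:
    """Return the depth to fetch; None means full history (today's behavior)."""
    if requested is not None and requested < 1:
        raise ValueError("depth must be a positive integer")
    if max_depth is None:
        # No repo-level limit: honor whatever the application asked for.
        return requested
    if requested is None:
        # Application asked for full history, but the repo caps it.
        return max_depth
    return min(requested, max_depth)
```

With git_maxdepth set to 1 on the repo, an application asking for git_depth=5 would end up with a depth of 1 under these assumptions.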

Motivation

When using a large monorepo, checking out the entire repo may be slow or too resource-intensive. Some users have reported crashes from repos being too large. Both git depth and sparse checkout would greatly improve monorepo support.

Proposal

Introducing these fields could be done fairly easily in the repo-server. We can set a default value that reproduces today's behavior, with these fields acting as an override. Neither of the options in this proposal has security implications, so no new RBAC rules would be needed. Because it operates through syncOptions, no new UI components would be needed for application creation. The repo secret might be another story.
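
The "defaults that reproduce today's behavior, with overrides" idea can be sketched roughly as follows. This is a Python illustration only (the repo-server itself is written in Go). GitOptions, parse_sync_options, and fetch_args are hypothetical names, and the git_depth and git_checkoutdir keys are the ones proposed in this issue, not existing Argo CD options.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class GitOptions:
    depth: Optional[int] = None         # None = full history, as today
    checkout_dir: Optional[str] = None  # None = whole tree, as today


def parse_sync_options(sync_options: Iterable[str]) -> GitOptions:
    """Pick the proposed git_* keys out of syncOptions; ignore everything else."""
    depth: Optional[int] = None
    checkout_dir: Optional[str] = None
    for opt in sync_options:
        key, sep, value = opt.partition("=")
        if not sep:
            continue  # not a key=value option
        if key == "git_depth":
            depth = int(value)
        elif key == "git_checkoutdir":
            checkout_dir = value
    return GitOptions(depth=depth, checkout_dir=checkout_dir)


def fetch_args(opts: GitOptions) -> List[str]:
    """Build a git fetch command line; unchanged when no depth is set."""
    args = ["git", "fetch"]
    if opts.depth is not None:
        args.append(f"--depth={opts.depth}")
    args.append("origin")
    return args
```

An application that sets neither option yields GitOptions() and a plain git fetch origin, so existing applications would behave as they do now.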

mpatters72 commented 2 years ago

  • I think this is exactly what I need: the ability to have the repo-server keep track of only a single branch, at max depth 1, for the folders that matter to applications.

  • In my scenario I have a single git repo with many Argo CD Applications on the same branch. They have a common base path, /cloud, and the YAML manifests are in subdirectories.

  • If the argo-repo-server is doing a "sparse" checkout, then with the example app definitions below, would the sparse checkout of folder cloud/ns-01/cluster-irl1 for "app-1" erase the content of cloud/ns-02/cluster-jpn3 in the repo-server's cache at /tmp/git@mygitserver_myor_myrepo/cloud?

  • If they are in some way erasing each other's content, I suppose what I want is a sparse git sync of /cloud.

# example snippets of argocd application manifests similar to ones I have
# (path and directory are shown under spec.source, where Argo CD expects them)
metadata:
  name: app-1
  annotations:
    argocd.argoproj.io/manifest-generate-paths: .
spec:
  source:
    path: "cloud/ns-01/cluster-irl1"
    directory:
      include: '{prometheus,push,vault}*.yaml'
---
metadata:
  name: app-2
  annotations:
    argocd.argoproj.io/manifest-generate-paths: .
spec:
  source:
    path: "cloud/ns-02/cluster-jpn3"
    directory:
      include: '{prometheus,push,vault}*.yaml'

todaywasawesome commented 2 years ago

So I think the method might be to just do a folder checkout. That preserves paths and won't interfere with other caches. A true sparse checkout might create issues. I need to try it out.

sidewinder12s commented 1 year ago

This would be hugely beneficial to us as well. Even a global setting along the lines of "only ever check out this dir" would work for us, as most of our ArgoCD manifests are located in 1 or 2 directories in our monorepo.

mustafadagher commented 1 year ago

Any updates regarding this? Will it be taken into consideration anytime soon? Thanks!

mrwanny commented 1 year ago

This feature will be super useful!

Waterstraal commented 1 year ago

I agree that this would be very much needed for large mono repo setups. Is this feature planned?

crenshaw-dev commented 1 year ago

@yordis started the work but I think hit some walls. Would anyone be up for collaborating with them, or picking up the PR?

yordis commented 1 year ago

@crenshaw-dev, little by little, we are getting there! So, I do not need to pick it up since I am working on it!

I was waiting for you to return from vacation because I got lost in the message passing between different components and the protocol buffer mappings without honestly comprehending the data flow. I would appreciate 10 minutes of your time to understand the situation better and close my gaps.

Please hit me up after tomorrow 🙏🏻

hannesg commented 1 year ago

@yordis, do you need help? I would benefit a lot from this feature and could contribute a day or three.

crenshaw-dev commented 1 year ago

@yordis and I are gonna set up a call soon. @hannesg if you'd like to join, hit me up on CNCF Slack!

Waterstraal commented 9 months ago

Kindly asking for an update on this issue since it's been quiet for a few months.

I see there are 2 PRs open that address this issue:

Any update or ETA would be appreciated :)

Thanks!

yordis commented 9 months ago

I am committed to continuing the work on sparse checkout, but I am waiting to get the depth flag to the finish line, either by having it rejected (which I hope doesn't happen) or by changing whatever needs to change for it to be merged.

joshiparth1000 commented 7 months ago

Any updates on this? We really need this in our setup. We have multiple monorepos that are used by multiple clusters, with a folder per cluster, and would love to have sparse checkout so we can limit updates to a particular cluster. This would also help us because we use notifications downstream to trigger tests on recently deployed changes.

crenshaw-dev commented 1 month ago

Conversations about these options have gotten a bit scattered. I'm going to consolidate them here, since this has a lot of thumbs-up.

Here's the challenge:

Sparse and shallow git repos are common requests for monorepos. People are accustomed to using these features in CI pipelines. But these settings are much easier to get right in CI pipelines because you throw away the clone when you're done with it. Argo CD maintains a persistent clone on the repo-server, allowing concurrent access to the same clone until the repo-server restarts. When managing a persistent clone, you have to handle cases which would be safe to ignore for a throw-away clone.

For example:

1) What happens if two applications with different depths/paths access the same repo at the same time?
2) If I change one of these settings, when does it reflect? Immediately? On next checkout?
3) Is storage efficiency impacted? i.e. is the size of my clone going to blow up over time?
4) Is CPU use impacted? Will I see CPU spikes due to cleanup processes?
5) Is there a need for manual cleanups? When do you call them? Are there concurrency concerns when calling them?

The concurrency concerns are different depending on whether you configure depth/paths at the app level or the repo level. If at the repo level, you're less likely to encounter races, but it's still possible.
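
As one way to picture the first question, the Python sketch below derives a separate clone directory name for each distinct combination of repo, depth, and paths, so applications with different settings never share a working tree. This is purely illustrative and not how Argo CD works today; clone_key is a hypothetical helper. It trades storage (the third question) for isolation, and it does not address cleanup.

```python
import hashlib
from typing import Iterable, Optional


def clone_key(repo_url: str, depth: Optional[int], paths: Iterable[str]) -> str:
    """Derive a directory name that differs whenever the git settings differ."""
    # Normalize so that path order and surrounding slashes do not matter.
    normalized = sorted({p.strip("/") for p in paths})
    material = "\n".join([repo_url, str(depth), *normalized])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]
```

Two applications that request the same repo, depth, and paths map to the same key and can share a clone; any difference in settings yields a different key and therefore a separate clone.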

If someone's up for tackling those problems / answering those questions, we can push forward. But they're nontrivial problems to solve.