Open todaywasawesome opened 2 years ago
I think this is exactly what I need: the ability to have the repo-server keep track of only a single branch, at max depth 1, for the folders that matter for applications.
In my scenario I have a single git repo with many Argo CD Applications on the same branch. They share a common base path, /cloud, and the YAML manifests are in subdirectories.
If the argo-repo-server does a "sparse" checkout, would the sparse checkout of cloud/ns-01/cluster-irl1 for "app-1" (see the example app definitions below) erase the content of cloud/ns-02/cluster-jpn3 in the repo-server cache at /tmp/git@mygitserver_myor_myrepo/cloud?
If the apps do in some way erase each other's content, I suppose what I want is a sparse git sync of /cloud.
# example snippets of argocd application manifests similar to ones I have
metadata:
  name: app-1
  annotations:
    argocd.argoproj.io/manifest-generate-paths: .
path: "cloud/ns-01/cluster-irl1"
directory:
  include: '{prometheus,push,vault}*.yaml'
---
metadata:
  name: app-2
  annotations:
    argocd.argoproj.io/manifest-generate-paths: .
path: "cloud/ns-02/cluster-jpn3"
directory:
  include: '{prometheus,push,vault}*.yaml'
So I think the method might be to just do a folder checkout. That preserves paths and won't interfere with other caches. A true sparse might create issues. I need to try it out.
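To illustrate why a true sparse checkout might create issues on a shared clone, here is a toy Python model of the question asked above. It relies on one fact about git: `git sparse-checkout set` replaces the existing pattern list, while `git sparse-checkout add` appends to it. Everything else is an assumption for illustration: the `SharedSparseClone` class is hypothetical, the path matching is a simplified directory-prefix check, and nothing here reflects how the repo-server actually manages its cache.

```python
from typing import Iterable, Set


class SharedSparseClone:
    """Toy model of one persistent clone whose sparse pattern list is shared."""

    def __init__(self) -> None:
        self.patterns: Set[str] = set()

    def sparse_set(self, paths: Iterable[str]) -> None:
        # Mirrors `git sparse-checkout set`: the new list replaces the old one.
        self.patterns = set(paths)

    def sparse_add(self, paths: Iterable[str]) -> None:
        # Mirrors `git sparse-checkout add`: new paths are appended.
        self.patterns |= set(paths)

    def has_path(self, path: str) -> bool:
        # Simplified matching: a path is present if it equals a pattern
        # or lives underneath one.
        return any(
            path == p or path.startswith(p.rstrip("/") + "/")
            for p in self.patterns
        )


clone = SharedSparseClone()

# app-1 and app-2 each call "set" with only their own path.
clone.sparse_set(["cloud/ns-01/cluster-irl1"])
clone.sparse_set(["cloud/ns-02/cluster-jpn3"])
app1_visible_after_set = clone.has_path("cloud/ns-01/cluster-irl1/vault.yaml")

# Alternative: keep the union of every app's path on the shared clone.
clone.sparse_set(["cloud/ns-01/cluster-irl1"])
clone.sparse_add(["cloud/ns-02/cluster-jpn3"])
app1_visible_after_add = clone.has_path("cloud/ns-01/cluster-irl1/vault.yaml")
```

In this model the second `set` drops app-1's folder, while the union keeps both. Setting the sparse path once to the common parent (`cloud`) would avoid the problem in the same way.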
This would be hugely beneficial to us as well. Even a global setting of only ever checkout this dir would work for us as most of our ArgoCD manifests are located in 1 or 2 directories in our monorepo.
Any updates regarding this? Will it be taken into consideration anytime soon? Thanks!
This feature will be super useful!
I agree that this would be very much needed for large mono repo setups. Is this feature planned?
@yordis started the work but I think hit some walls. Would anyone be up for collaborating with them, or picking up the PR?
@crenshaw-dev, little by little, we are getting there! So, I do not need to pick it up since I am working on it!
I was waiting for you to return from vacation because I got lost in the message passing between the different components and the protocol buffer mappings, without really understanding the data flow. I would appreciate 10 minutes of your time to help me understand the situation better and close my gaps.
Please hit me up after tomorrow 🙏🏻
@yordis, do you need help? I would benefit a lot from this feature and could contribute a day or three.
@yordis and I are gonna set up a call soon. @hannesg if you'd like to join, hit me up on CNCF Slack!
Kindly asking for an update on this issue since it's been quiet for a few months.
I see there are 2 PRs open that address this issue:
Any update or ETA would be appreciated :)
Thanks!
I am committed to continuing the work on sparse checkout, but I am waiting for the depth flag to get to the finish line first, either by being rejected (which I hope doesn't happen) or by changing whatever needs to be changed for it to be merged.
Any updates on this? We really need this in our setup. We have multiple monorepos which are used by multiple clusters. We have a folder per cluster and would love to have sparse checkout so we can limit the updates to a particular cluster. This also helps us because we use notifications downstream to trigger tests on recently deployed changes.
Conversations about these options have gotten a bit scattered. I'm going to consolidate them here, since this has a lot of thumbs-up.
Here's the challenge:
Sparse and shallow git repos are common requests for monorepos. People are accustomed to using these features in CI pipelines. But these settings are much easier to get right in CI pipelines because you throw away the clone when you're done with it. Argo CD maintains a persistent clone on the repo-server, allowing concurrent access to the same clone until the repo-server restarts. When managing a persistent clone, you have to handle cases which would be safe to ignore for a throw-away clone.
For example:
1) What happens if two applications with different depths/paths access the same repo at the same time?
2) If I change one of these settings, when does it reflect? Immediately? On next checkout?
3) Is storage efficiency impacted? i.e. is the size of my clone going to blow up over time?
4) Is CPU use impacted? Will I see CPU spikes due to cleanup processes?
5) Is there a need for manual cleanups? When do you call them? Are there concurrency concerns when calling them?
The concurrency concerns are different depending on whether you configure depth/paths at the app level or the repo level. If at the repo level, you're less likely to encounter races, but it's still possible.
If someone's up for tackling those problems / answering those questions, we can push forward. But they're nontrivial problems to solve.
Summary
There are many situations where having more control over how git operates is very valuable.
Set checkout directory
This would check out only the subfolder specified.
This could be the equivalent of
git checkout <remote>/<branch> -- relative/path/to/file/or/dir
or a true sparse checkout like
git config core.sparseCheckout true
If doing the latter, it would need to be a setting on the repository.
Set checkout depth
Rather than pulling the entire history (the default), this would allow specifying the depth to fetch.
Setting git options on repos
For something like depth, a maxdepth would make more sense to restrict applications using that repository.
Motivation
When using a large monorepo, checking out the entire repo may be slow or too resource intensive. Some users have reported crashes from repos being too large. Both git depth and sparse checkout would greatly improve monorepo support.
Proposal
Introducing these fields could be done pretty easily in the repo-server: we can set default values that produce the behavior that happens today, with these fields acting as overrides. Neither of the options in this proposal has security implications, so no new RBAC rules would be needed. Because it operates off sync options, no new UI components would be needed in application creation. The repo secret might be another story.
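The "defaults reproduce today's behavior, fields act as overrides" idea could look roughly like the Python sketch below. The names `CheckoutOptions`, `effective_depth`, and `git_commands` are hypothetical, and the rule that a repo-level maxdepth clamps the app-level depth is one assumed reading of the proposal. The git flags used (`clone --depth`, `clone --sparse`, `sparse-checkout set`) are real, but this is only an illustration of the option mapping, not repo-server code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class CheckoutOptions:
    # Defaults mean "full history, whole tree", i.e. a plain clone.
    depth: Optional[int] = None
    sparse_paths: Tuple[str, ...] = ()


def effective_depth(
    app_depth: Optional[int], repo_max_depth: Optional[int]
) -> Optional[int]:
    """Clamp the app-level depth by a repo-level maxdepth (assumed semantics)."""
    if repo_max_depth is None:
        return app_depth
    if app_depth is None:
        return repo_max_depth
    return min(app_depth, repo_max_depth)


def git_commands(url: str, dest: str, opts: CheckoutOptions) -> List[List[str]]:
    """Translate options into git invocations, as argument lists."""
    clone = ["git", "clone"]
    if opts.depth is not None:
        clone += ["--depth", str(opts.depth)]
    if opts.sparse_paths:
        clone += ["--sparse"]
    clone += [url, dest]

    commands = [clone]
    if opts.sparse_paths:
        commands.append(
            ["git", "-C", dest, "sparse-checkout", "set", *opts.sparse_paths]
        )
    return commands
```

With no overrides, `git_commands(url, dest, CheckoutOptions())` yields a single plain `git clone`, so existing applications would be unaffected. Keeping the translation in one pure function also makes the mapping easy to unit test without touching a real repository.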