argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Deploy repo-server as StatefulSet instead of Deployment #7927

Open kadamanas93 opened 2 years ago

kadamanas93 commented 2 years ago

Summary

In the High Availability document (https://argo-cd.readthedocs.io/en/stable/operator-manual/high_availability/), it mentions that argocd-repo-server uses the /tmp directory to store git repositories. If disk space becomes an issue, the suggested remedy is to use a persistent volume. However, using a persistent volume in a Deployment prevents the Deployment from being scaled up.

Motivation

Our git repository is a monorepo and contains 10 years of history, so the git fetch process sometimes takes more than 5 minutes. I have used ARGOCD_EXEC_TIMEOUT to increase this timeout, and the pod is able to retrieve the repo most of the time. However, the pod eventually exceeds the emptyDir volume mounted on /tmp and Kubernetes evicts it. This triggers another download of the git repo, and the cycle continues. The way I have worked around this problem is by adding a PVC to the Deployment and mounting it on /tmp. It is big enough to hold all the data and it survives pod restarts, but a new issue arises: I can't scale up the repo-server and have to function with a single pod.

Proposal

Provide manifests where the repo-server can be deployed as a StatefulSet. An ephemeral volume is not a good solution, since its contents get deleted and the repo then needs to be re-fetched.
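
For illustration, a minimal sketch of what such a StatefulSet variant could look like, with a volumeClaimTemplate mounted at /tmp; the names, storage size, image tag, and the abbreviated container spec are placeholders, not values from the official manifests:

```bash
# Illustrative sketch only (not the official manifests): a StatefulSet variant of the
# repo-server where each replica gets its own PVC mounted at /tmp instead of an emptyDir.
kubectl apply -n argocd -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-repo-server
spec:
  serviceName: argocd-repo-server
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: argocd-repo-server
  template:
    metadata:
      labels:
        app.kubernetes.io/name: argocd-repo-server
    spec:
      containers:
        - name: argocd-repo-server
          image: quay.io/argoproj/argocd:v2.3.3   # placeholder tag
          # ...remaining container settings as in the upstream Deployment...
          volumeMounts:
            - name: tmp
              mountPath: /tmp
  volumeClaimTemplates:
    - metadata:
        name: tmp
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi   # placeholder size
EOF
```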

diranged commented 2 years ago

@kadamanas93 I agree with this feature request... we have the same basic problem that you do. Would the maintainers accept a PR to optionally use a StatefulSet for the Repo-Server I wonder?

pierluigilenoci commented 2 years ago

@mkilchhofer do you accept PR from external people?

mkilchhofer commented 2 years ago

For sure we accept PRs from external people in the argo-helm repository.

But our (argo-helm) intent is to stay in line with the upstream manifests in this repo here. E.g. converting the repo-server to a StatefulSet in argo-helm only would not comply with our intent stated in the chart README:

The default installation is intended to be similar to the provided Argo CD releases.

pierluigilenoci commented 2 years ago

@mkilchhofer what you say makes sense, but then how can we fix it? Is there a way forward?

Many charts (for example Prometheus, see below) allow you to choose between a Deployment and a StatefulSet. By adding this option one could stay aligned (using Deployment as the default) while still giving users the choice. Could that be an option?

Ref: https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus/templates/server/deploy.yaml#L2

and https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus/templates/server/sts.yaml#L2

crenshaw-dev commented 2 years ago

@kadamanas93 I'm not opposed to the PVC route, but I'd love to make that option unnecessary for you if possible.

As such the git fetch process takes more than 5mins sometimes.

I'm not a git expert, but I think there are ways to decrease the size of a large repo, e.g. repacking. I believe git fetch also has options to only fetch the N most recent commits. Would something like that be helpful for your use case?
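
For reference, a rough sketch of those two ideas as plain git commands (run against a checkout of the repo; the depth value and repository URL are just examples, and none of this is something Argo CD runs for you today):

```bash
# Repack / garbage-collect an existing checkout to shrink what is kept on disk:
git gc --aggressive --prune=now

# Fetch only recent history (here the last 50 commits) instead of the full history:
git fetch origin --depth=50

# Or make the initial clone shallow in the first place (example URL):
git clone --depth=50 https://github.com/example-org/example-monorepo.git
```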

diranged commented 2 years ago

@crenshaw-dev I'm sure it would help .. but our mono-repo is a ~10-15GB checkout and gets thousands of commits per day. I really think that from an efficiency standpoint, there is a point at which the repo-server should be a statefulset with a persistent volume behind it. Not all orgs will need this of course... but some will.

crenshaw-dev commented 2 years ago

What version of Argo CD are you running? Recent versions should be fetching a relatively small amount of data for each commit. If those small fetches are adding up to something huge over time, we could consider a gc or repack cron to mitigate the issue.

But yeah @diranged I agree a PVC does make sense at some point. It's complicated by the fact that repo paths are randomized to protect against directory traversal. We'd have to persist a repo-to-directory map in redis or something to facilitate recovery after a restart.

diranged commented 2 years ago

What version of Argo CD are you running? Recent versions should be fetching a relatively small amount of data for each commit. If those small fetches are adding up to something huge over time, we could consider a gc or repack cron to mitigate the issue.

But yeah @diranged I agree a PVC does make sense at some point. It's complicated by the fact that repo paths are randomized to protect against directory traversal. We'd have to persist a repo-to-directory map in redis or something to facilitate recovery after a restart.

{
    "Version": "v2.3.3+07ac038",
    "BuildDate": "2022-03-30T00:06:18Z",
    "GitCommit": "07ac038a8f97a93b401e824550f0505400a8c84e",
    "GitTreeState": "clean",
    "GoVersion": "go1.17.6",
    "Compiler": "gc",
    "Platform": "linux/amd64",
    "KsonnetVersion": "v0.13.1",
    "KustomizeVersion": "v4.4.1 2021-11-11T23:36:27Z",
    "HelmVersion": "v3.8.0+gd141386",
    "KubectlVersion": "v0.23.1",
    "JsonnetVersion": "v0.18.0"
}

crenshaw-dev commented 2 years ago

Yeah, that's got the patch which is supposed to keep repo size down. But an occasional gc could help. How much bigger than the source repo is that directory getting on disk?

diranged commented 2 years ago

Yeah, that's got the patch which is supposed to keep repo size down. But an occasional gc could help. How much bigger than the source repo is that directory getting on disk?

Hard to say .. it's really hard to get into that container. The only real information I've got is that we have had to significantly increase our checkout timeouts because we were hitting context exceeded alarms. Is there a reasonable way for me to exec into the container and get you any useful data? I know it's been locked down recently (good).

crenshaw-dev commented 2 years ago

we have had to significantly increase our checkout timeouts because we were hitting context exceeded alarms

Yeah that's the super weird part. It shouldn't be pulling that much data for just a few commits (I assume the fetch is happening a fairly high number of times daily).

I know it's been locked down recently (good).

If you have exec k8s RBAC privileges, you should be able to get in. Just kubectl get po -n argocd to get the repo-server pod name and then kubectl exec -it <pod name> -- bash. Once you're in, start chmod-ing stuff to grant rx on directories in /tmp. If there are a bunch of directories, it may take a while to find the randomly-named dir that contains the right repo.

Once you've found the right one, du -sh should give a nice-to-read directory size.

I wouldn't worry too much about setting permissions back. The repo-server locks down permissions after every manifest refresh. If you're super worried, restart the Deployment so it starts fresh.
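
Putting those steps together, the inspection looks roughly like this (the pod name is a placeholder; the chmod is only needed because the repo-server tightens permissions on its cached repo directories after each refresh):

```bash
# Find the repo-server pod and open a shell in it (pod name is a placeholder):
kubectl get po -n argocd
kubectl exec -it -n argocd argocd-repo-server-xxxxx -- bash

# Inside the container: regain read/execute on the randomized repo directories,
# then check how big each cached repo is on disk.
chmod u+rx /tmp/_argocd-repo
chmod u+rx /tmp/_argocd-repo/*
du -sh /tmp/_argocd-repo/*
```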

diranged commented 2 years ago

:facepalm:

I had gotten this far and then gave up thinking that I wouldn't have the permissions to go further:

$ k exec -ti argocd-repo-server-b6d5ff99d-9592m -- bash
Defaulted container "argocd-repo-server" out of: argocd-repo-server, copyutil (init)
argocd@argocd-repo-server-b6d5ff99d-9592m:~$ cd /tmp
argocd@argocd-repo-server-b6d5ff99d-9592m:/tmp$ ls
_argocd-repo  reposerver-ask-pass.sock
argocd@argocd-repo-server-b6d5ff99d-9592m:/tmp$ ls _argocd-repo/
ls: cannot open directory '_argocd-repo/': Permission denied

Here's our big ass repo:

argocd@argocd-service-repo-server-64d7587bd8-82ml9:/tmp/_argocd-repo$ du -sch a9588946-38c5-48db-bda9-e8aa0fee4b92
8.0G    a9588946-38c5-48db-bda9-e8aa0fee4b92
8.0G    total

crenshaw-dev commented 2 years ago

@diranged yeah, we don't want to allow root on the repo-server, but that keeps us from using user/group access to limit repo filesystem permissions. So if an attacker can get chmod access, they can undo the protection. But it's better than nothing. :-)

Sweet, so the repo isn't especially large on-disk. I was worried it was gonna be 50GB or something due to all the fetches.

This makes me wonder why each fetch is so large.

Could you pop into that repo and run GIT_TRACE=1 git fetch origin --tags --force? It should print the transfer size.

diranged commented 2 years ago

GIT_TRACE=1 git fetch origin --tags --force

Unfortunately we use the Github App authentication model, which isn't really supported from within the container with git... that said, I ran it from my own laptop. I've stripped out some of our own information here, but I am not sure what you're looking for. Are you looking for the "average amount of change" between each argocd git pull?

06:41:46.939377 git.c:455               trace: built-in: git config --get-all hub.host
06:41:46.947786 exec-cmd.c:139          trace: resolved executable path from Darwin stack: /Library/Developer/CommandLineTools/usr/bin/git
06:41:46.948010 exec-cmd.c:238          trace: resolved executable dir: /Library/Developer/CommandLineTools/usr/bin
06:41:46.948251 git.c:455               trace: built-in: git fetch origin --tags --force
06:41:46.966377 run-command.c:667       trace: run_command: unset GIT_PREFIX; GIT_PROTOCOL=version=2 ssh -o SendEnv=GIT_PROTOCOL org-...@github.com 'git-upload-pack '\''/.../....'\'''
remote: Enumerating objects: 19908, done.
remote: Counting objects: 100% (6985/6985), done.
remote: Compressing objects: 100% (900/900), done.
06:42:00.948757 run-command.c:667       trace: run_command: git index-pack --stdin -v --fix-thin '--keep=fetch-pack 10738 on MacBook-Air....' --pack_header=2,19908
06:42:00.961928 exec-cmd.c:139          trace: resolved executable path from Darwin stack: /Library/Developer/CommandLineTools/usr/libexec/git-core/git
06:42:00.963214 exec-cmd.c:238          trace: resolved executable dir: /Library/Developer/CommandLineTools/usr/libexec/git-core
06:42:00.964391 git.c:455               trace: built-in: git index-pack --stdin -v --fix-thin '--keep=fetch-pack 10738 on MacBook-Air....' --pack_header=2,19908
remote: Total 19908 (delta 6386), reused 6457 (delta 6061), pack-reused 12923
Receiving objects: 100% (19908/19908), 55.32 MiB | 6.77 MiB/s, done.
Resolving deltas: 100% (11997/11997), completed with 1276 local objects.
06:42:10.568856 run-command.c:667       trace: run_command: git rev-list --objects --stdin --not --all --quiet --alternate-refs
06:42:10.575086 exec-cmd.c:139          trace: resolved executable path from Darwin stack: /Library/Developer/CommandLineTools/usr/libexec/git-core/git
06:42:10.575803 exec-cmd.c:238          trace: resolved executable dir: /Library/Developer/CommandLineTools/usr/libexec/git-core
06:42:10.576647 git.c:455               trace: built-in: git rev-list --objects --stdin --not --all --quiet --alternate-refs
From git+ssh://github.com/.../...
...
06:42:12.533248 run-command.c:1628      run_processes_parallel: preparing to run up to 1 tasks
06:42:12.534034 run-command.c:1660      run_processes_parallel: done
06:42:12.534631 run-command.c:667       trace: run_command: git maintenance run --auto --no-quiet
06:42:12.537641 exec-cmd.c:139          trace: resolved executable path from Darwin stack: /Library/Developer/CommandLineTools/usr/libexec/git-core/git
06:42:12.537934 exec-cmd.c:238          trace: resolved executable dir: /Library/Developer/CommandLineTools/usr/libexec/git-core
06:42:12.538226 git.c:455               trace: built-in: git maintenance run --auto --no-quiet
crenshaw-dev commented 2 years ago

Are you looking for the "average amount of change" between each argocd git pull?

Exactly! Just trying to figure out why the fetches take so long. I get why the initial one is slow. But then it should be pulling only incremental changes.

crenshaw-dev commented 2 years ago

Receiving objects: 100% (19908/19908), 55.32 MiB | 6.77 MiB/s, done.

If that's typical, we should expect ~10-second fetches. Still a bit slow, but nowhere near the timeout. I wonder what's going wrong on the repo-server vs. your local machine.

I see that OP is the one who had to increase the timeout. Maybe the symptoms for your mono-repo aren't nearly as severe as what they're facing.

diranged commented 2 years ago

@crenshaw-dev Ok - for us the fetches aren't the end of the world, though I can definitely see how they could become that as the repo grows. For the time being, our core issue is that we do not want the repo-server pod going away and losing its storage, because it is "difficult" for it to recover due to the size. I appreciate the point you raised that it's not just an STS issue, but that there also needs to be some mapping/state storage. It would be great if you would consider at least the StatefulSet/mapping-saving as an issue for improvement.

crenshaw-dev commented 2 years ago

Gotcha, makes sense. At some point we can't optimize the repo-server any more, and it's just time for an STS. Will keep it in mind for future work. Thanks for the help debugging!

kadamanas93 commented 2 years ago

Running the command in the git repo on the repo server yields this:

16:04:48.109817 git.c:444               trace: built-in: git fetch origin --tags --force
16:04:48.115427 run-command.c:664       trace: run_command: unset GIT_PREFIX; GIT_PROTOCOL=version=2 ssh -o SendEnv=GIT_PROTOCOL git@github.com 'git-upload-pack '\''*****.git'\'''
Warning: Permanently added the ECDSA host key for IP address '********' to the list of known hosts.
remote: Enumerating objects: 88, done.
remote: Counting objects: 100% (78/78), done.
remote: Compressing objects: 100% (20/20), done.
16:04:49.258010 run-command.c:664       trace: run_command: git unpack-objects --pack_header=2,88
remote: Total 88 (delta 55), reused 71 (delta 55), pack-reused 10
16:04:49.260674 git.c:444               trace: built-in: git unpack-objects --pack_header=2,88
Unpacking objects: 100% (88/88), 25.20 KiB | 108.00 KiB/s, done.
16:04:49.536971 run-command.c:664       trace: run_command: git rev-list --objects --stdin --not --all --quiet --alternate-refs
16:04:49.539423 git.c:444               trace: built-in: git rev-list --objects --stdin --not --all --quiet --alternate-refs
From github.com:**********
...
16:04:49.626349 run-command.c:1625      run_processes_parallel: preparing to run up to 1 tasks
16:04:49.626787 run-command.c:1657      run_processes_parallel: done
16:04:49.628868 run-command.c:664       trace: run_command: git maintenance run --auto --no-quiet
16:04:49.630759 git.c:444               trace: built-in: git maintenance run --auto --no-quiet

This process was really fast. The reason I had to increase the timeout was really the first repo pull, which sometimes exceeded 10 minutes. Just FYI, we are deploying ArgoCD v2.2.4.

kadamanas93 commented 2 years ago

@kadamanas93 I'm not opposed to the PVC route, but I'd love to make that option unnecessary for you if possible.

As such the git fetch process takes more than 5mins sometimes.

I'm not a git expert, but I think there are ways to decrease the size of a large repo, e.g. repacking. I believe git fetch also has options to only fetch the N most recent commits. Would something like that be helpful for your use case?

Would setting git fetch to fetch only N most recent commits apply to the initial git fetch? If so, I think it would be extremely useful.

crenshaw-dev commented 2 years ago

Would setting git fetch to fetch only N most recent commits apply to the initial git fetch? If so, I think it would be extremely useful.

I think that would work. I'm not sure if there are any negative side-effects to getting a partial repo and then fetching additional commits. I wonder if it still stores things efficiently on disk that way. I'd be interested in offering this as an opt-in feature when adding a new repo.

ArgoCD v2.2.4

Aside: I'd bump to 2.2.9 for all the security fixes. :-)

kadamanas93 commented 2 years ago

Would setting git fetch to fetch only N most recent commits apply to the initial git fetch? If so, I think it would be extremely useful.

I think that would work. I'm not sure if there are any negative side-effects to getting a partial repo and then fetching additional commits. I wonder if it still stores things efficiently on disk that way. I'd be interested in offering this as an opt-in feature when adding a new repo.

Would pulling a single commit that wasn't part of the initial git clone be problematic?

I would also suggest something like a cron that pulls the git repo in batches (e.g. pull the last 1000 commits, then the 1000 commits before that). Is that something that is possible?

kadamanas93 commented 2 years ago

I would also suggest something like a cron that pulls the git repo in batches (e.g. pull the last 1000 commits, then the 1000 commits before that). Is that something that is possible?

Looking into this, there is an option --deepen that can be executed in a loop until the whole repo is fetched. I am not sure what the end condition would be, but at least ArgoCD can start with a partial repo.
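
As a sketch, such a loop could use git's own notion of a shallow repository as the end condition; this is just an illustration of the idea (the repository URL and batch size are placeholders), not something Argo CD does today:

```bash
# Hypothetical batch-deepening loop: start with a shallow clone, then deepen in
# chunks of 1000 commits. git should drop the shallow boundary once the root
# commits are reached, at which point --is-shallow-repository reports "false".
git clone --depth=1000 https://github.com/example-org/example-monorepo.git repo
cd repo
while [ "$(git rev-parse --is-shallow-repository)" = "true" ]; do
  git fetch --deepen=1000 origin
done
```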

crenshaw-dev commented 2 years ago

I guess batch-fetching would allow you to avoid making the timeout super high.

An alternative would be to have a separate timeout for the initial fetch vs subsequent fetches. It wouldn't save disk space, but it would avoid making the timeout unnecessarily high for normal fetches.

kadamanas93 commented 2 years ago

I guess batch-fetching would allow you to avoid making the timeout super high.

An alternative would be to have a separate timeout for the initial fetch vs subsequent fetches. It wouldn't save disk space, but it would avoid making the timeout unnecessarily high for normal fetches.

The issue with the alternative is that the initial fetch has to succeed; otherwise ArgoCD ends up with all the Applications in an Unknown state. At least with batch fetching, ArgoCD can partially run.

pierluigilenoci commented 2 years ago

@crenshaw-dev @kadamanas93 interesting discussion, but I would like to point out another issue that is at the heart of the matter.

The repo-server cache also creates problems on node disks, with pods evicted with the message "The node had condition: [DiskPressure]".

Quoting myself:

I just wish that in the presence of large caches, a situation aggravated also by some bug (https://github.com/argoproj/argo-cd/issues/4002 and https://github.com/helm/helm/issues/10583), the Repo Server pods are evicted or, even worse, other pods are evicted in the same node

Ref: https://github.com/argoproj/argo-helm/issues/438#issuecomment-1139417461

It might make sense to link the CNCF's Slack (stranded) discussion as well: https://cloud-native.slack.com/archives/C020XM04CUW/p1653511733823829

pierluigilenoci commented 2 years ago

@kadamanas93 any update on this?

crenshaw-dev commented 2 years ago

@pierluigilenoci none from me. Glancing back over the thread, it looks like we have a few potential workarounds. The STS change would require preserving some index of the cache.

With either a workaround or a persisted index cache, I don't currently have time to work on it. If you're up to contribute one of these, I'd be happy to help with the dev env and/or review.

pierluigilenoci commented 2 years ago

@crenshaw-dev sorry for the delay in answering.

The workarounds that exist are either poor (only one replica?) or do not solve some problems (i.e. disk pressure on the node's disk). Unless there is one clever enough that I'm missing, I find that a StatefulSet is the only solution.

crenshaw-dev commented 2 years ago

@pierluigilenoci there have been a number of bugs causing disk pressure, but I think it's preferable to tackle those bugs rather than introducing another potentially-buggy feature. Disk usage should be fairly flat after repos are initially configured. If it's not, we should work to tune cache settings and potentially offer features to run git gc or configure fetches to be lighter-weight.

crenshaw-dev commented 2 years ago

Again, I'm not completely opposed to PVC. I just haven't seen a use case yet that I think is impossible to solve with less-drastic changes.

blakepettersson commented 1 year ago

This seems like something which would be better addressed by adding support for sparse checkout / partial clones IMO (see #11198)

mkilchhofer commented 1 year ago

Maybe something for sig-scalability?

RobinsonZ commented 5 months ago

With either a workaround or a persisted index cache, I don't currently have time to work on it. If you're up to contribute one of these, I'd be happy to help with the dev env and/or review.

What would be needed to make this happen? I may be able to devote some developer time to adding this.

jpbelangerupgrade commented 3 weeks ago

What if we approach this from a different angle? Would adding a "pre-fetch" before the pod becomes live resolve most of our concerns (including mine)?

I was looking at the code and trying to see if pre-cloning the repo (on a multi-read volume or host path) and using git alternates would work. But there's no real way to easily hook that up, and go-git also has a lot of issues with alternates (from trying to use it in another project). Also, using a PVC didn't seem transparent, as the repo-server uses a "temp" folder name on every initialization, so it wouldn't reuse the previous clone.

I don't mind having a pod take 5 minutes to boot. What I'm having an issue with is that when the pod boots, it becomes available very quickly, then starts piling up a bunch of requests and takes 15 minutes to fetch correctly, and during that time the application controller is pretty much dead waiting on the repo-server.

We tried adding a startup delay, but that did nothing since the fetch is done with the first request accepted by the new pod.

jpbelangerupgrade commented 1 day ago

I'll add our solution here in case it helps anyone else with the same issue we had with scalability and the impact of repo-server restarts. Restarting the repo-server (3 replicas) froze our Argo CD for a good 5-15 minutes in most cases, since the repo-server started serving requests and they all got stuck waiting on the repo clone, which on our side is a good 2.5Gi in size.

After this change, the init container takes 1 to 2 minutes to clone and then allows the repo-server to start and use the pre-cloned repo, with zero performance impact on reconciliation and syncs.

Init container

We added a new initContainers to repo-server:

```yaml
initContainers:
  - name: clone
    image:
    command: [ "/bin/bash", "-c", "--" ]
    securityContext:
      runAsUser: 999 # argocd user that the argocd main container uses
    args: [ "/etc/scripts/github-clone.sh /etc/secret/githubAppPrivateKey.pem $GIT_APP_ID $GIT_INSTALLATION_ID" ]
    volumeMounts:
      - name: repo-secret
        mountPath: "/etc/secret/githubAppPrivateKey.pem"
        subPath: "githubAppPrivateKey"
        readOnly: true
      - name: argocd-cm
        mountPath: "/etc/scripts/github-clone.sh"
        subPath: "custom.upgrade.github-clone.sh"
        readOnly: true
      - name: tmp
        mountPath: /tmp
    env:
      - name: GIT_APP_ID
        valueFrom:
          secretKeyRef:
            name: repo-secret
            key: githubAppID
      - name: GIT_INSTALLATION_ID
        valueFrom:
          secretKeyRef:
            name: repo-secret
            key: githubAppInstallationID
volumes:
  - name: repo-secret
    secret:
      secretName: repo-secret
  - name: argocd-cm
    configMap:
      name: argocd-cm
      defaultMode: 0777
```
Bash script to generate github app token

We added it to the argo-cd cm config for simplicity. Simply replace your `repoName` and `repoURL` in the below script:

```sh
custom.upgrade.github-clone.sh: |
  #!/usr/bin/env bash
  #
  # signs a JWT assertion for the given github app using the
  # provided private key. the expiration time is set to the maximum
  # allowed value, i.e. 10 minutes in the future.
  #
  set -e

  if [[ $# -ne 3 ]] || [[ ! -r "$1" ]]; then
    if [[ ! -r "$1" ]]; then
      >&2 echo "Error: $1 is not readable"
    fi
    >&2 echo "Usage: $0 PRIVATE_KEY APP_ID INSTALLATION_ID"
    exit 1
  fi

  private_key_file="$1"
  app_id=$2
  current_time=$(date +%s)
  # the maximum expiration time is 10 minutes, but we set it to 9 minutes
  # to avoid clock skew differences between us and GitHub (which would cause GitHub
  # to reject the token, because the expiration time is set too far in the future).
  exp_time=$(($current_time + 9 * 60))

  header='{
    "type":"JWT",
    "alg":"RS256"
  }'
  payload='{
    "iat":'$current_time',
    "exp":'$exp_time',
    "iss":'$app_id'
  }'

  compact_json() {
    jq -c '.' | tr -d '\n'
  }

  base64url_encode() {
    openssl enc -base64 -A | tr '+/' '-_' | tr -d '='
  }

  encoded_header=$(echo $header | compact_json | base64url_encode)
  encoded_payload=$(echo $payload | compact_json | base64url_encode)
  encoded_body="$encoded_header.$encoded_payload"
  signature=$(echo -n $encoded_body | openssl dgst -binary -sha256 -sign "$private_key_file" | base64url_encode)
  APP_TOKEN="$encoded_body.$signature"

  INSTALLATION_ID=$3

  # authenticate as github app and get access token
  INSTALLATION_TOKEN_RESPONSE=$(curl -s -X POST \
    -H "Authorization: Bearer ${APP_TOKEN}" \
    -H "Accept: application/vnd.github.machine-man-preview+json" \
    https://api.github.com/app/installations/$INSTALLATION_ID/access_tokens)
  INSTALLATION_TOKEN=$(echo $INSTALLATION_TOKEN_RESPONSE | jq -r '.token')

  if [ -z "$INSTALLATION_TOKEN" ]; then
    >&2 echo "Unable to obtain installation token"
    >&2 echo "$INSTALLATION_TOKEN_RESPONSE"
    exit 1
  fi

  time git clone https://x-access-token:$INSTALLATION_TOKEN@github.com/credify/k8s-manifests.git /tmp/_argocd-repo/k8s-manifests

  # Removing credentials from the git remote url so that it matches with what argo-cd repo-server expects
  cd /tmp/_argocd-repo/
  git remote set-url origin
```
jbartyze-rh commented 1 day ago

Hello everyone,

Wanted to share another pattern that really helped in my current engagement.

Instead of using a PVC or an emptyDir on the filesystem, we modified the repo-server spec to use an in-memory emptyDir mounted at /cache and moved the cache storage to that folder with the TMPDIR variable. Disk will always be slower than memory.
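
For reference, a sketch of that kind of spec change, assuming (as described above) that the repo-server honors TMPDIR for its cache path; the sizeLimit and patch mechanics here are placeholders, and memory used by a memory-backed emptyDir counts against the pod's memory limits:

```bash
# Illustrative patch only: mount a memory-backed emptyDir at /cache and point
# TMPDIR at it so the repo-server keeps its cache in memory instead of on disk.
cat > repo-server-cache-patch.yaml <<'EOF'
spec:
  template:
    spec:
      containers:
        - name: argocd-repo-server
          env:
            - name: TMPDIR
              value: /cache
          volumeMounts:
            - name: cache
              mountPath: /cache
      volumes:
        - name: cache
          emptyDir:
            medium: Memory
            sizeLimit: 4Gi   # placeholder limit
EOF
kubectl -n argocd patch deployment argocd-repo-server --patch-file repo-server-cache-patch.yaml
```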

In our case we saw a massive decrease in IOPS generated by the repo-server, and the whole Argo CD monorepo setup got massively faster when managing around 700+ apps across multiple target clusters.

In our use case we have to recycle the cache on the repo-server often (every 20 minutes) because of the AVP Vault integration.

We have added a lot of other tunings on top of this change, but this one had the biggest performance impact.