argoproj / applicationset

The ApplicationSet controller manages multiple Argo CD Applications as a single ApplicationSet unit, supporting deployments to large numbers of clusters, deployments of large monorepos, and enabling secure Application self-service.
https://argocd-applicationset.readthedocs.io/
Apache License 2.0

git fetch errors in argocd-applicationset-controller pod on OpenShift 3.11 #202

Closed alexanderdalloz closed 2 years ago

alexanderdalloz commented 3 years ago

While the ApplicationSet Controller v0.1.0 works on OpenShift 4.x, it fails on the older platform generation 3.11.

2021-04-16T11:21:03.742Z ERROR controller-runtime.manager.controller.applicationset Reconciler error {"reconciler group": "argoproj.io", "reconciler kind": "ApplicationSet", "name": "dynatrace", "namespace": "argocd", "error": "Error during fetching repo:git fetch origin --tags --forcefailed exit status 128: No user exists for uid 1012480000\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.", "errorVerbose": "git fetch origin --tags --forcefailed exit status 128: No user exists for uid 1012480000\r\nfatal: Could not read from remote repository.\n\nPlease make sure you have the correct access rights\nand the repository exists.\nError during fetching repo\ngithub.com/argoproj-labs/applicationset/pkg/services.checkoutRepo\n\t/workspace/pkg/services/repo_service.go:159

This happens because, on OpenShift 3.11, the arbitrary UID assigned to the container is not automatically added to /etc/passwd.

$ oc exec -ti argocd-applicationset-controller-5d5589c46f-2zsw8 -- bash
groups: cannot find name for group ID 1012480000
I have no name!@argocd-applicationset-controller-5d5589c46f-2zsw8:/$ whoami
whoami: cannot find name for user ID 1012480000
I have no name!@argocd-applicationset-controller-5d5589c46f-2zsw8:/$ tail -n5 /etc/passwd
_apt:x:100:65534::/nonexistent:/usr/sbin/nologin
systemd-network:x:101:102:systemd Network Management,,,:/run/systemd:/usr/sbin/nologin
systemd-resolve:x:102:103:systemd Resolver,,,:/run/systemd:/usr/sbin/nologin
systemd-timesync:x:103:104:systemd Time Synchronization,,,:/run/systemd:/usr/sbin/nologin
messagebus:x:104:105::/nonexistent:/usr/sbin/nologin
I have no name!@argocd-applicationset-controller-5d5589c46f-2zsw8:/$

This is different on OpenShift 4.x. See https://www.openshift.com/blog/a-guide-to-openshift-and-uids

By default, OpenShift 4.x appends the effective UID into /etc/passwd of the Container during the creation of the Pod.
    Note: This was a manual step when deploying applications to OCP 3.x, that required the UID to exist in the passwd file of the Container.

Thus it is necessary to extend the argocd-applicationset image with the same functionality that is implemented in the argocd image, in particular the uid_entrypoint.sh script under /usr/local/bin, which is called as the command for the container.
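For illustration, the core of such an entrypoint could look roughly like the sketch below. This is not the actual uid_entrypoint.sh from the argocd image; the function name `ensure_passwd_entry`, its parameters, and the default user name and home directory are assumptions made for this example. It only shows the general technique: if the running UID has no passwd entry and the file is group-writable (as arranged by `chmod g=u /etc/passwd`), append one.

```shell
#!/bin/sh
# Hypothetical helper (not the real uid_entrypoint.sh): make sure a UID has
# an entry in a passwd-format file, appending one if it is missing.
# Usage: ensure_passwd_entry PASSWD_FILE UID [USER_NAME] [HOME_DIR]
ensure_passwd_entry() {
    passwd_file=$1
    uid=$2
    name=${3:-default}          # assumed default user name
    home=${4:-/home/argocd}     # assumed home directory

    # Field 3 of each passwd line is the numeric UID; do nothing if present.
    if awk -F: -v uid="$uid" '$3 == uid { found = 1 } END { exit !found }' "$passwd_file"; then
        return 0
    fi

    # Append an entry only if the file is writable for the current user.
    if [ -w "$passwd_file" ]; then
        printf '%s:x:%s:0:%s user:%s:/sbin/nologin\n' \
            "$name" "$uid" "$name" "$home" >> "$passwd_file"
    else
        return 1
    fi
}

# In a real entrypoint this would be followed by something like:
#   ensure_passwd_entry /etc/passwd "$(id -u)" argocd
#   exec "$@"
```

Taking the file path and UID as arguments, rather than hard-coding /etc/passwd and `id -u`, keeps the helper easy to try out against a scratch file before it goes into an image.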

As a proof of concept I have built a custom image based on the original one, which fixes the issue on OpenShift 3.11.

$ cat Dockerfile
FROM quay.io/argocdapplicationset/argocd-applicationset:v0.1.0

USER root

ENV DEBIAN_FRONTEND=noninteractive

RUN groupadd -g 999 argocd && \
    useradd -r -u 999 -g argocd argocd && \
    mkdir -p /home/argocd && \
    chown argocd:0 /home/argocd && \
    chmod g=u /home/argocd && \
    chmod g=u /etc/passwd && \
    apt-get update && \
    apt-get dist-upgrade -y && \
    apt-get install --no-install-recommends -y tini && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

COPY uid_entrypoint.sh /usr/local/bin/uid_entrypoint.sh

USER 999

WORKDIR /home/argocd

alexanderdalloz commented 3 years ago

@jgwest Whom are you asking for help? Is there anything I can do to help you find a valid solution? Can this issue be escalated within the Argo project?

sbose78 commented 3 years ago

Hey @alexanderdalloz , thank you for providing a snippet of the Dockerfile, we are validating it!

jgwest commented 3 years ago

I've got good news and bad news, for running ApplicationSet controller on OpenShift 3.11:

Good news: I was able to get access to an OpenShift v3.11 cluster, and was able to confirm that the Dockerfile does work. But, I hit other issues...

Bad news: Argo CD v2.0.x and ApplicationSet v0.1.0 both no longer support the apiextensions.k8s.io/v1beta1 version of the CustomResourceDefinition resource, which is unfortunately the only version of the CRD resource that OpenShift 3.11 supports.

This is something of a showstopper, as it is difficult to support OpenShift v3.11 while also supporting Kubernetes v1.22, which has fully removed this API version (i.e., you must use the new version). My guess is that many other projects will be hitting this support barrier as well, and will be moving away fairly quickly with their new releases (if they haven't already). :frowning:
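A rough way to see which side of this gap a cluster falls on is to inspect the output of `kubectl api-versions` (`oc api-versions` should behave the same way). The helper below is only a sketch: the function name `crd_api_support` and its three result strings are made up for this example, and it reads the version list from stdin so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical helper: classify CRD API support from a list of API
# group/versions (one per line, as printed by `kubectl api-versions`).
# Prints "v1", "v1beta1-only", or "none".
crd_api_support() {
    versions=$(cat)   # read stdin once so it can be searched twice
    if printf '%s\n' "$versions" | grep -qxF 'apiextensions.k8s.io/v1'; then
        echo "v1"
    elif printf '%s\n' "$versions" | grep -qxF 'apiextensions.k8s.io/v1beta1'; then
        echo "v1beta1-only"
    else
        echo "none"
    fi
}

# Example against a live cluster:
#   kubectl api-versions | crd_api_support
```

A result of `v1beta1-only` would correspond to the OpenShift 3.11 situation described above, where only the older CustomResourceDefinition version is served.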

alexanderdalloz commented 3 years ago

@jgwest, many thanks for noticing the API deprecation. I do understand that this is in fact a serious reason why OpenShift 3.11 (with k8s 1.11) is no longer supported by current Argo CD and ApplicationSet.