argoproj / argo-rollouts

Progressive Delivery for Kubernetes
https://argo-rollouts.readthedocs.io/
Apache License 2.0

Analysis template running more than one time, expected only once during blue green deployment strategy #2519

Open jonathanribas opened 1 year ago

jonathanribas commented 1 year ago

Hi,

Application: Adobe Commerce
Infrastructure: runs on Kubernetes with HPA
ArgoCD: 2.5.6
Argo Rollouts: 1.4.0

What do we want to do?

We would like to deploy a new version of our Adobe Commerce application using blue / green strategy.

Why?

Adobe Commerce needs to run a mandatory command that enables or disables modules and updates the data configuration for the new application version.

Expected result

This command must run exactly once on the new application version, before that version is ready to receive traffic. If the command succeeds, the new version can be promoted and receive traffic, and the old version should no longer receive any traffic.

Actual result

If this job fails, the new application version should be killed and we must stay on the old version. We then need to run this Adobe Commerce command again on the old version to get things working correctly again. Otherwise we are left in an in-between state and the application doesn't work as expected.

Main issue we would like to fix

We have noticed that sometimes, possibly when our application scales up, the AnalysisTemplate runs this Adobe Commerce command a second time, which causes serious problems in our application. That's why we want it to run only once, on the new version that doesn't receive any traffic yet. Basically, we want the AnalysisTemplate to run a single time only. As we are not experts in Argo Rollouts, we tried creating a lock file in a shared folder to make sure the command doesn't run a second time, but unfortunately this isn't working as expected.

Sorry, the YAML is a bit broken, but this is how we do it today:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: setup-upgrade
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "0"
    "helm.sh/resource-policy": keep
spec:
  metrics:
  - name: setup-upgrade
    provider:
      job:
        spec:
          backoffLimit: 1
          template:
            spec:
              affinity:
                nodeAffinity:
                  requiredDuringSchedulingIgnoredDuringExecution:
                    nodeSelectorTerms:
                      - matchExpressions:
                        - key: type
                          operator: In
                          values:
                            - {{ if .Values.cronjobs.dedicatedNode }}worker{{ else }}prod{{ end}}
              tolerations:
                - key: env
                  operator: Equal
                  value: prod
                  effect: NoSchedule
                - key: env
                  operator: Equal
                  value: prod
                  effect: NoExecute
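              containers:
              - name: setup-upgrade  # placeholder: the container name and image lines are missing from the pasted YAML
                command: ["/bin/bash"]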
                args:
                - -c
                - |
                  set -e -o errexit -o nounset
                  cd /var/www/html

                  if [ ! -f ./var/report/analysisrun-in-progress.lock ]; then
                    touch ./var/report/analysisrun-in-progress.lock
                    php -d memory_limit=4096M bin/magento setup:upgrade --no-interaction --keep-generated
                    unlink ./var/report/analysisrun-in-progress.lock
                  else
                    echo "A lock file exists indicating that an AnalysisRun is already in progress."
                    exit 1
                  fi
                volumeMounts:
                - mountPath: /var/www/html/var/report
                  name: caudalie-report-volume
              volumes:
              - name: caudalie-report-volume
                persistentVolumeClaim:
                  claimName: caudalie-report-volume-claim
              restartPolicy: Never
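
One detail of the script above is worth spelling out: the lock file is deleted after a successful run, so it only guards against two runs overlapping, not against a later second run. Also, because of `set -e`, a failed `setup:upgrade` exits before `unlink`, leaving the lock behind. As a minimal sketch only, the `command`/`args` fragment could instead keep a persistent per-release marker. This assumes a hypothetical `RELEASE_ID` environment variable (not in the original template) that identifies the version being deployed, and it assumes `mkdir` is atomic on the shared volume, which may not hold on every network filesystem.

```yaml
# Fragment of the Job's container spec (sketch, not the original template).
command: ["/bin/bash"]
args:
- -c
- |
  set -o errexit -o nounset
  cd /var/www/html

  # RELEASE_ID is a hypothetical env var identifying the new version.
  marker="./var/report/setup-upgrade-${RELEASE_ID}.done"
  lock="./var/report/setup-upgrade-${RELEASE_ID}.lock"

  # A previous run already completed for this release: succeed without re-running.
  if [ -f "$marker" ]; then
    echo "setup:upgrade already completed for ${RELEASE_ID}; skipping."
    exit 0
  fi

  # mkdir fails if the directory exists, so only one concurrent run gets the lock.
  if ! mkdir "$lock" 2>/dev/null; then
    echo "Another run holds the lock for ${RELEASE_ID}."
    exit 1
  fi
  # Release the lock on any exit, including a failed upgrade.
  trap 'rmdir "$lock"' EXIT

  php -d memory_limit=4096M bin/magento setup:upgrade --no-interaction --keep-generated

  # Only reached on success (errexit aborts earlier on failure).
  touch "$marker"
```

With this shape, a repeated AnalysisRun for the same release exits 0 without touching the application, while a failed run leaves no marker and can be retried.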

Thanks in advance for your precious help!!!

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity.

kostis-codefresh commented 1 month ago

Hello. It seems to me that you are abusing the metric analysis system as a way to run pre-sync hooks.

If you are already using Argo CD, wouldn't it be better to use a "real" pre-sync hook instead? (i.e. outside of Argo Rollouts)
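
To illustrate the suggestion above, here is a minimal sketch of an Argo CD PreSync hook Job. The `argocd.argoproj.io/hook` and `argocd.argoproj.io/hook-delete-policy` annotations are Argo CD's resource hook annotations. The name prefix, container name, and image are hypothetical placeholders, and the affinity, tolerations, and volume mounts from the original template are omitted for brevity.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  # generateName gives each sync its own Job object (hypothetical prefix).
  generateName: setup-upgrade-
  annotations:
    # Run during the PreSync phase, before the other manifests are applied.
    argocd.argoproj.io/hook: PreSync
    # Delete the previous hook Job before creating a new one.
    argocd.argoproj.io/hook-delete-policy: BeforeHookCreation
spec:
  # Do not let Kubernetes retry the command on failure.
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: setup-upgrade
        # Hypothetical placeholder: use the image of the new application version.
        image: registry.example.invalid/adobe-commerce:NEW_VERSION
        command: ["/bin/bash", "-c"]
        args:
        - |
          set -o errexit -o nounset
          cd /var/www/html
          php -d memory_limit=4096M bin/magento setup:upgrade --no-interaction --keep-generated
```

A hook like this runs once per sync, and a failed hook fails the sync, so the Rollout is not updated. Unlike the AnalysisTemplate approach, it runs before the new ReplicaSet exists, so whether it fits depends on what `setup:upgrade` needs from the running pods.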