epics-containers / p45-deployment

Argo cd apps for deploying bl45p beamline services
Apache License 2.0
0 stars 0 forks source link

Argo CD support needs some thought:- ec_service, enabled and removed flags are confusing and don't quite work #11

Open gilesknap opened 1 month ago

gilesknap commented 1 month ago

We have a number of issues with the current state of our deployment repo.

I propose the following for review:-

We would still use the argo cd cli for the rest of the commands - just add the git repo for the list of stopped services.

BUT: now ec has access to that repo I see no reason not to add write capability to that repo so that we can restore the deploy and delete functions.

@coretl @marcelldls @DiamondJoseph @ZohebShaikh Please have a look at this and let me know your thoughts. Thanks.

gilesknap commented 1 month ago

@marcelldls regarding the appearance and override of enabled flag in the root app. I believe I had it working at some point as described in the 1st 3 'proposed' bullets above so happy to try and reproduce that.

DiamondJoseph commented 1 month ago

I think DAQ's use case is to only make changes to ixx-deployment when adding a new service. Our services should always point to ixx-services main, and that repository alone can represent the state of the services. Each of our charts will have an enabled in their values.yaml and we manage it there. Just trying to have as little bespokeness as possible.

marcelldls commented 1 month ago

@gilesknap Your proposal is that any services that use our repos must implement a global.enabled?

gilesknap commented 1 month ago

@gilesknap Your proposal is that any services that use our repos must implement a global.enabled?

No. The proposal is the enabled is understood by apps/template to mean don't render that app into the sub-apps manifest, so argocd will remove it. However because ec looks at the repo it can see and show the stopped service.

gilesknap commented 1 month ago

So this means any service can be stopped (removed from the cluster) by setting enabled=false and control of stopping / starting services is entirely controlled centrally. @DiamondJoseph if you want the deployment repo to represent the state of the cluster why not let it do the starting and stopping - your approach is only needed if you really want to have the other resources deployed but scale to zero, I believe.

marcelldls commented 1 month ago

@gilesknap Your proposal is that any services that use our repos must implement a global.enabled?

No. The proposal is the enabled is understood by apps/template to mean don't render that app into the sub-apps manifest, so argocd will remove it. However because ec looks at the repo it can see and show the stopped service.

I see. This is already what removed does? Your proposal is to get rid of enabled, ec-service? In short, we drop scaling?

gilesknap commented 1 month ago

Yes we drop scaling as its a chart specific feature and I'd like to support any chart.

marcelldls commented 1 month ago

Yes we drop scaling as its a chart specific feature and I'd like to support any chart.

Im happy with the proposed change - the iteration before what we have now the replica count would be set by Override root app > Override app > app values > app chart default > service values > root service values > service chart default which was even more "scattered" then now.

The impact is pvcs must be outside of your service chart? And start up performance?

Can we revert the global in ec-services helm chart for a fixed replica count?

coretl commented 1 month ago
  • we make the ec tool aware of the git repo p47-deployment and have it parse the values.yaml

Does it need to know about git? Couldn't it get this info from the argo cli? If it knew from argo cli what the enabled value is, and if it is overridden, then this would tell it the difference between:

gilesknap commented 1 month ago

This seems like a fine idea. But.

@marcelldls you know the CLI better than I, can you get a parameter value?

re the second point. I'm struggling with the sense that ec is crippled and the workflow is more laborious when you switch to argocd. Particularly noticeable when I'm intending to write a helm based tutorial first and then layer an argo cd one on top. It feels like you have hit the sweet spot with helm and then gone too far with argo cd.

If we give ec / argo cd parity of function with ec / helm, then:

coretl commented 1 month ago

re the second point. I'm struggling with the sense that ec is crippled and the workflow is more laborious when you switch to argocd. Particularly noticeable when I'm intending to write a helm based tutorial first and then layer an argo cd one on top. It feels like you have hit the sweet spot with helm and then gone too far with argo cd.

Good point, we like the synergy with configure-ioc, which would mean needing ec to be able to write to the p47-deployment git repo to upgrade/downgrade versions from the commanline