lburgazzoli opened 7 months ago
More than a bug, I think it's the way it is actually designed, so I'd mark this issue as an enhancement instead. In any case I'm not sure how we should behave, as most trait configuration changes must trigger a reconciliation cycle. Maybe we should mark those traits which should not be taken into consideration when calculating the digest?
Triggering a reconciliation is fine, but that should not lead to a disruptive change in any resource, so maybe we need two different digests, or we should rethink how the reconciliation is re-triggered.
Then, maybe this is not affecting the Pod execution at all. Do you have any failing example, i.e., adding or removing a service without affecting anything else?
If you have an integration like this one (flow omitted because it does not matter much):
apiVersion: camel.apache.org/v1
kind: Integration
metadata:
  name: r
spec:
  flows:
    ...
  traits:
    service:
      enabled: true
      auto: false
    ingress:
      enabled: true
      auto: false
And then you change ingress.enabled to false. What happens is that the pod is restarted:
➜ k get pods
NAME READY STATUS RESTARTS AGE
camel-k-operator-6897f796dd-5znkm 1/1 Running 0 22m
r-677dcbf7fc-h7wx6 0/1 Terminating 0 22s
r-6976c6b54b-8lccc 1/1 Running 0 2s
Then, ignoring some of the things that change as a side effect of the pod restart (uuid, dates, etc.), the only difference is in the digest:
In this case my expectation would be that the ingress resource would have been created without affecting the pod execution.
Okay. Indeed, it's because the env template vars in the pod change. I've checked and it seems this var is not really used anywhere. I'll try to remove it entirely and run some tests to see if that is enough to solve this problem.
The CAMEL_K_DIGEST env var was added to force the re-deploy of the pod in case of a config map change: if you only change the flow in an Integration, the pod won't actually change, only the config map will, so we had to add this variable to be sure the latest routes are taken into account.
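A toy illustration of the mechanism described above (not the operator's actual code, and the function name is hypothetical): env vars are part of the Deployment's pod template, so projecting the digest as CAMEL_K_DIGEST means that any digest change alters the template, and Kubernetes replaces the pod even when nothing else about the container changed.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// podTemplateHash stands in for the comparison Kubernetes effectively makes
// to decide whether a Deployment's pods must be replaced: the container image
// plus its env vars (a simplification of the full pod template).
func podTemplateHash(image string, env map[string]string) string {
	h := sha256.New()
	h.Write([]byte(image))
	keys := make([]string, 0, len(env))
	for k := range env {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable ordering so the hash is deterministic
	for _, k := range keys {
		h.Write([]byte(k + "=" + env[k]))
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	before := podTemplateHash("camel-k/r:1", map[string]string{"CAMEL_K_DIGEST": "vabc"})
	// Disabling the ingress trait changes the Integration digest, and with it
	// the projected env var, even though the container image is untouched.
	after := podTemplateHash("camel-k/r:1", map[string]string{"CAMEL_K_DIGEST": "vdef"})
	fmt.Println("pod restarted:", before != after) // true
}
```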
I had a look at this, and the "problem" is that we use the digest as a means to compare the previous status of the Integration with the actual one in order to trigger a reconciliation cycle. Ideally we should not use it at all: we could reconcile every time an Integration changes, as the regeneration of the resources belonging to the Integration is idempotent (either way, the digest would be the same). However, we watch a lot of other objects, and the monitoring action could be called so many times that it would degrade the operator performance.
The most immediate solution would be to store and compare two digests. One (holding all the configuration we have now) would be "responsible" for triggering the reconciliation cycle when a sensitive change happens. The new one would be a digest accounting only for the configuration that would also require a Pod restart (likely sources only).
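A minimal sketch of the two-digest idea above, assuming a heavily simplified Integration spec (the real one hashes many more fields — dependencies, resources, kit, and so on): changing a trait moves the overall digest (so a reconciliation still runs) but leaves the restart digest untouched (so the Pod stays up).

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Hypothetical, simplified view of an Integration spec.
type IntegrationSpec struct {
	Sources map[string]string // source name -> content
	Traits  map[string]string // trait name -> serialized configuration
}

// digestOf hashes one or more maps with a stable key ordering so the
// result is deterministic across runs.
func digestOf(parts ...map[string]string) string {
	h := sha256.New()
	for _, m := range parts {
		keys := make([]string, 0, len(m))
		for k := range m {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			h.Write([]byte(k))
			h.Write([]byte(m[k]))
		}
	}
	return hex.EncodeToString(h.Sum(nil))
}

// overallDigest covers everything: any change triggers a reconciliation cycle.
func overallDigest(s IntegrationSpec) string { return digestOf(s.Sources, s.Traits) }

// restartDigest covers only what requires a Pod restart (here, sources only).
func restartDigest(s IntegrationSpec) string { return digestOf(s.Sources) }

func main() {
	before := IntegrationSpec{
		Sources: map[string]string{"r.yaml": "flow..."},
		Traits:  map[string]string{"ingress": "enabled=true"},
	}
	after := before
	after.Traits = map[string]string{"ingress": "enabled=false"}

	fmt.Println("reconcile needed:", overallDigest(before) != overallDigest(after)) // true
	fmt.Println("restart needed:  ", restartDigest(before) != restartDigest(after)) // false
}
```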
@lburgazzoli wdyt?
I wonder if we have to store them at all:

* we can optionally compute the digest of the previous and current object in the event filter (i.e. when the generation or phase have changed we don't have to compute it)

Not sure this would cover all the cases, WDYT?
This one was an option I evaluated but we had to discard it, because we have other objects besides the Integration. Effectively, the main problem, as I mentioned above, is that we watch a lot of other resources besides the Integration, so the only possible way to know the previous status is to have this information stored in the same resource for eventual comparison. When an Integration spec changes we have a certain degree of control, but when the change happens in any resource bound to the Integration (e.g., the Deployment), we need to recover such a value and make a comparison. Alternatively, we may recalculate from scratch every time, as I commented previously: "[...] we could reconcile every time an Integration changes, as the regeneration of resources belonging to the Integration is idempotent (either way, the digest would be the same). However, we watch a lot of other objects and the monitoring action could be called so many times, degrading the operator performance."
I am not able to forecast such a degradation though.
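A stdlib-Go sketch of the point above (types and field names are hypothetical, not the operator's actual API): since events also arrive from owned resources, the digest from the last cycle has to live in the Integration itself so there is something to compare the freshly computed digest against.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Hypothetical mirror of what the operator stores: the digest computed at the
// last reconciliation lives in the Integration's own status, so it survives
// events coming from owned resources (Deployment, ConfigMap, ...).
type Integration struct {
	Spec         string // stand-in for the whole serialized spec
	StatusDigest string // digest stored at the end of the last cycle
}

func digest(spec string) string {
	sum := sha256.Sum256([]byte(spec))
	return hex.EncodeToString(sum[:])
}

// needsFullCycle decides whether the Integration must restart from the
// Initialization phase: only when the spec digest drifted from the stored one.
func needsFullCycle(it Integration) bool {
	return digest(it.Spec) != it.StatusDigest
}

func main() {
	it := Integration{Spec: "traits: ...", StatusDigest: digest("traits: ...")}
	fmt.Println(needsFullCycle(it)) // false: event from an owned resource, monitor only

	it.Spec = "traits: changed"
	fmt.Println(needsFullCycle(it)) // true: spec drifted, re-run from Initialization
}
```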
Is it true that if, e.g., a Deployment changes, then we go through the same filter as the Integration resource? That seems odd.
Yes. If you change a value of any Integration-managed object, the monitoring loop starts and applies the traits for the given Integration phase (running). In the case of a Deployment, it would bring the values back to the ones calculated by the operator.
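The behaviour described here can be sketched as an idempotent reconcile (a toy model with hypothetical types, not the operator's code): whatever the live object looks like, the monitoring loop converges it back to the operator-calculated values, so manual edits to a managed Deployment do not survive.

```go
package main

import "fmt"

// Hypothetical stand-in for a managed resource.
type Deployment struct{ Replicas int }

// desired is the state the operator calculates from the Integration traits.
func desired() Deployment { return Deployment{Replicas: 1} }

// reconcile is idempotent: it ignores out-of-band edits on the live object
// and re-applies the operator-calculated values.
func reconcile(live Deployment) Deployment { return desired() }

func main() {
	live := Deployment{Replicas: 5} // someone scaled the Deployment by hand
	live = reconcile(live)
	fmt.Println(live.Replicas) // 1: the monitoring loop brought it back
}
```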
I know that a reconciliation is triggered, but does the Integration filter apply? Because if so, the previous and current Integration objects would have been the same (i.e. same checksum), so the filter would "skip" the event, wouldn't it?
Yes. What would happen is that the Integration would stick in the Running phase, so only the traits expected for such an execution are applied. Conversely, if there is a change in the digest, the Integration would start from the Initialization phase, passing through all the phases (rebuilding, if the kit no longer matches the expected configuration). If we removed the digest "optimization", we'd need to run this second flow every time.
I don't think I follow 100%, but it's been a long time since I last looked at all those internals and I may be completely wrong, so please bear with me:
Now, I understand that this may require some more radical changes, but in general I'd really like us to move toward reducing our dependency on the Integration status, as it can quickly get out of date.
So @squakez for the time being, I'm fine with your proposed solution.
While working on #5491 I realized there is a very important downside to this way of detecting a drift: we cannot change the calculation algorithm itself without creating a disruption when upgrading from one version to another. This is because the older digest was calculated with an older version of the algorithm, and it would differ from the one produced by the new algorithm, which may have changed the calculation regardless of the Integration configuration.
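The upgrade problem can be sketched as follows (a toy model; the version-prefix mitigation is my assumption, not necessarily what Camel K does): when the algorithm changes between operator versions, the stored digest no longer matches even for an unchanged spec, and tagging the digest with the algorithm version at least makes that mismatch distinguishable from a real spec change.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Two versions of a digest algorithm: v2 hypothetically feeds an extra field
// into the hash, so the same spec digests differently after an upgrade.
func digestV1(spec string) string {
	sum := sha256.Sum256([]byte(spec))
	return "v1:" + hex.EncodeToString(sum[:])
}

func digestV2(spec string) string {
	sum := sha256.Sum256([]byte("extra-field|" + spec))
	return "v2:" + hex.EncodeToString(sum[:])
}

// algorithmChanged tells an "algorithm upgrade" apart from a real spec change
// by comparing the version prefixes instead of the raw digests.
func algorithmChanged(stored, current string) bool {
	return strings.SplitN(stored, ":", 2)[0] != strings.SplitN(current, ":", 2)[0]
}

func main() {
	spec := "traits: ..."
	stored := digestV1(spec)  // written by the old operator version
	current := digestV2(spec) // computed after the upgrade

	fmt.Println("digests differ:  ", stored != current)                // true, spec unchanged
	fmt.Println("due to algorithm:", algorithmChanged(stored, current)) // true
}
```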
What happened?
Since an Integration digest ends up being projected as a container env var, including the traits when computing the digest has a number of side effects. For example, enabling/disabling the creation of a route or service would lead to the container being restarted, which is not expected.
Steps to reproduce
No response
Relevant log output
No response
Camel K version
No response