Looking through the logs, it looks like this happened during an install event. `_on_install` does not try to force-apply the Kubernetes resources, which explains why it fails with a 409 error. I'm not sure how we reach an install event at all during an upgrade; maybe this is a quirk of going from podspec to sidecar?
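For context, a minimal sketch (not the actual charm code; the manifest path and field manager name are made up) of what force-applying the resources with lightkube in an install handler could look like:

```python
# Sketch only: illustrative manifest path and field manager name.
from lightkube import Client, codecs

# inside the charm class
def _on_install(self, event):
    client = Client(field_manager="seldon-core-operator")
    with open("src/manifests/crds.yaml") as f:
        for resource in codecs.load_all_yaml(f):
            # force=True asks the API server to resolve field-ownership
            # conflicts in favour of this field manager instead of
            # returning a 409.
            client.apply(resource, force=True)
```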
I have read about this kind of conflict between server-side apply (SSA) and client-side apply (CSA), and I have also tried reproducing the issue myself. Here are some important notes:
Podspec charms use the CSA method for applying Kubernetes resources. With this method, the client (kubectl, or whatever client Juju uses) is responsible for diffing the desired vs current state of the resources. Lightkube, on the other hand, uses SSA, which sets a field manager responsible for tracking changes to each field of a Kubernetes resource. From [2]:
Fields are assigned a “field manager” which identifies the client that owns them. If you apply a manifest with Kubectl, then Kubectl will be the designated manager. A field’s manager could also be a controller or an external integration that updates your objects. Managers are forbidden from updating each other’s fields. You’ll be blocked from changing a field with kubectl apply if it’s currently owned by a different controller.
Links:
- [1] Server Side Apply
- [2] What Is Kubernetes Server-Side Apply (SSA)?
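To make the field-manager mechanics concrete, here is a minimal sketch using lightkube against a test cluster (the ConfigMap and manager names are just illustrative):

```python
from lightkube import Client
from lightkube.core.exceptions import ApiError
from lightkube.resources.core_v1 import ConfigMap
from lightkube.models.meta_v1 import ObjectMeta

cm = ConfigMap(metadata=ObjectMeta(name="demo", namespace="default"),
               data={"key": "value-a"})

# "manager-a" server-side applies the object and becomes owner of data.key.
Client(field_manager="manager-a").apply(cm)

# A second field manager tries to change the same field: the API server
# rejects it with a 409 conflict because data.key is owned by manager-a.
cm.data = {"key": "value-b"}
try:
    Client(field_manager="manager-b").apply(cm)
except ApiError as e:
    print(e.status.code, e.status.reason)  # 409 Conflict
```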
On SSA, "A conflict is a special status error that occurs when an Apply operation tries to change a field, which another user also claims to manage." From official docs, the options for resolving conflicts are:
--force-conflicts
(or force=True
, in the case of lightkube)To identify which method was used for applying the resource, it is as easy as looking into the yaml file format of the object. If it was a last-applied-configuration
annotation, the resource is managed by CSA; it's SSA managed if metadata.managedFields
is present.
For example:

---- CSA ----
```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"spec":{"containers":[{"image":"nginx:latest","name":"nginx"}]}}
  creationTimestamp: "2022-11-24T14:20:07Z"
  name: nginx
  namespace: default
```
---- SSA ----
```yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: "2022-11-24T16:02:29Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:containers:
          k:{"name":"nginx"}:
            .: {}
            f:image: {}
            f:name: {}
    manager: kubectl
    operation: Apply
    time: "2022-11-24T16:02:29Z"
```
What about seldon-core-operator, then?

Version 1.14 of seldon-core-operator is a podspec charm, which means its resources were created and are managed via CSA. When we `juju refresh` to 1.15, a sidecar charm managed by lightkube (which uses SSA), there appears to be a conflict. According to the documentation, upgrading from CSA to SSA is fairly easy, but conflicts may be raised:
Keep the last-applied-configuration annotation up to date. The annotation infers client-side apply's managed fields. Any fields not managed by client-side apply raise conflicts. For example, if you used kubectl scale to update the replicas field after client-side apply, then this field is not owned by client-side apply and creates conflicts on kubectl apply --server-side.
This is likely the case for seldon-core-operator; we may investigate more.
I was able to reproduce the issue, but in the end the charm seems to resolve its own conflicts and go active after a couple of minutes.
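For what it's worth, a generic way to handle this kind of CSA-to-SSA handover at apply time (just a sketch, not necessarily how the charm ends up resolving it) is to catch the 409 and re-apply with force, so the new field manager takes over the fields previously owned by client-side apply:

```python
from lightkube import Client
from lightkube.core.exceptions import ApiError

def apply_taking_ownership(client: Client, resource):
    """Apply a resource, taking over field ownership on conflict."""
    try:
        client.apply(resource)
    except ApiError as e:
        if e.status.code != 409:
            raise
        # Fields are still owned by the old manager (e.g. the podspec /
        # kubectl client-side apply); overwrite and become the sole owner.
        client.apply(resource, force=True)
```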
Should we use `force=True` every time?

SSA was introduced as a way to facilitate conflict detection and flexible resolution strategies, and to prevent unintentional or accidental overwrites happening without warning. Always setting this option to `True` may be okay for most of our charms' use cases, but it's important to understand why. To me, it seems we want the charm to be solely responsible for patching and updating the Kubernetes resources tied to it, and therefore the only one setting the fieldManager that makes changes, so always using `force=True` is a way of ensuring this. With it, we declare that the fieldManager for all fields is whatever the charm dictates, and that field values will be OVERWRITTEN whenever the charm calls the `apply()` method.
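As a sketch of what that means in practice (the manager names and the ConfigMap are illustrative), force-applying both overwrites the conflicting value and transfers ownership of the field to the charm's manager:

```python
from lightkube import Client
from lightkube.resources.core_v1 import ConfigMap
from lightkube.models.meta_v1 import ObjectMeta

cm = ConfigMap(metadata=ObjectMeta(name="charm-config", namespace="default"),
               data={"level": "info"})

# Some other client (another controller, a human with kubectl, ...) applies
# the object first and owns data.level.
Client(field_manager="someone-else").apply(cm)

# The charm applies its desired state with force=True: the value is
# overwritten and ownership of data.level moves to the charm's manager.
cm.data = {"level": "debug"}
Client(field_manager="seldon-core-operator").apply(cm, force=True)
```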
fixed by #148
During upgrade, the charm gets stuck with 409 conflict errors during k8s resource creation.
Reproduction steps:
Which yields logs of:
where we see 409 conflict errors when creating the CRDs.
(This feels similar to canonical/training-operator#104, but that issue was about going between two sidecar charms, whereas this one goes from podspec to sidecar.)