crossplane / oam-kubernetes-runtime

A set of libraries for building OAM runtimes
Apache License 2.0
278 stars 80 forks source link

Adding observed generation to prevent passing the old data #222

Open wonderflow opened 3 years ago

wonderflow commented 3 years ago

Problem Background

When we create an AppConfig, using dependency feature like below, the component will rely the output of echo trait, and binding the data into field spec.fieldA.

kind: ApplicationConfiguration
spec:
  components:
  - componentName: my-web
    dataInputs:
    - fieldPaths: 
        - spec.fieldA
      valueFrom:
        dataOutputName: output-data
    traits:
    - trait:
        kind: EchoTrait
        spec:
          data: abc
      dataOutputs:
        - name: output-data
          fieldPath: status.data
          conditions:
          - op: eq
            value: True
            fieldPath: status.Ready

Everything works well at first creation. OAM runtime will wait the status of trait CR to be ready, and after that pass the value abc into workload spec.fieldA.

In day 2, when we update the AppConfig, changed the spec, for example, change the data field from abc to xyz, and leave the other fields as they are.

kind: ApplicationConfiguration
spec:
  components:
...
    traits:
    - trait:
        kind: echo
        spec:
-         data: abc
+         data: xyz
      dataOutputs:
...

The bug come out: the value of workload in spec.fieldA is still abc, nothing changed.

After digging the problem, I find out that, when we update the trait CR, before the controller observed, the status is already in ready=true state.

OAM lacks of mechanism to indicate whether workload/trait CR is successfully observed by its controller.

So when an AppConfig CR is updated, we can't confirm whether the update is resolved(by workload/trait controller) or not. So the status results in our case are also unreliable.

This is a little like K8s observedGeneration mechanism, and I used to raise an issue to add observedGeneration for AppConfig and Component CR.

While observedGeneration is not enough. Like the picture below, the observedGeneration can't ensure workload/trait reconciled successfully.

image

Proposal

So I propose to add a mechanism here, the overall idea is to let oam-runtime know whether the latest version of workload/trait is observed/reconciled.

  1. Everytime, AppConfig is updated, let's calculate the hash of it's spec, naming it as oam-app-hash and propagate as label into trait and workload CR.
kind: ApplicationConfiguration
metadata:
  labels:
    oam-app-hash: 7fc5986cdc
spec:
  components:
  - componentName: my-web
    dataInputs:
    - fieldPaths: 
...
---
kind: containerized
metadata:
  labels:
    oam-app-hash: 7fc5986cdc
spec:
  ...
---
kind: EchoTrait
metadata:
  labels:
    oam-app-hash: 7fc5986cdc
spec:
  ...
  1. When the controller of workload/trait successfully reconciled/observed this change, it will update the status.
kind: containerized
metadata:
  labels:
    oam-app-hash: 7fc5986cdc
spec:
  ...
status:
  observedApp: 7fc5986cdc
---
kind: EchoTrait
metadata:
  labels:
    oam-app-hash: 7fc5986cdc
spec:
  ...
status:
  observedApp: 7fc5986cdc
  1. OAM runtime can compare the value of status.observedApp with AppConfig labels.oam-app-hash. There are two ways to check this mechanism.
    • Solution 1: always check status.observedApp in dependency, if this field not exist, ignore it, if this field exist but not match, regard it as condition not match.
    • Solution 2: Add valueFrom fieldPath for datainput/output condition, let oam-runtime check that value from metatdata.label should equal to status.observedApp
      ...
      dataOutputs:
      - name: output-data
        fieldPath: status.data
        conditions:
        - op: eq
          valueFrom:
             fieldPath: .metadata.labels.oam-app-hash
          fieldPath: status.observedApp

      I prefer the second solution better, as it's loosely coupled.

image

@hongchaodeng Would you like to fix this issue?

\cc @ryanzhang-oss @resouer

Diddaa commented 3 years ago

Problem described greatly clear, it just like CAS problem, we need Compare version before make any actions.

hongchaodeng commented 3 years ago

I will fix the issue as soon as I can. Will pull up a PR in about two days. The fix will be based on observed generation and will put a generation label into Workload/Trait CR.

For example, if an appConfig has:

kind: AppConfig
metadata:
  generation: 1

Then its children CR will have:

kind: ExampleTrait
metadata:
  labels:
    core.oam.dev/app-generation: "1"

Once the generation doesn't match the dependency engine will keep reconciling.

wonderflow commented 3 years ago

I have sent a PR about this mechanism: #223

Add valueFrom fieldPath for datainput/output condition, let oam-runtime check that value from metatdata.label should equal to status.observedApp

 dataOutputs:
   - name: output-data
     fieldPath: status.data
     conditions:
     - op: eq
       valueFrom:
          fieldPath: .metadata.labels.oam-app-hash
       fieldPath: status.observedApp