[Cloud Security][D4C] Yaml policy versioning

mitodrummer commented 11 months ago

Summary OUTDATED

Note: the following approach is being revised; see the comments below in this issue. @nick-alayil will cut a proper epic for this work based on some discussions we've had with @norrietaylor and @mmat11

==================================
The Linux Platform team has a PR (https://github.com/elastic/cloud-defend/pull/470) to add "version" to the JSON schema shared between the cloud-defend repo and Kibana. This schema is used to validate policy YAML both in Kibana and in the agent.

The Kibana side should be updated to include the "version" value specified in policy_schema.json when a user saves their integration package. This way, the agent will know which version of the policy the package instance was last saved with, and can determine whether to continue running or to enter a degraded or failed state. See: https://github.com/elastic/cloud-defend/blob/2f11875ad62c3991d64fc00149c50618168e7f49/docs/policy-versioning.md
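
As a rough illustration of the agent-side decision described above, here is a minimal Python sketch. The names `AgentState` and `evaluate_policy_version`, and the rule itself (a different major version means failed, a newer minor version means degraded), are assumptions made for this example only; the actual rules are whatever policy-versioning.md specifies.

```python
from enum import Enum
from typing import Tuple


class AgentState(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    FAILED = "failed"


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a 'major.minor.patch' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def evaluate_policy_version(policy_version: str, agent_version: str) -> AgentState:
    """Decide the agent state from the saved policy version.

    Assumed rule (hypothetical, not taken from the cloud-defend docs):
    - different major version: the agent cannot interpret the policy, so FAILED
    - same major, policy minor newer than the agent's: DEGRADED
    - otherwise: HEALTHY
    """
    policy = parse_version(policy_version)
    agent = parse_version(agent_version)
    if policy[0] != agent[0]:
        return AgentState.FAILED
    if policy[:2] > agent[:2]:
        return AgentState.DEGRADED
    return AgentState.HEALTHY
```

For example, under these assumed rules an agent that supports 1.1.0 would report a degraded state for a policy saved with 1.2.0, and a failed state for one saved with 2.0.0.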

Definition of done

after the above PR is merged, ensure an "Automatic PR" to sync the schemas is created and merged to kibana

ensure all newly saved D4C policy YAMLs have a "version" field specified using the value found here, e.g.

version: 1.0.0   # new field (which user should not be able to change)
process:
  selectors:
    - name: allProcesses
      operation: [fork, exec]
  responses:
    - match: [allProcesses]
      actions: [log]
file:
  selectors:
    - name: executableChanges
      operation: [createExecutable, modifyExecutable]
  responses:
    - match: [executableChanges]
      actions: [alert]

include checks/tests to ensure a user doesn't modify this field via the yaml editor (or perhaps we can exclude it from the editor so it cannot be changed)
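
One possible shape for such a check, sketched in Python on an already-parsed policy (a plain dict). The helper names `stamp_policy_version` and `user_changed_version` and the `SCHEMA_VERSION` constant are hypothetical; in practice the value would come from policy_schema.json, and the real Kibana code may look quite different.

```python
import copy

# Assumption: in a real implementation this is read from policy_schema.json.
SCHEMA_VERSION = "1.0.0"


def user_changed_version(user_policy: dict, schema_version: str = SCHEMA_VERSION) -> bool:
    """Return True if the edited policy carries a 'version' that differs from the schema's."""
    return "version" in user_policy and user_policy["version"] != schema_version


def stamp_policy_version(user_policy: dict, schema_version: str = SCHEMA_VERSION) -> dict:
    """Return a copy of the policy with 'version' forced to the schema version.

    Any user-supplied 'version' is discarded, and the input is not mutated.
    """
    body = {key: copy.deepcopy(value) for key, value in user_policy.items() if key != "version"}
    return {"version": schema_version, **body}
```

A test could then assert that a policy edited to `version: 9.9.9` is flagged by `user_changed_version` and is still saved with the schema's version after `stamp_policy_version`.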

Out of scope

warn the user if re-saving an existing integration will result in a breaking change that requires an up-to-date agent.

elasticmachine commented 11 months ago

Pinging @elastic/kibana-cloud-security-posture (Team:Cloud Security)

mitodrummer commented 11 months ago

Auto PR: https://github.com/elastic/kibana/pull/163834 with required version additions.

Note: in the PR additionalProperties was updated to false. I believe this was a requirement for the agent side, but we should test if this breaks anything in the configuration UI, and if so, manually override this option in code prior to validating the yaml.
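
If an override does turn out to be needed, it could be as small as the following Python sketch, which works on the schema as a parsed dict. The function name `relax_additional_properties` is hypothetical, and whether the UI should relax the schema at all is still the open question above.

```python
import copy
from typing import Any


def relax_additional_properties(schema: Any) -> Any:
    """Return a deep copy of a JSON schema with every 'additionalProperties: false' set to true.

    The original schema object is left untouched so the agent-facing copy stays strict.
    """
    if isinstance(schema, dict):
        relaxed = {key: relax_additional_properties(value) for key, value in schema.items()}
        if relaxed.get("additionalProperties") is False:
            relaxed["additionalProperties"] = True
        return relaxed
    if isinstance(schema, list):
        return [relax_additional_properties(item) for item in schema]
    return copy.deepcopy(schema)
```

The relaxed copy would only be passed to the validator used by the configuration UI; the schema synced from cloud-defend stays as-is.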

kfirpeled commented 10 months ago

@mitodrummer in cloud security we worked it out differently:

The agent policy contains both schema versions for backward compatibility (BC), for example:

policy:
  v1:
    fieldX:
  v2:
    breaking_change_field:

This way, we increment the schema version only when we introduce a breaking change in the schema, and decide at that point whether to keep supporting previous versions as well.

The idea is that older agents that are already deployed would still be supported after an ELK upgrade.

By contrast, with the version field approach (https://github.com/elastic/cloud-defend/pull/470), once we introduce a breaking change in the schema (e.g. version 2.x.y), older agents won't be able to support the new version.

So it may be worth discussing this implementation before we continue.

mitodrummer commented 10 months ago

Absolutely, this might be a nice way forward; we just need to consider the IaC (standalone) mode, which may not be such a big issue, as the agent version and policy versions are baked together in git somewhere. cc @norrietaylor @mmat11

mitodrummer commented 10 months ago

Upgrade scenarios

Scenario A
- ELK stack is updated (e.g. 8.10 -> 8.11), agents and all. A new feature was added to allow 'process blocking' (a new value, block, added to the response.actions enum in the YAML's JSON schema). The JSON schema policy has gone from v0 to v1 (this is a breaking change that old agents will fail to validate).
- At this point any existing D4C integrations will still be running with an old v0 policy (saved before the upgrade)
- The new 8.11 agents should continue to operate normally with this older saved v0 policy (REQUIREMENT 1)
- User goes to edit their integration to enable 'block' on a response, and hits save.
- The older v0 policy remains untouched and a new v1 policy is saved to a new var alongside it.
- The 8.11 agent gets the package update, sees that there is now a v1 location for the new policy, and starts using it.
- The v0 version of the schema will remain forever, and any 8.10 agents still running will still work since v0 still exists in the package alongside the new v1. (caveat: any new integrations created will not contain old policies, and will only populate the newest)

Scenario B
- ELK stack is updated (e.g. 8.10 -> 8.11), EXCEPT agents. A new feature was added to allow 'process blocking' (a new value, block, added to the response.actions enum in the YAML's JSON schema). The JSON schema policy has gone from v0 to v1.
- User edits an existing D4C integration that has a v0 policy saved to it.
- Because Kibana is now using the v1 policy, we need to let the user know that only updated agents will get the newly saved updates. The v0 policy will continue to be used by the old agents until they are updated.
- User saves integration with 'block' added. Package now has both v0 (not updated) and v1 (updated) yaml saved.
- Old agents will continue to look at v0 for policy, while new agents will look at v1.
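
The save behaviour in both scenarios (older policy left untouched, new policy written alongside it) could be sketched roughly like this in Python. The function names `policy_key` and `save_policy` are hypothetical, and the key naming (`security-policy` for v0, `security-policy-v<N>` afterwards) is assumed from the example agent config in this comment.

```python
import copy


def policy_key(schema_version: int) -> str:
    """Map a schema version to its package var name (naming assumed from the example config)."""
    return "security-policy" if schema_version == 0 else f"security-policy-v{schema_version}"


def save_policy(package_vars: dict, edited_policy: dict, schema_version: int) -> dict:
    """Return new package vars with the edited policy stored under the current schema version.

    Policies saved under older schema versions are carried over unchanged,
    so agents that only understand those versions keep working.
    """
    updated = copy.deepcopy(package_vars)
    updated[policy_key(schema_version)] = copy.deepcopy(edited_policy)
    return updated
```

Saving a policy with block added under schema v1 leaves the existing `security-policy` entry exactly as it was and adds `security-policy-v1` next to it.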

Example agent config:

security-policy:  # v0 (old version of YAML, unchanged after save)
  file:
    selectors:
      - name: something
        operation:
          - createExecutable
    responses:
      - match: [something]
        actions: [alert]
security-policy-v1:
  file:
    selectors:
      - name: something
        operation:
          - createExecutable
    responses:
      - match: [something]
        actions: [alert, block]    # user added block via UI and saved (note v0 above did not get the update)
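
On the agent side, the lookup implied by this config might look like the Python sketch below: try the newest policy version the agent understands and fall back to older ones. The function name `select_policy` and the `max_supported_version` parameter are hypothetical, and the real agent is not written in Python; this only illustrates the selection logic.

```python
from typing import Optional, Tuple


def select_policy(agent_config: dict, max_supported_version: int) -> Optional[Tuple[int, dict]]:
    """Pick the newest policy this agent can understand.

    Key naming is assumed from the example config: 'security-policy' holds v0,
    'security-policy-v<N>' holds version N. Returns (version, policy), or None
    when no supported version is present in the config.
    """
    for version in range(max_supported_version, -1, -1):
        key = "security-policy" if version == 0 else f"security-policy-v{version}"
        policy = agent_config.get(key)
        if policy is not None:
            return version, policy
    return None
```

With the config above, an 8.10 agent (supporting up to v0) would get the `security-policy` block, and an 8.11 agent (supporting up to v1) would get `security-policy-v1`. A new integration that only populates v1 would yield `None` for the old agent, which is the caveat noted in Scenario A.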

kfirpeled commented 10 months ago

Waiting for product prioritization - moved back to backlog

elastic / kibana

[Cloud Security][D4C] Yaml policy versioning #163214