[Fleet] Expand agent policy overrides to support updating config for a given input ID

kpollich commented 9 months ago

Today, Fleet supports an overrides field in its PUT /agent_policies/:id API. It'd be useful if this property allowed providing config for a single input on a policy based on its ID, for example:

PUT kbn:/api/fleet/agent_policies/:id
{
  "overrides": { 
    "inputs": {
      "[input_id]": { 
        "log.level": "debug"
      }
    }
  }
}

This isn't a finalized or validated approach, just an idea to capture the goal here.

Being able to provide config overrides on a per-input basis would be extremely helpful for support cases and troubleshooting.

elasticmachine commented 9 months ago

Pinging @elastic/fleet (Team:Fleet)

nchaulet commented 9 months ago

I am wondering if it would make more sense to have those overrides closer to where the inputs are, in the package_policies:

PUT /package_policies/:id
{
  "overrides": {}
}

kpollich commented 9 months ago

@nchaulet - I think that'd make sense. As long as we trigger a new revision of the parent agent policy as a result of these overrides, I think it'd be good to make the API operation as close to the actual "source" object (the package policy saved object in this case) in Fleet's API.

cmacknz commented 9 months ago

Putting it under package_policies makes sense but I think makes this a bit harder to use for troubleshooting, primarily because finding the correct agent ID or policy ID in the Fleet UI is easy but finding the correct package policy ID is not. I believe you have to go to the edit integration page to get it from the URL, or query for it.

If you put it under package_policies I worry we'll spend more time communicating and/or correcting use of the correct package policy ID.

kpollich commented 9 months ago

From a troubleshooting perspective, I can see an argument for this operation being more about updating the compiled agent policy rather than the underlying Kibana saved objects that hold the raw policy data as Fleet UI persists it. From that point of view, I feel there's value in performing the "override" action on the agent policy level. Someone troubleshooting an agent that's not behaving as expected isn't concerned with Fleet's CRUD model, they want to operate directly on the agent config as it appears on disk - just by going through Fleet.

From a semantic point of view, though, this is an operation on the package_policy API resource in Fleet's REST API. The implementation that Nicolas proposed would be the exact same approach we took to the overrides property in the preexisting agent_policies API, just translated to the package_policies API. I'm in favor of the consistency between these two places.

> If you put it under package_policies I worry we'll spend more time communicating and/or correcting use of the correct package policy ID.

Definitely a valid concern. If it's helpful, Fleet automatically includes the package policy ID in the input ID by default (it's also in the compiled policy directly, but we might just have an input name/ID when troubleshooting), e.g. from a cloud cluster I have running:

inputs:
  - id: logfile-system-b8dc6063-2e9e-4111-ad00-9d2b4257075e # <---
    name: system-2
    revision: 1
    type: logfile
    use_output: default
    meta:
      package:
        name: system
        version: 1.38.2
    data_stream:
      namespace: default
    package_policy_id: b8dc6063-2e9e-4111-ad00-9d2b4257075e # <---
    streams:
      - id: logfile-system.auth-b8dc6063-2e9e-4111-ad00-9d2b4257075e
        data_stream:
          dataset: system.auth
          type: logs
        ignore_older: 72h
        paths:
          - /var/log/auth.log*
          - /var/log/secure*
        exclude_files:
          - .gz$
        multiline:
          pattern: ^\s
          match: after
        tags:
          - system-auth
        processors:
          - add_locale: null
      - id: logfile-system.syslog-b8dc6063-2e9e-4111-ad00-9d2b4257075e
        data_stream:
          dataset: system.syslog
          type: logs
        paths:
          - /var/log/messages*
          - /var/log/syslog*
          - /var/log/system*
        exclude_files:
          - .gz$
        multiline:
          pattern: ^\s
          match: after
        processors:
          - add_locale: null
        ignore_older: 72h

This is from the Fleet UI "View Policy" button, but it should also come through in the compiled agent policy when we request diagnostics.

So, we can source the proper ID of the package policy directly from their agent policy/diagnostics with this predictable ID format in mind. This might alleviate some back and forth in SDHs and such.
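As a rough illustration of that ID format, here is a minimal sketch that pulls the package policy ID out of an input or stream ID. `packagePolicyIdFromInputId` is a hypothetical helper, not something Fleet provides, and it assumes the default ID format shown above (a canonical UUID as the final segment); custom input IDs may not follow it.

```typescript
// Matches a canonical 8-4-4-4-12 hex UUID at the very end of a string.
const TRAILING_UUID =
  /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Hypothetical helper (not part of Fleet): returns the package policy ID
// embedded in an input or stream ID, or undefined when the ID does not end
// with a UUID (assumption: Fleet's default ID format is in use).
function packagePolicyIdFromInputId(inputId: string): string | undefined {
  const match = TRAILING_UUID.exec(inputId);
  return match ? match[0] : undefined;
}

// Input ID taken from the policy excerpt above.
console.log(
  packagePolicyIdFromInputId('logfile-system-b8dc6063-2e9e-4111-ad00-9d2b4257075e')
); // prints "b8dc6063-2e9e-4111-ad00-9d2b4257075e"
```

When the compiled policy is available, reading `package_policy_id` directly from the input is simpler; the parsing approach is only useful when all you have is an input name or ID.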

cmacknz commented 9 months ago

Ah yes I forgot the UUIDs in the input IDs were the package policy ID. That is easy to document.

criamico commented 7 months ago

Summarizing here before starting the implementation:

The PUT agent_policies API currently doesn't support overriding inputs (see openapi), so if I understand correctly:

The current implementation of
```
PUT kbn:/api/fleet/agent_policies/:id
{
"overrides": { }
}
```
should not change, i.e. it should still fail with a 400 when the user tries to override an input.

An overrides field with similar behavior should be added to the PUT /package_policies/:id endpoint instead:
- Input overrides are allowed.
- When a field is overridden, the parent agent policy gets updated with a new revision.
- My only question is about other fields in the package policy. Should the user be able to override any field, or do we want to limit this to inputs for now?

cmacknz commented 7 months ago

> My only question is about other fields in the package policy. Should the user be able to override any field or do we want to limit to inputs for now?

Since I don't understand package policies in detail yet, what else is in them besides what gets rendered into the relevant input section of the policy?

Depending on what that is, we don't need this API to be a replacement for releasing a new version of a package, which can be done at any time.

What we are primarily targeting is the ability to augment integrations with debugging configuration that the integration may not have been updated to expose yet, or may not want to expose directly to users. The per input log levels are the only current example of this, but I could imagine there could be other similar things later (per integration tracing?).

kpollich commented 7 months ago

> Since I don't understand package policies in detail yet, what else is in them besides what gets rendered into the relevant input section of the policy?

Not much, realistically. We store some Kibana-specific metadata like the version/revision + timestamps for the saved object record and a reference to the package + agent policy used to build the package policy.

Here's a full package policy object for a basic policy for reference:

Show JSON

```json { "item": { "id": "400357f7-c162-456f-a87b-71f3ea4f8ac7", "version": "WzU1MTMsMV0=", "name": "system-1", "namespace": "default", "package": { "name": "system", "title": "System", "version": "1.54.0" }, "enabled": true, "policy_id": "4019482d-9b70-4ebf-bb4b-3d5b1ad18060", "inputs": [ { "type": "logfile", "policy_template": "system", "enabled": true, "streams": [ { "enabled": true, "data_stream": { "type": "logs", "dataset": "system.auth" }, "vars": { "ignore_older": { "value": "72h", "type": "text" }, "paths": { "value": [ "/var/log/auth.log*", "/var/log/secure*" ], "type": "text" }, "preserve_original_event": { "value": false, "type": "bool" }, "tags": { "value": [ "system-auth" ], "type": "text" }, "processors": { "type": "yaml" } }, "id": "logfile-system.auth-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "ignore_older": "72h", "paths": [ "/var/log/auth.log*", "/var/log/secure*" ], "exclude_files": [ """\.gz$""" ], "multiline": { "pattern": """^\s""", "match": "after" }, "tags": [ "system-auth" ], "processors": [ { "add_locale": null }, { "rename": { "fields": [ { "from": "message", "to": "event.original" } ], "ignore_missing": true, "fail_on_error": false } }, { "syslog": { "field": "event.original", "ignore_missing": true, "ignore_failure": true } } ] } }, { "enabled": true, "data_stream": { "type": "logs", "dataset": "system.syslog" }, "vars": { "paths": { "value": [ "/var/log/messages*", "/var/log/syslog*", "/var/log/system*" ], "type": "text" }, "preserve_original_event": { "value": false, "type": "bool" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" }, "ignore_older": { "value": "72h", "type": "text" }, "exclude_files": { "value": [ """\.gz$""" ], "type": "text" } }, "id": "logfile-system.syslog-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "paths": [ "/var/log/messages*", "/var/log/syslog*", "/var/log/system*" ], "exclude_files": [ """\.gz$""" ], "multiline": { "pattern": """^\s""", "match": 
"after" }, "processors": [ { "add_locale": null } ], "tags": null, "ignore_older": "72h" } } ] }, { "type": "winlog", "policy_template": "system", "enabled": true, "streams": [ { "enabled": true, "data_stream": { "type": "logs", "dataset": "system.application" }, "vars": { "preserve_original_event": { "value": false, "type": "bool" }, "event_id": { "type": "text" }, "ignore_older": { "value": "72h", "type": "text" }, "language": { "value": 0, "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" } }, "id": "winlog-system.application-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "name": "Application", "condition": "${host.platform} == 'windows'", "ignore_older": "72h" } }, { "enabled": true, "data_stream": { "type": "logs", "dataset": "system.security" }, "vars": { "preserve_original_event": { "value": false, "type": "bool" }, "event_id": { "type": "text" }, "ignore_older": { "value": "72h", "type": "text" }, "language": { "value": 0, "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" } }, "id": "winlog-system.security-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "name": "Security", "condition": "${host.platform} == 'windows'", "ignore_older": "72h" } }, { "enabled": true, "data_stream": { "type": "logs", "dataset": "system.system" }, "vars": { "preserve_original_event": { "value": false, "type": "bool" }, "event_id": { "type": "text" }, "ignore_older": { "value": "72h", "type": "text" }, "language": { "value": 0, "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" } }, "id": "winlog-system.system-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "name": "System", "condition": "${host.platform} == 'windows'", "ignore_older": "72h" } } ] }, { "type": "system/metrics", "policy_template": "system", "enabled": true, "streams": [ { "enabled": false, "data_stream": { "type": "metrics", "dataset": "system.core" }, "vars": { 
"period": { "value": "10s", "type": "text" }, "core.metrics": { "value": [ "percentages" ], "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" } }, "id": "system/metrics-system.core-400357f7-c162-456f-a87b-71f3ea4f8ac7" }, { "enabled": true, "data_stream": { "type": "metrics", "dataset": "system.cpu" }, "vars": { "period": { "value": "10s", "type": "text" }, "cpu.metrics": { "value": [ "percentages", "normalized_percentages" ], "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" } }, "id": "system/metrics-system.cpu-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "metricsets": [ "cpu" ], "cpu.metrics": [ "percentages", "normalized_percentages" ], "period": "10s" } }, { "enabled": true, "data_stream": { "type": "metrics", "dataset": "system.diskio" }, "vars": { "period": { "value": "10s", "type": "text" }, "diskio.include_devices": { "value": [], "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" } }, "id": "system/metrics-system.diskio-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "metricsets": [ "diskio" ], "diskio.include_devices": null, "period": "10s" } }, { "enabled": true, "data_stream": { "type": "metrics", "dataset": "system.filesystem" }, "vars": { "period": { "value": "1m", "type": "text" }, "filesystem.ignore_types": { "value": [], "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "value": """- drop_event.when.regexp: system.filesystem.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/) """, "type": "yaml" } }, "id": "system/metrics-system.filesystem-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "metricsets": [ "filesystem" ], "period": "1m", "processors": [ { "drop_event.when.regexp": { "system.filesystem.mount_point": "^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)" } } ] } }, { "enabled": true, "data_stream": { "type": "metrics", "dataset": "system.fsstat" }, 
"vars": { "period": { "value": "1m", "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "value": """- drop_event.when.regexp: system.fsstat.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/) """, "type": "yaml" } }, "id": "system/metrics-system.fsstat-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "metricsets": [ "fsstat" ], "period": "1m", "processors": [ { "drop_event.when.regexp": { "system.fsstat.mount_point": "^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)" } } ] } }, { "enabled": true, "data_stream": { "type": "metrics", "dataset": "system.load" }, "vars": { "period": { "value": "10s", "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" } }, "id": "system/metrics-system.load-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "metricsets": [ "load" ], "condition": "${host.platform} != 'windows'", "period": "10s" } }, { "enabled": true, "data_stream": { "type": "metrics", "dataset": "system.memory" }, "vars": { "period": { "value": "10s", "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" } }, "id": "system/metrics-system.memory-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "metricsets": [ "memory" ], "period": "10s" } }, { "enabled": true, "data_stream": { "type": "metrics", "dataset": "system.network" }, "vars": { "period": { "value": "10s", "type": "text" }, "network.interfaces": { "value": [], "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" } }, "id": "system/metrics-system.network-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "metricsets": [ "network" ], "period": "10s", "network.interfaces": null } }, { "enabled": true, "data_stream": { "type": "metrics", "dataset": "system.process" }, "vars": { "period": { "value": "10s", "type": "text" }, "process.include_top_n.by_cpu": { "value": 5, "type": "integer" }, "process.include_top_n.by_memory": { "value": 5, 
"type": "integer" }, "process.cmdline.cache.enabled": { "value": true, "type": "bool" }, "process.cgroups.enabled": { "value": false, "type": "bool" }, "process.env.whitelist": { "value": [], "type": "text" }, "process.include_cpu_ticks": { "value": false, "type": "bool" }, "processes": { "value": [ ".*" ], "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" } }, "id": "system/metrics-system.process-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "metricsets": [ "process" ], "period": "10s", "process.include_top_n.by_cpu": 5, "process.include_top_n.by_memory": 5, "process.cmdline.cache.enabled": true, "process.cgroups.enabled": false, "process.include_cpu_ticks": false, "processes": [ ".*" ] } }, { "enabled": true, "data_stream": { "type": "metrics", "dataset": "system.process.summary" }, "vars": { "period": { "value": "10s", "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" } }, "id": "system/metrics-system.process.summary-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "metricsets": [ "process_summary" ], "period": "10s" } }, { "enabled": true, "data_stream": { "type": "metrics", "dataset": "system.socket_summary" }, "vars": { "period": { "value": "10s", "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" } }, "id": "system/metrics-system.socket_summary-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "metricsets": [ "socket_summary" ], "period": "10s" } }, { "enabled": true, "data_stream": { "type": "metrics", "dataset": "system.uptime" }, "vars": { "period": { "value": "10s", "type": "text" }, "tags": { "value": [], "type": "text" }, "processors": { "type": "yaml" } }, "id": "system/metrics-system.uptime-400357f7-c162-456f-a87b-71f3ea4f8ac7", "compiled_stream": { "metricsets": [ "uptime" ], "period": "10s" } } ], "vars": { "system.hostfs": { "type": "text" } } }, { "type": "httpjson", "policy_template": "system", 
"enabled": false, "streams": [ { "enabled": false, "data_stream": { "type": "logs", "dataset": "system.application" }, "vars": { "interval": { "value": "10s", "type": "text" }, "search": { "value": "search sourcetype=\"XmlWinEventLog:Application\"", "type": "text" }, "tags": { "value": [ "forwarded" ], "type": "text" } }, "id": "httpjson-system.application-400357f7-c162-456f-a87b-71f3ea4f8ac7" }, { "enabled": false, "data_stream": { "type": "logs", "dataset": "system.security" }, "vars": { "interval": { "value": "10s", "type": "text" }, "search": { "value": "search sourcetype=\"XmlWinEventLog:Security\"", "type": "text" }, "tags": { "value": [ "forwarded" ], "type": "text" } }, "id": "httpjson-system.security-400357f7-c162-456f-a87b-71f3ea4f8ac7" }, { "enabled": false, "data_stream": { "type": "logs", "dataset": "system.system" }, "vars": { "interval": { "value": "10s", "type": "text" }, "search": { "value": "search sourcetype=\"XmlWinEventLog:System\"", "type": "text" }, "tags": { "value": [ "forwarded" ], "type": "text" } }, "id": "httpjson-system.system-400357f7-c162-456f-a87b-71f3ea4f8ac7" } ], "vars": { "url": { "value": "https://server.example.com:8089", "type": "text" }, "enable_request_tracer": { "type": "bool" }, "username": { "type": "text" }, "password": { "type": "password" }, "token": { "type": "password" }, "preserve_original_event": { "value": false, "type": "bool" }, "ssl": { "value": """#certificate_authorities: # - | # -----BEGIN CERTIFICATE----- # MIIDCjCCAfKgAwIBAgITJ706Mu2wJlKckpIvkWxEHvEyijANBgkqhkiG9w0BAQsF # ADAUMRIwEAYDVQQDDAlsb2NhbGhvc3QwIBcNMTkwNzIyMTkyOTA0WhgPMjExOTA2 # MjgxOTI5MDRaMBQxEjAQBgNVBAMMCWxvY2FsaG9zdDCCASIwDQYJKoZIhvcNAQEB # BQADggEPADCCAQoCggEBANce58Y/JykI58iyOXpxGfw0/gMvF0hUQAcUrSMxEO6n # fZRA49b4OV4SwWmA3395uL2eB2NB8y8qdQ9muXUdPBWE4l9rMZ6gmfu90N5B5uEl # 94NcfBfYOKi1fJQ9i7WKhTjlRkMCgBkWPkUokvBZFRt8RtF7zI77BSEorHGQCk9t # /D7BS0GJyfVEhftbWcFEAG3VRcoMhF7kUzYwp+qESoriFRYLeDWv68ZOvG7eoWnP # 
PsvZStEVEimjvK5NSESEQa9xWyJOmlOKXhkdymtcUd/nXnx6UTCFgnkgzSdTWV41 # CI6B6aJ9svCTI2QuoIq2HxX/ix7OvW1huVmcyHVxyUECAwEAAaNTMFEwHQYDVR0O # BBYEFPwN1OceFGm9v6ux8G+DZ3TUDYxqMB8GA1UdIwQYMBaAFPwN1OceFGm9v6ux # 8G+DZ3TUDYxqMA8GA1UdEwEB/wQFMAMBAf8wDQYJKoZIhvcNAQELBQADggEBAG5D # 874A4YI7YUwOVsVAdbWtgp1d0zKcPRR+r2OdSbTAV5/gcS3jgBJ3i1BN34JuDVFw # 3DeJSYT3nxy2Y56lLnxDeF8CUTUtVQx3CuGkRg1ouGAHpO/6OqOhwLLorEmxi7tA # H2O8mtT0poX5AnOAhzVy7QW0D/k4WaoLyckM5hUa6RtvgvLxOwA0U+VGurCDoctu # 8F4QOgTAWyh8EZIwaKCliFRSynDpv3JTUwtfZkxo6K6nce1RhCWFAsMvDZL8Dgc0 # yvgJ38BRsFOtkRuAGSf6ZUwTO8JJRRIFnpUzXflAnGivK9M13D5GEQMmIl6U9Pvk # sxSmbIUfc2SGJGCJD4I= # -----END CERTIFICATE----- """, "type": "yaml" } } } ], "revision": 1, "created_at": "2024-04-18T15:51:27.880Z", "created_by": "elastic", "updated_at": "2024-04-18T15:51:27.880Z", "updated_by": "elastic" } } ```

> What we are primarily targeting is the ability to augment integrations with debugging configuration that the integration may not have been updated to expose yet, or may not want to expose directly to users. The per input log levels are the only current example of this, but I could imagine there could be other similar things later (per integration tracing?).

I think only allowing overrides to target the inputs object is the right way to do this, but we should probably disallow overrides of compiled_stream and only allow edits to variables and their values outside of the compiled result. Updating the compiled_stream and not the "source" values in the outer policy scope would probably break things. cc @criamico
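To make that restriction concrete, here is a minimal sketch of a check that only accepts overrides under `inputs` and rejects any `compiled_stream` key nested inside them. The function name, types, and error messages are all hypothetical; this is not Fleet's actual validation code.

```typescript
type OverrideValue =
  | string
  | number
  | boolean
  | null
  | OverrideValue[]
  | { [key: string]: OverrideValue };

type Overrides = { [key: string]: OverrideValue };

// Hypothetical validator (not Fleet's implementation): returns a list of
// human-readable problems, or an empty list when the overrides are acceptable.
function findOverrideErrors(overrides: Overrides): string[] {
  const errors: string[] = [];

  // Only the top-level "inputs" key may be overridden.
  for (const key of Object.keys(overrides)) {
    if (key !== 'inputs') {
      errors.push(`overrides.${key} is not allowed; only "inputs" can be overridden`);
    }
  }

  // Walk the inputs overrides and flag compiled_stream at any depth.
  const visit = (value: OverrideValue, path: string): void => {
    if (Array.isArray(value)) {
      value.forEach((item, index) => visit(item, `${path}[${index}]`));
      return;
    }
    if (value !== null && typeof value === 'object') {
      for (const [key, child] of Object.entries(value)) {
        const childPath = `${path}.${key}`;
        if (key === 'compiled_stream') {
          errors.push(`${childPath} is not allowed`);
        } else {
          visit(child, childPath);
        }
      }
    }
  };

  const inputs = overrides['inputs'];
  if (inputs !== undefined) {
    visit(inputs, 'overrides.inputs');
  }
  return errors;
}
```

An API handler could respond with a 400 whenever the returned list is non-empty, which would match how the agent policy endpoint already rejects unsupported overrides.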

criamico commented 7 months ago

Adding an update here about the current state of this work. I updated the logic to be able to accept a config directly on package policies PUT endpoint, i.e.:

PUT kbn:/api/fleet/package_policies/0751c4fb-fcbc-4f63-acf0-2ebce70f9a49
{
  "overrides": {
    "inputs": {
        "logfile-system-0751c4fb-fcbc-4f63-acf0-2ebce70f9a49": {"log.level": "debug"}
      }
  }
}

This should merge {"log.level": "debug"} at the top level of the package policy, to obtain:

{
  "name": "system-7",
  "namespace": "default",
  "policy_id": "e8b07751-5238-4a33-9afa-3c142e75b22d",
  "enabled": true,
  "log.level": "debug",
  "package": {
    "name": "system",
    "title": "System",
    "version": "1.54.0"
  },
  "inputs": [],
...
 }

However I realized that this won't work as the packagePolicyService.Update tries to write the new config directly in the SO. That means explicitly adding the new mapping in the SO. Is that what we want here?

Perhaps the original idea of doing the change at the agent policy level and expanding the existing overrides field added on this PR would make more sense. @kpollich wdyt?

nchaulet commented 7 months ago

> However I realized that this won't work as the packagePolicyService.Update tries to write the new config directly in the SO. That means explicitly adding the new mapping in the SO. Is that what we want here?

In my opinion, we should store the overrides property in the package policy SO in a new field, similar to the agent policy: https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/saved_objects/index.ts#L159

And when generating the full agent policy, we should merge those overrides with the compiled inputs: https://github.com/elastic/kibana/blob/main/x-pack/plugins/fleet/server/services/agent_policies/package_policies_to_agent_inputs.ts#L88

Does it make sense to you?

kpollich commented 7 months ago

Nicolas beat me to it as I was typing an answer 🙂. The overrides should be persisted to an overrides property on the package policy SO, the same way we handle saving them for agent policies. Then, when we compile the full agent policy, the overrides should be deep-merged onto the resulting input objects.
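A minimal sketch of that merge step, assuming the overrides are keyed by input ID as in the request examples above. `deepMerge` and `applyInputOverrides` are hypothetical names written for illustration, not the helpers Kibana actually uses, and the choice to replace arrays wholesale rather than merge them is an assumption.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };
type JsonObject = { [key: string]: Json };

function isPlainObject(value: Json | undefined): value is JsonObject {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Recursively merges `source` into a shallow copy of `target`. Nested objects
// are merged key by key; every other value (including arrays) is replaced.
function deepMerge(target: JsonObject, source: JsonObject): JsonObject {
  const result: JsonObject = { ...target };
  for (const [key, value] of Object.entries(source)) {
    if (key === '__proto__') continue; // avoid touching the prototype
    const existing = result[key];
    result[key] =
      isPlainObject(existing) && isPlainObject(value)
        ? deepMerge(existing, value)
        : value;
  }
  return result;
}

// Hypothetical step in policy compilation: apply each override to the
// compiled input whose `id` matches, leaving all other inputs untouched.
function applyInputOverrides(
  inputs: JsonObject[],
  overridesByInputId: { [inputId: string]: JsonObject }
): JsonObject[] {
  return inputs.map((input) => {
    const id = input['id'];
    const override =
      typeof id === 'string' &&
      Object.prototype.hasOwnProperty.call(overridesByInputId, id)
        ? overridesByInputId[id]
        : undefined;
    return override ? deepMerge(input, override) : input;
  });
}
```

Because the merge runs on the compiled output rather than on the saved object, the stored package policy stays unchanged apart from its overrides field, and removing the override restores the original compiled input on the next revision.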

harshitgupta-qasource commented 5 months ago

Hi Team,

We have tested this ticket as per the latest update, and our observations are below:

Observation

After sending an API request that sets debug logging for a given input ID, the specific policy is set to debug level.

Screenshot

Dev Tool Request

PUT kbn:/api/fleet/package_policies/d05554cd-6921-40bc-940f-414f59224eb0
{
  "overrides": {
    "inputs": {
        "logfile-system.auth-d05554cd-6921-40bc-940f-414f59224eb0": {
           "log_level": "debug"
        }
      }
  }
}

**Build details:**

VERSION: 8.15.0 SNAPSHOT BUILD: 74938 COMMIT: 9e2b401f2c4c315e0b9b037221d7864cf195f321

Hence, we are marking this issue as QA:Validated.

Please let us know if anything else is required from our end.

Thanks!

elastic / kibana

[Fleet] Expand agent policy overrides to support updating config for a given input ID #177323