elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana
Other
19.58k stars 8.1k forks source link

[Fleet] Agent policies should not allow duplicate package policies #121798

Closed joshdover closed 2 years ago

joshdover commented 2 years ago

We've had cases where duplicate IDs for the same package policy are added to the package_policies array of an agent policy. We have some uniqueness validation already here, but we may have gaps elsewhere: https://github.com/elastic/kibana/blob/2fa5a87a5f17d8a97b06db7f8d19a30809eb3ade/x-pack/plugins/fleet/server/services/agent_policy.ts#L556-L558

So far this has only been observed on 7.15.2. Before working on this, we should try the repro steps below (https://github.com/elastic/kibana/issues/121798#issuecomment-999361428) against 7.16.2. If it is not reproducible, I think we can close this for now.

The AgentPolicy service in general could use some cleanup and consolidation. There are several code paths to do similar things that could probably be simplified. For example, validation could be extracted and reused in many of these functions.

elasticmachine commented 2 years ago

Pinging @elastic/fleet (Team:Fleet)

simitt commented 2 years ago

Reproducable via following steps:

  1. Create 7.14 deployment on Elastic Cloud
  2. Navigate to Kibana/APM/Settings/Schema and trigger migration to APM Integration (installs version 0.3.0)
  3. Update to 7.15.2
  4. Manually update APM Integration to 0.4.0 via Integrations view
  5. Manually update APM Integration on the cloud policy to 0.4.0 (this is the step where the policy gets duplicated)
Policy

``` id: policy-elastic-agent-on-cloud revision: 4 outputs: default: type: elasticsearch hosts: - >- https://xyz.eastus2.azure.qa.cld.elstc.co:9243 output_permissions: default: Elastic APM: indices: - names: - metrics-apm.app.*-default privileges: - auto_configure - create_doc - names: - logs-apm.error-default privileges: - auto_configure - create_doc - names: - metrics-apm.internal-default privileges: - auto_configure - create_doc - names: - metrics-apm.profiling-default privileges: - auto_configure - create_doc - names: - traces-apm.sampled-default privileges: - auto_configure - create_doc - names: - traces-apm-default privileges: - auto_configure - create_doc _elastic_agent_checks: cluster: - monitor agent: monitoring: enabled: false logs: false metrics: false inputs: - id: 59bfa235-98d8-47e7-891d-dc1f18a661f2 name: Fleet Server revision: 1 type: fleet-server use_output: default meta: package: name: fleet_server version: 1.0.1 data_stream: namespace: default server: port: 8220 host: 0.0.0.0 limits.max_connections: 200 cache: num_counters: 2000 max_cost: 2097152 server.limits: policy_throttle: 200ms checkin_limit: interval: 50ms burst: 25 max: 100 artifact_limit: interval: 100ms burst: 10 max: 10 ack_limit: interval: 10ms burst: 20 max: 20 enroll_limit: interval: 100ms burst: 5 max: 10 server.runtime: gc_percent: 20 - id: de144c2b-c20b-40b5-ae8b-d2f04714c3ad name: Elastic APM revision: 2 type: apm use_output: default meta: package: name: apm version: 0.4.0 data_stream: namespace: default apm-server: capture_personal_data: null max_connections: null max_event_size: null auth: api_key: enabled: true limit: null anonymous: enabled: null allow_agent: null allow_service: null rate_limit: ip_limit: null event_limit: null secret_token: vJI3iPs3phzvPz7rH7 default_service_environment: null shutdown_timeout: 30s rum: enabled: true exclude_from_grouping: null allow_headers: null response_headers: null library_pattern: null allow_origins: null source_mapping: metadata: [] ssl: enabled: true key_passphrase: null certificate: /app/config/certs/node.crt supported_protocols: null curve_types: null key: /app/config/certs/node.key cipher_suites: null response_headers: null write_timeout: null host: '0.0.0.0:8200' max_header_size: null idle_timeout: null expvar.enabled: null read_timeout: 3600 agent_config: [] - id: de144c2b-c20b-40b5-ae8b-d2f04714c3ad name: Elastic APM revision: 2 type: apm use_output: default meta: package: name: apm version: 0.4.0 data_stream: namespace: default apm-server: capture_personal_data: true max_connections: 0 max_event_size: 307200 auth: api_key: enabled: false limit: 100 anonymous: enabled: true allow_agent: - rum-js - js-base - iOS/swift allow_service: null rate_limit: ip_limit: 10000 event_limit: 10 secret_token: null default_service_environment: null shutdown_timeout: 30s rum: enabled: true exclude_from_grouping: ^/webpack allow_headers: null response_headers: null library_pattern: node_modules|bower_components|~ allow_origins: - '*' response_headers: null write_timeout: 30s host: 'localhost:8200' max_header_size: 1048576 idle_timeout: 45s expvar.enabled: false read_timeout: 3600s fleet: hosts: - >- https://xyz.fleet.eastus2.azure.qa.cld.elstc.co:9243 ```

I haven't seen this on any 7.16 deployments (or with the 7.16 APM Integration) yet, therefore not certain this can still happen in the latest Fleet version.

joshdover commented 2 years ago

@dikshachauhan-qasource Would the QAS team be able to spend some time trying to validate this bug against a newer version of the Stack? I suspect these steps should be sufficient:

  1. Create 7.14 deployment on Elastic Cloud
  2. Navigate to Kibana/APM/Settings/Schema and trigger migration to APM Integration (installs version 0.3.0)
  3. Update to 7.16.2 - this is the newer version compared to the repro above
  4. Manually update APM Integration to 7.16.x via Integrations view
  5. Manually update APM Integration on the cloud policy to 7.16.x (this is the step where the policy could get duplicated)
dikshachauhan-qasource commented 2 years ago

Hi @joshdover

We have validated this ticket as per steps shared by you in above comment.

However, we could not observe the managed policy getting duplicated as mentioned in steps when upgraded APM Integration under Policy tab as shared above.

Screenshot :

Please let us know if anything further is required from our end.

Thanks QAS

joshdover commented 2 years ago

@dikshachauhan-qasource Thank you, glad to see we can't reproduce it. Closing for now.