elastic / apm-server

https://www.elastic.co/guide/en/apm/guide/current/index.html
Other
1.22k stars 524 forks source link

Config parsing error causes EA-managed apm-server to fail silently #14560

Open carsonip opened 2 weeks ago

carsonip commented 2 weeks ago

APM Server version (apm-server version): Confirmed on main

Description of the problem including expected versus actual behavior:

When a config with a wrong schema is passed to an EA-managed apm-server, config is passed to reloader, and error is apparently not surfaced, but beater/server will just silently not start.

Steps to reproduce:

  1. Instead of setting apm-server.rum.response_headers to a list, set it to a string
  2. Observe the behavior under EA-managed

Provide logs (if relevant):

carsonip commented 2 weeks ago

I haven't had the time to confirm the actual code path because one cannot simply run debugger on EA managed apm-server, but I suspect that the culprit is that the config parsing error is ignored by libbeat if it isn't a multierror.MultiError in https://github.com/elastic/beats/blob/38dfc52434ed2478b643d9b30f1a21c0e4c39603/x-pack/libbeat/management/managerV2.go#L836

inge4pres commented 2 weeks ago

When running in managed mode there's an elastic-agent-lib component that validates config for us, by parsing the YAML into a config.C struct. AFAIR a C.Validate() method can be implemented in our own types to specify rules applied when parsing the config. I am not 100% sure but there may be validate struct tags that can be added where the config type is defined

https://github.com/inge4pres/apm-server/blob/de7e8075d330c68d826a52613e13bcdf08d34620/internal/beater/config/config.go#L39

to include/exclude certain values.

simitt commented 1 week ago

@carsonip assigned you to this task as you already opened a draft PR. IMO this doesn't have highest urgency or impact, so leaving the timing for the fix up to you.