elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.
Other
22 stars 143 forks source link

Store AST directly in provider mappings #6114

Open swiatekm opened 4 days ago

swiatekm commented 4 days ago

What does this PR do?

For all variable provider types, we store mappings as untyped dictionaries and only convert them to AST when we're about to generate actual vars. This means we have to recompute all of it every time there's a change to any of the provider data, even if all the other data is unaffected. For example, if a Kubernetes Pod gains a new label, we need to regenerate vars for all Pods.

Instead, we can store mappings as AST to begin with. Then the only cost of emitting new vars after a change is updating a bunch of references.

Why is it important?

This unnecessary recomputation can be quite expensive when there's a lot of mappings from the dynamic provider. See the benchstat report below:

goos: linux
goarch: amd64
pkg: github.com/elastic/elastic-agent/internal/pkg/composable
cpu: 13th Gen Intel(R) Core(TM) i7-13700H
                       │ bench_main.txt │          bench_dynamic.txt          │
                       │     sec/op     │   sec/op     vs base                │
GenerateVars100Pods-20     755.49µ ± 3%   76.87µ ± 2%  -89.82% (p=0.000 n=10)

                       │ bench_main.txt │          bench_dynamic.txt           │
                       │      B/op      │     B/op      vs base                │
GenerateVars100Pods-20    735.00Ki ± 0%   98.69Ki ± 0%  -86.57% (p=0.000 n=10)

                       │ bench_main.txt │          bench_dynamic.txt          │
                       │   allocs/op    │  allocs/op   vs base                │
GenerateVars100Pods-20     21.335k ± 0%   1.364k ± 0%  -93.61% (p=0.000 n=10)

Checklist

Related issues

elasticmachine commented 4 days ago

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

swiatekm commented 4 days ago

/test

elastic-sonarqube[bot] commented 4 days ago

Quality Gate passed Quality Gate passed

Issues
0 New issues
1 Fixed issue
0 Accepted issues

Measures
0 Security Hotspots
93.3% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube

ycombinator commented 4 days ago

the CI failures are due to TestOtelAPMIngestion (#4148) and TestRpmFleetUpgrade. The last one fails because it doesn't download in time the elastic-agent-8.17.0-SNAPSHOT-x86_64.rpm artifact with a context deadline exceeded (code ref)

cc @ycombinator should we create an issue about the latter?

I'm not seeing these tests failing on main so either something in this PR is causing them to fail (unlikely) or they are flaky. I just re-ran the integration tests step. If it passes, it means those tests are flaky and we should create an issue for each of them.