Open jyotimahapatra opened 3 years ago
I updated gcp and removed the use of maps from the cache and downstream: https://github.com/envoyproxy/xds-relay/pull/204. The benchmark now looks like this:
➜ xds-relay git:(benchnomap) export MAX_DISCOVERY_REQUESTS=1 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
99036 11910 ns/op
➜ xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=10 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
92912 12383 ns/op
➜ xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=100 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
94479 12682 ns/op
➜ xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=1000 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
106798 12866 ns/op
➜ xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=10000 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
94300 12878 ns/op
➜ xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=100000 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
12646 86420 ns/op
This is an improvement over the current implementation.
We use a map to store requests in the cache (https://github.com/envoyproxy/xds-relay/blob/master/internal/app/cache/cache.go#L52), keyed on the DiscoveryRequest. As a result, every entry in the map is unique, and the addition and deletion of unique entries causes memory overhead in the map. This is a known issue with Go maps. (here, here)
To test this hypothesis, I replicated the benchmark tests to insert an increasing number of DiscoveryRequests and then remove them. This simulates the fanout scenario (here). We can see that even when the cache ends up holding a single entry, adding and deleting an increasing number of map entries significantly increases processing time.
Benchmarking code: https://github.com/envoyproxy/xds-relay/pull/196
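For illustration only, here is a minimal sketch of the insert-then-delete pattern described above. It is not the code from the linked PR: the `request` type is a hypothetical stand-in for a DiscoveryRequest, `churn` is a made-up helper name, and the timings it prints depend on the machine and are not comparable to the benchmark numbers in this thread.

```go
package main

import (
	"fmt"
	"time"
)

// request is a hypothetical stand-in for a DiscoveryRequest used as a map key.
type request struct {
	node    string
	version int
}

// churn inserts n unique keys into m, deletes all of them again, and then
// stores one final entry, so the map always ends up holding a single entry
// regardless of n. It returns the final number of entries.
func churn(m map[request]struct{}, n int) int {
	for i := 0; i < n; i++ {
		m[request{node: "envoy", version: i}] = struct{}{}
	}
	for i := 0; i < n; i++ {
		delete(m, request{node: "envoy", version: i})
	}
	m[request{node: "envoy", version: -1}] = struct{}{}
	return len(m)
}

func main() {
	// Same request counts as the MAX_DISCOVERY_REQUESTS values in the thread.
	for _, n := range []int{1, 10, 100, 1000, 10000, 100000} {
		m := make(map[request]struct{})
		start := time.Now()
		final := churn(m, n)
		fmt.Printf("n=%d final entries=%d elapsed=%s\n", n, final, time.Since(start))
	}
}
```

The final map size is the same for every n, so any difference in elapsed time comes from the amount of churn rather than from the eventual state of the map.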
A separate benchmark test from the orchestrator's perspective (#198) gave similar results.