envoyproxy / xds-relay

Caching, aggregation, and relaying for xDS compliant clients and origin servers
Apache License 2.0

Perf issue during fanout #197

Open jyotimahapatra opened 3 years ago

jyotimahapatra commented 3 years ago

We use a map to store requests in the cache (https://github.com/envoyproxy/xds-relay/blob/master/internal/app/cache/cache.go#L52), keyed on the DiscoveryRequest. As a result, every distinct request becomes its own map entry, and repeatedly adding and deleting unique entries bloats the map, because a Go map generally does not shrink its internal storage when entries are deleted. This is a known issue with Go maps. (here, here)
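The memory side of this can be observed with a small standalone program. This is a minimal sketch, not the xds-relay cache: the `request` struct is a hypothetical stand-in for a DiscoveryRequest key, and `heapInUse` and `churn` are illustrative helpers. It inserts many unique keys, deletes all of them, and reports heap usage before and after. The exact figures depend on the Go version and platform.

```go
package main

import (
	"fmt"
	"runtime"
)

// request is a hypothetical stand-in for a DiscoveryRequest used as a map key.
type request struct {
	Node    string
	TypeURL string
	Version int
}

// heapInUse forces a garbage collection and returns the bytes held in in-use heap spans.
func heapInUse() uint64 {
	runtime.GC()
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.HeapInuse
}

// churn inserts n unique keys into m and then deletes every one of them.
func churn(m map[request]struct{}, n int) {
	for i := 0; i < n; i++ {
		m[request{Node: "node", TypeURL: "type", Version: i}] = struct{}{}
	}
	for i := 0; i < n; i++ {
		delete(m, request{Node: "node", TypeURL: "type", Version: i})
	}
}

func main() {
	m := make(map[request]struct{})
	before := heapInUse()
	churn(m, 1_000_000)
	after := heapInUse()

	fmt.Printf("entries left: %d\n", len(m))
	fmt.Printf("heap in use before: %d KiB, after churn: %d KiB\n", before/1024, after/1024)

	// Keep the map reachable so its storage is not collected before measuring.
	runtime.KeepAlive(m)
}
```

If the second figure stays well above the first even though the map is empty, the map is still holding on to storage sized for its historical peak.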

To test this hypothesis, I adapted the benchmark tests to insert an increasing number of DiscoveryRequests and then remove them. This simulates the fanout scenario (here). Even when the cache ends up holding a single entry, adding and deleting a growing number of map entries drives the processing time up sharply: from 1509 ns/op at 1 request to 4825196 ns/op at 1,000,000 requests.

Benchmarking code: https://github.com/envoyproxy/xds-relay/pull/196
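For readers without the PR at hand, the shape of such a benchmark can be sketched as follows. This is an assumption-laden illustration, not the code from the PR: `maxRequests`, `newChurnedCache`, and `countEntries` are hypothetical helpers, the cache is reduced to a plain `map[int]struct{}`, and the timed operation is assumed to be a walk over the surviving entries. Only the `MAX_DISCOVERY_REQUESTS` variable name is taken from the commands below.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"testing"
)

// maxRequests reads MAX_DISCOVERY_REQUESTS, falling back to 1 when unset or invalid.
func maxRequests() int {
	n, err := strconv.Atoi(os.Getenv("MAX_DISCOVERY_REQUESTS"))
	if err != nil || n < 1 {
		return 1
	}
	return n
}

// newChurnedCache inserts n unique keys and deletes all but key 0,
// so the returned map has one live entry but has seen n insertions.
func newChurnedCache(n int) map[int]struct{} {
	m := make(map[int]struct{})
	for i := 0; i < n; i++ {
		m[i] = struct{}{}
	}
	for i := 1; i < n; i++ {
		delete(m, i)
	}
	return m
}

// countEntries visits every live entry, as a fanout over cached requests might.
func countEntries(m map[int]struct{}) int {
	count := 0
	for range m {
		count++
	}
	return count
}

func main() {
	n := maxRequests()
	// The churn happens once, outside the timed function.
	m := newChurnedCache(n)

	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if countEntries(m) != 1 {
				b.Fatal("unexpected entry count")
			}
		}
	})
	fmt.Printf("MAX_DISCOVERY_REQUESTS=%d: %s\n", n, res)
}
```

Running it with increasing values of `MAX_DISCOVERY_REQUESTS` shows whether the per-operation time tracks the peak size of the map rather than its final size of one entry.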

➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op 
BenchmarkCacheRetrieval-8         721880          1509 ns/op         944 B/op         12 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=10 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8         809854          1473 ns/op         944 B/op         12 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=100 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8         658707          1641 ns/op         944 B/op         12 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8         264152          4144 ns/op         944 B/op         12 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=10000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8          50784         24675 ns/op         944 B/op         12 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=100000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8           5220        222593 ns/op         944 B/op         12 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1000000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/cache -bench "^(BenchmarkCacheRetrieval)$" | grep ns/op
BenchmarkCacheRetrieval-8            255       4825196 ns/op         944 B/op         12 allocs/op

A separate benchmark test (#198), written from the orchestrator's perspective, gave similar results.

➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$"  | grep ns
BenchmarkGoldenPath-8          69771         16503 ns/op        9408 B/op         93 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=10 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$"  | grep ns
BenchmarkGoldenPath-8          64796         16518 ns/op        9408 B/op         93 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=100 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$"  | grep ns
BenchmarkGoldenPath-8          68280         18062 ns/op        9408 B/op         93 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=1000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$"  | grep ns
BenchmarkGoldenPath-8          50516         23984 ns/op        9408 B/op         93 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=10000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$"  | grep ns
BenchmarkGoldenPath-8          28072         41137 ns/op        9408 B/op         93 allocs/op
➜  xds-relay git:(master) ✗ export MAX_DISCOVERY_REQUESTS=100000 && go test -benchmem -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$"  | grep ns
BenchmarkGoldenPath-8           4819        236752 ns/op        9426 B/op         93 allocs/op

jyotimahapatra commented 3 years ago

I updated gcp and removed the use of maps from the cache and downstream code (https://github.com/envoyproxy/xds-relay/pull/204). The benchmark now looks like this:

➜  xds-relay git:(benchnomap) export MAX_DISCOVERY_REQUESTS=1 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns 
   99036         11910 ns/op
➜  xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=10 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns 
   92912         12383 ns/op
➜  xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=100 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns 
   94479         12682 ns/op
➜  xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=1000 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
  106798         12866 ns/op
➜  xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=10000 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
   94300         12878 ns/op
➜  xds-relay git:(benchnomap) ✗ export MAX_DISCOVERY_REQUESTS=100000 && go test -run=^$ github.com/envoyproxy/xds-relay/internal/app/orchestrator -bench "^(BenchmarkGoldenPath)$" | grep ns
   12646         86420 ns/op

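This is an improvement over the current implementation: at MAX_DISCOVERY_REQUESTS=10000 BenchmarkGoldenPath now takes 12878 ns/op instead of 41137 ns/op, and at 100000 it takes 86420 ns/op instead of 236752 ns/op.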
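The change in the PR itself is not reproduced here. As a rough illustration of what a map-free container can look like, the following sketch keeps entries in a slice and removes them by swapping in the last element. All names (`watcherSet`, `watcher`, `add`, `remove`) are hypothetical and are not taken from xds-relay or go-control-plane.

```go
package main

import "fmt"

// watcher is a hypothetical downstream subscriber.
type watcher struct {
	id  string
	idx int // position in watcherSet.items; -1 once removed
}

// watcherSet is a hypothetical slice-backed container used instead of a map.
type watcherSet struct {
	items []*watcher
}

// add appends a new watcher and returns a handle that is later passed to remove.
func (s *watcherSet) add(id string) *watcher {
	w := &watcher{id: id, idx: len(s.items)}
	s.items = append(s.items, w)
	return w
}

// remove deletes w in constant time by moving the last element into its slot.
// Removing a watcher that is not in the set is a no-op.
func (s *watcherSet) remove(w *watcher) {
	if w.idx < 0 || w.idx >= len(s.items) || s.items[w.idx] != w {
		return
	}
	last := len(s.items) - 1
	s.items[w.idx] = s.items[last]
	s.items[w.idx].idx = w.idx
	s.items[last] = nil // drop the reference so the watcher can be collected
	s.items = s.items[:last]
	w.idx = -1
}

func main() {
	s := &watcherSet{}
	handles := make([]*watcher, 0, 100000)
	for i := 0; i < 100000; i++ {
		handles = append(handles, s.add(fmt.Sprintf("watcher-%d", i)))
	}
	// Remove everything except the first watcher.
	for _, w := range handles[1:] {
		s.remove(w)
	}
	fmt.Printf("watchers left: %d\n", len(s.items))
}
```

The slice's capacity does not shrink either, but iteration only visits `len(s.items)` elements, so a fanout over the remaining watchers does not pay for entries that were removed earlier. The trade-off is that lookup by key is no longer available; callers have to hold on to the handle returned by `add`.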