envoyproxy / xds-relay

Caching, aggregation, and relaying for xDS compliant clients and origin servers
Apache License 2.0
131 stars 29 forks source link

Add syncMap for storing DiscoveryRequests #195

Closed jyotimahapatra closed 3 years ago

jyotimahapatra commented 3 years ago

While running stress tests we realized that we have some panics due to the use of maps

{"level":"warn","ts":1605908154.3333657,"logger":"wrap","caller":"wrap/wrap.go:895","msg":"detected panic prefix","pid":7,"wrap_id":"8e8c12d27f6c","json":{"service_name":"xdsrelay","log_line":"fatal error: concurrent map iteration and map write"}}
fatal error: concurrent map iteration and map write
goroutine 1004 [running]:
runtime.throw(0xef7437, 0x26)
        /opt/go/src/runtime/panic.go:1116 +0x72 fp=0xc0086ddd08 sp=0xc0086ddcd8 pc=0x435ff2
runtime.mapiternext(0xc0086dddf0)
        /opt/go/src/runtime/map.go:853 +0x552 fp=0xc0086ddd88 sp=0xc0086ddd08 pc=0x411242
github.com/envoyproxy/xds-relay/internal/app/orchestrator.(*orchestrator).fanout(0xc000352b00, 0x103bec0, 0xc00ffa4180, 0xc0019a3950, 0xc001a801d0, 0xc)
        /code/xds-relay-private/xds-relay/internal/app/orchestrator/orchestrator.go:347 +0x141 fp=0xc0086dde60 sp=0xc0086ddd88 pc=0xbed921
github.com/envoyproxy/xds-relay/internal/app/orchestrator.(*orchestrator).watchUpstream(0xc000352b00, 0x1037fa0, 0xc000128010, 0xc001a801d0, 0xc, 0xc001832fc0, 0xc0007491f0, 0xc0019b48c0)
        /code/xds-relay-private/xds-relay/internal/app/orchestrator/orchestrator.go:331 +0x8ea fp=0xc0086ddfa0 sp=0xc0086dde60 pc=0xbed3aa
runtime.goexit()
        /opt/go/src/runtime/asm_amd64.s:1373 +0x1 fp=0xc0086ddfa8 sp=0xc0086ddfa0 pc=0x468711
created by github.com/envoyproxy/xds-relay/internal/app/orchestrator.(*orchestrator).CreateWatch
        /code/xds-relay-private/xds-relay/internal/app/orchestrator/orchestrator.go:228 +0x10e0

We probably don't need to save the requests at all, however that might cause issues with logging because we cant print nodeid anymore. We can brainstorm about the removal of Requests altogether from cache in a separate PR

Signed-off-by: Jyoti Mahapatra jmahapatra@lyft.com