apache / trafficcontrol

Apache Traffic Control is an Open Source implementation of a Content Delivery Network
https://trafficcontrol.apache.org/
Apache License 2.0

Creating an invalid/broken Riak cluster in TO can cause requests to get stuck in an infinite loop #3705

Closed rawlinp closed 2 years ago

rawlinp commented 5 years ago

If RIAK servers are created in TO and set to ONLINE, they form a TO-internal shared Riak cluster. If no server in this cluster can successfully execute a Riak command (e.g. because TO cannot connect to the Riak servers at all), then TO API requests to endpoints that execute Riak commands (e.g. /api/1.3/cdns/name/:cdn/dnsseckeys.json) cause goroutines to get stuck in an infinite loop. Each of these goroutines can peg a CPU on the TO server, and they can only be stopped by restarting TO or, possibly, by fixing the Riak servers so that they can execute commands again.

The riak-go-client could be patched to fix this issue, but it currently appears to be unmaintained: https://github.com/basho/riak-go-client. There is an open PR that might solve the issue, but the maintainers have not commented on it: https://github.com/basho/riak-go-client/pull/96.

This issue could be alleviated by checking connectivity to the Riak servers before creating the TO-internal shared Riak cluster.

rawlinp commented 2 years ago

Closing this issue as the Riak Traffic Vault implementation is deprecated and will be removed soon, so there is no point in fixing this now.