Creating an invalid/broken Riak cluster in TO can cause requests to get stuck in an infinite loop

If RIAK servers are created in TO and set to ONLINE, they form a TO-internal shared Riak cluster. If no server in this cluster can successfully execute a Riak command (e.g. because TO cannot connect to the Riak servers at all), then TO API requests to endpoints that execute Riak commands (e.g. /api/1.3/cdns/name/:cdn/dnsseckeys.json) cause goroutines to get stuck in an infinite loop. Each of these goroutines can peg a CPU on the TO server, and they can only be stopped by restarting TO or, possibly, by fixing the Riak servers so that they are able to execute commands.

The riak-go-client could be patched to fix this issue, but it currently appears to be unmaintained: https://github.com/basho/riak-go-client. There is an open PR that might solve the issue, but the maintainers have not even commented on it: https://github.com/basho/riak-go-client/pull/96.

This issue could be alleviated by checking connectivity to the Riak servers before creating the TO-internal shared Riak cluster.

apache/trafficcontrol issue #3705