Sharing the main and background thread flame charts with #303 already solved, so we can focus on the new, real hotspots.
The first thing to notice is that the coordinator thread is the first bottleneck: it reaches 100% CPU usage before the main thread does.
Each individual shard reply to RSCoordinator (_FT.SEARCH) takes around 55 micros.
The total time for FT.SEARCH is 420 micros.
If we trace the time from sending the individual _FT.SEARCH requests until their replies are parsed, it represents 64% of the wall-clock time (270 micros out of the 420 micros total for FT.SEARCH). In other words, 64% of the wall-clock time is spent in this block of request sending + receiving shard replies + parsing, which makes it the part we should target for optimization.
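To make the "this is the part to target" argument concrete, here is a minimal sketch that recomputes the share from the numbers above and, as an added assumption not part of the original analysis, applies Amdahl's law to bound the end-to-end gain. The helper names `block_share` and `amdahl_speedup` are hypothetical, and the model assumes the remaining 150 micros stay unchanged when the block gets faster.

```python
def block_share(block_us: float, total_us: float) -> float:
    """Fraction of FT.SEARCH wall-clock time spent in the fan-out block
    (sending _FT.SEARCH requests, receiving shard replies, parsing them)."""
    if total_us <= 0:
        raise ValueError("total_us must be positive")
    if not 0 <= block_us <= total_us:
        raise ValueError("block_us must lie within [0, total_us]")
    return block_us / total_us


def amdahl_speedup(fraction: float, block_speedup: float) -> float:
    """Overall speedup when only `fraction` of the time is made
    `block_speedup` times faster (Amdahl's law); the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / block_speedup)


share = block_share(270, 420)  # 270 of 420 micros, as measured above
print(f"fan-out block share: {share:.0%}")  # prints "fan-out block share: 64%"

# Upper bound if the block cost dropped to zero (block_speedup -> infinity).
print(f"max overall speedup: {1.0 / (1.0 - share):.2f}x")  # prints "max overall speedup: 2.80x"

# Making the block 2x faster.
print(f"2x faster block: {amdahl_speedup(share, 2.0):.2f}x")  # prints "2x faster block: 1.47x"
```

Under this simple model, even a perfect fan-out path caps FT.SEARCH at roughly 2.8x faster, while the other 36% of the time can deliver at most about 1.56x. That asymmetry is what makes the fan-out block the better first target.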
Test setup
Considering a 25-shard setup, we see:
[Figure: _FT.SEARCH]
[Figure: tracing the network + parsing (hiredis)]
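The network + parsing trace brackets the coordinator's fan-out block. The following is a minimal sketch of how such a trace could be reduced to the block's wall-clock span. It assumes the coordinator records three timestamps per shard: when the _FT.SEARCH request was sent, when its reply arrived, and when parsing finished. `ShardTrace`, its fields, and the helper functions are hypothetical names for illustration, not RediSearch or hiredis APIs.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ShardTrace:
    """Hypothetical per-shard timestamps in microseconds, taken on the coordinator."""
    shard_id: int
    sent_us: float        # _FT.SEARCH request handed to the network
    reply_read_us: float  # shard reply received by the coordinator
    parsed_us: float      # shard reply fully parsed (e.g. by hiredis)


def fanout_block_us(traces: Sequence[ShardTrace]) -> float:
    """Wall-clock span from the first request sent to the last reply parsed."""
    if not traces:
        raise ValueError("need at least one shard trace")
    return max(t.parsed_us for t in traces) - min(t.sent_us for t in traces)


def straggler(traces: Sequence[ShardTrace]) -> ShardTrace:
    """The shard whose reply finished parsing last, i.e. the one closing the block."""
    if not traces:
        raise ValueError("need at least one shard trace")
    return max(traces, key=lambda t: t.parsed_us)


def split_us(trace: ShardTrace) -> Tuple[float, float]:
    """(round trip, parse time) for one shard.

    The round trip (sent -> reply read) covers network plus shard execution;
    the parse time (reply read -> parsed) is coordinator-side work.
    """
    return trace.reply_read_us - trace.sent_us, trace.parsed_us - trace.reply_read_us
```

The span runs from the earliest send to the latest parse rather than summing per-shard durations. This matches the wall-clock definition used for the 270-micro figure, because shard round trips can overlap. The per-shard split then helps separate time spent waiting on shards from time the coordinator thread spends parsing.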