[PERF] 40% of RSCoordinator background cpu time is spent on sequential (n shards) network buffer write

Because we write the _FT.SEARCH command to each of the N shards one after the other, these writes end up costing about 40% of the background thread's CPU cycles and reduce the thread's capacity to parse request replies and run the other required logic. In other words, the writes are slow and they delay everything that comes after them.

Sharing the main and background thread flame charts below, captured with #303 already fixed, so that we can focus on the remaining hotspots. The first thing to notice is that the coordinator thread is the first bottleneck (it reaches 100% CPU usage sooner than the main thread).

Main thread: https://s3.amazonaws.com/benchmarks.redislabs/internal-tasks/perf-186/RSCoordinator/json.25s.main.redis.svg
Background thread: https://s3.amazonaws.com/benchmarks.redislabs/internal-tasks/perf-186/RSCoordinator/json.25s.background.redis.svg

[PERF] 40% of RSCoordinator background cpu time is spent on sequential (n shards) network buffer write #305