Sinusoidal performance issue fix:
Added a callback to yz_index_hashtree:update, exactly like the callback
added on the KV side in 2.0.7: once the snapshot is complete, we need to
(and now can) allow the Solr queues to continue writing to Solr.
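Roughly, the shape of the change is like the following minimal,
self-contained sketch (the module name, the update/2 signature, and the
demo data are all illustrative, not the real yz_index_hashtree API). The
point is only the ordering: the callback fires right after the snapshot
is taken, before the slow hashing work, so blocked queue writers can
resume immediately.

    %% Sketch only; names and signature are illustrative, not the real
    %% yz_index_hashtree code.
    -module(hashtree_update_sketch).
    -export([update/2, demo/0]).

    %% Take the snapshot, immediately run the caller-supplied callback so
    %% the Solr queues can resume writing, then do the (slow) hash work.
    update(Entries, OnSnapshot) when is_function(OnSnapshot, 0) ->
        Snapshot = lists:sort(Entries),      %% stand-in for the real snapshot
        ok = OnSnapshot(),                   %% unblock the queue writers here
        [erlang:phash2(E) || E <- Snapshot]. %% stand-in for updating the tree

    demo() ->
        update([{<<"k1">>, <<"v1">>}, {<<"k2">>, <<"v2">>}],
               fun() -> io:format("snapshot taken; queues may resume~n") end).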
Refactor to partition/index per queue:
By making each Solr queue responsible for a single {partition, index}
pair, we greatly simplify the code of both the queue and the helper.
Rather than a pool of queues and a pool of helpers, we now have one
queue/helper pair per partition/index. Beyond code clarity, this
provides several benefits:
We can now drain by partition, which means we can do parallel
exchanges again.
If we no longer own a partition, we can simply stop that pair
(which now has its own supervisor as well; see the sketch after this list).
All the code dealing with dicts of indexq records is gone: each solrq
now handles a single index/partition, and the indexq record has been
collapsed into the main state record.
Additional refactoring around the interaction between helper and
worker, simplifying handling of draining further.
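The supervision structure this enables looks roughly like the following
hedged sketch (module and function names are hypothetical, not the
actual Yokozuna supervisors): one supervisor child per
{Partition, Index} pair, so stopping a pair we no longer own is a single
terminate/delete on its child id.

    %% Sketch only: solrq_pair_sketch (the per-pair supervisor owning one
    %% queue worker and one helper) is a hypothetical module name.
    -module(solrq_pair_sup_sketch).
    -behaviour(supervisor).
    -export([start_link/0, start_pair/2, stop_pair/2]).
    -export([init/1]).

    start_link() ->
        supervisor:start_link({local, ?MODULE}, ?MODULE, []).

    %% Start the queue/helper pair for one {Partition, Index}.
    start_pair(Partition, Index) ->
        supervisor:start_child(?MODULE,
                               #{id      => {Partition, Index},
                                 start   => {solrq_pair_sketch, start_link,
                                             [Partition, Index]},
                                 restart => permanent,
                                 type    => supervisor}).

    %% If we no longer own Partition, just stop (and forget) that one pair.
    stop_pair(Partition, Index) ->
        ok = supervisor:terminate_child(?MODULE, {Partition, Index}),
        supervisor:delete_child(?MODULE, {Partition, Index}).

    init([]) ->
        {ok, {#{strategy => one_for_one, intensity => 10, period => 10}, []}}.

Because every pair is an independent child keyed by {Partition, Index},
draining or stopping one partition's queue never touches any other
partition, which is what makes parallel exchanges possible again.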
Things we tried in fixing the performance issue that didn't work:
Throttling batches to Solr based on current in-flight queue lengths
(see the sketch at the end of this section). This just allowed the
queues to back up more.
Attempted to introduce backpressure by using the riak_core_throttle
module in a pre-commit hook in riak_kv_put_fsm. This failed to resolve
the issue, and in fact made it worse as we would hold up a large number
of requests and then dump them all into KV and the Solr queues.
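For reference, the first rejected approach amounted to something like
the sketch below (all names are illustrative; send_to_solr/1 stands in
for the real dispatch to Solr): delay dispatching a batch whenever the
in-flight queue length crosses a high-water mark.

    %% Sketch of the rejected throttling idea; all names are illustrative.
    -module(solr_batch_throttle_sketch).
    -export([maybe_dispatch/3]).

    %% If too much is already in flight, sleep before dispatching the batch.
    %% This throttles the consumer side, so under sustained load the queues
    %% simply grew instead of draining.
    maybe_dispatch(Batch, InFlight, HighWater) when InFlight > HighWater ->
        timer:sleep(10 * (InFlight - HighWater)),   %% crude backoff, in ms
        send_to_solr(Batch);
    maybe_dispatch(Batch, _InFlight, _HighWater) ->
        send_to_solr(Batch).

    %% Placeholder for the real HTTP POST of the batch to Solr.
    send_to_solr(Batch) ->
        {ok, length(Batch)}.

Slowing the consumer side under load is backpressure in the wrong
direction, which is why this only made the backlog grow.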