Commit 8def1a7a542 (part of 0.5.12) already improved the performance
of qmn queries compared to 0.5.11.
This commit improves it again compared to 0.5.12 because it turns out
that reading the large slots tuple with 16384 elements from ETS forces
the runtime system to perform too many garbage collections. The solution
removes the large tuple from State and stores the mapping in an ETS
table as {k, v} pairs.
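
A minimal sketch of that layout, assuming a hypothetical table name
(slot_map) and helper names; the actual module may differ:

    %% Create the table once; read_concurrency helps concurrent readers.
    init_slot_table() ->
        ets:new(slot_map, [named_table, public, {read_concurrency, true}]).

    %% Store each slot-to-node mapping as its own {Slot, Node} pair instead
    %% of one 16384-element tuple that is copied out of ETS on every read.
    store_mapping(SlotToNode) ->
        lists:foreach(fun({Slot, Node}) ->
                          ets:insert(slot_map, {Slot, Node})
                      end, SlotToNode).

    %% A lookup now copies a single small pair out of ETS, avoiding the
    %% garbage-collection pressure caused by copying the whole tuple.
    lookup_node(Slot) ->
        case ets:lookup(slot_map, Slot) of
            [{Slot, Node}] -> Node;
            [] -> undefined
        end.
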
Performance was tested on an AWS m5.2xlarge instance with 8 cores,
running Erlang 22.3.2.
Rough numbers for qmn read (@ ~4200 req/sec):
0.5.11: 8 ms
0.5.12: 6 ms
this:   4 ms
Rough numbers for qmn write (@ ~3300 req/sec):
0.5.11: 10 ms
0.5.12: 5.5 ms
this:   3 ms
Even though, as a side effect of this solution, the slot table update
is no longer atomic, this should not cause issues: the existing retry
logic for MOVED responses already handles stale slot lookups.
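
For illustration, a hedged sketch of that retry path; lookup_node/1 is
from the sketch above, while do_query/2 and refresh_mapping/0 are
hypothetical stand-ins for the existing query and slot-refresh code:

    %% If a node no longer owns a slot (e.g. while the ETS table is only
    %% partially updated), Redis answers with MOVED; refresh the mapping
    %% and retry. A real implementation would bound the number of retries.
    query_with_retry(Slot, Command) ->
        Node = lookup_node(Slot),
        case do_query(Node, Command) of
            {error, <<"MOVED", _/binary>>} ->
                refresh_mapping(),
                query_with_retry(Slot, Command);
            Reply ->
                Reply
        end.
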
Also tested a version where the slots were sharded across 128 ETS
tables, and one using the process dictionary (not shown in the graph
above), with similar results.