Commit 8def1a7a542 (part of 0.5.12) already improved the performance
of qmn queries compared to 0.5.11.
This commit improves it again compared to 0.5.12 because it turns out
that reading the large slots tuple with 16384 elements from ETS forces
the runtime system to perform too many garbage collections. The solution
removes the large tuple from State and stores the mapping in an ETS
table as {k, v} pairs.
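
A minimal sketch of that layout, assuming a hypothetical table name
(slot_map) and helper names; the actual module may differ:

    %% Create the table once; read_concurrency helps concurrent readers.
    init_slot_table() ->
        ets:new(slot_map, [named_table, public, {read_concurrency, true}]).

    %% Store each slot-to-node mapping as its own {Slot, Node} pair instead
    %% of one 16384-element tuple that is copied out of ETS on every read.
    store_mapping(SlotToNode) ->
        lists:foreach(fun({Slot, Node}) ->
                          ets:insert(slot_map, {Slot, Node})
                      end, SlotToNode).

    %% A lookup now copies a single small pair out of ETS, avoiding the
    %% garbage-collection pressure caused by copying the whole tuple.
    lookup_node(Slot) ->
        case ets:lookup(slot_map, Slot) of
            [{Slot, Node}] -> Node;
            [] -> undefined
        end.
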
Performance was tested on an AWS m5.2xlarge instance with 8 cores,
running Erlang 22.3.2.
Rough numbers for qmn read (@ ~4200 req/sec):
0.5.11: 8 ms
0.5.12: 6 ms
this:   4 ms
Rough numbers for qmn write (@ ~3300 req/sec):
0.5.11: 10 ms
0.5.12: 5.5 ms
this:   3 ms
Even though, as a side effect of this solution, the slot table update
is no longer atomic, this should not cause issues: the existing retry
logic for MOVED responses already handles stale slot lookups.
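
For illustration, a hedged sketch of that retry path; lookup_node/1 is
from the sketch above, while do_query/2 and refresh_mapping/0 are
hypothetical stand-ins for the existing query and slot-refresh code:

    %% If a node no longer owns a slot (e.g. while the ETS table is only
    %% partially updated), Redis answers with MOVED; refresh the mapping
    %% and retry. A real implementation would bound the number of retries.
    query_with_retry(Slot, Command) ->
        Node = lookup_node(Slot),
        case do_query(Node, Command) of
            {error, <<"MOVED", _/binary>>} ->
                refresh_mapping(),
                query_with_retry(Slot, Command);
            Reply ->
                Reply
        end.
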
Also tested a version where the slots were sharded across 128 ETS
tables, and one using the process dictionary (not shown in the graph
above), with similar results.