Open bart-devylder opened 11 years ago
What exactly do you mean by 'batch size'?
As noted in the crakoon arakoon_multi_get code (TODO):
iter = arakoon_value_list_create_iter(keys);
FOR_ARAKOON_VALUE_ITER(iter, &value_size, &value) {
    /* TODO Multi syscall vs memory copies... */
    WRITE_BYTES(master, &value_size,
                ARAKOON_PROTOCOL_UINT32_LEN, rc, &timeout);
    RETURN_IF_NOT_SUCCESS(rc);
    WRITE_BYTES(master, value, value_size, rc,
                &timeout);
    RETURN_IF_NOT_SUCCESS(rc);
}
arakoon_value_list_iter_free(iter);
The Python client constructs one large string containing the whole request (including all keys) and sends it to the node using a single write call (or multiple, if required). The crakoon implementation instead first sends the command prefix using a write call, then issues two more write calls for every key in the request: one for the key size and one for the key content.
This can cause significant syscall overhead.
Using writev might help quite a bit, but some profiling would be in order to make sure this is the actual root cause of the performance difference. As a first step, roughly halving the number of write calls by sending each key's size and content in a single writev call should be fairly easy to implement. In a second stage, sending everything using a single large iovec should reduce syscall overhead even further.
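The two stages described above could look roughly like the following C sketch. It is not crakoon code: the function names (writev_all, send_key_writev, send_keys_single_writev) and the MAX_BATCH cap are hypothetical, the length prefix is written in host byte order as in the excerpt above, and the timeout handling done by WRITE_BYTES is left out.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical cap on keys per request. Each key needs two iovec entries,
 * so 2 * MAX_BATCH must stay below IOV_MAX (assumed here to be at least 128,
 * which holds on Linux but is not guaranteed by POSIX). */
#define MAX_BATCH 64

/* Write everything described by iov[0..iovcnt), retrying on short writes.
 * The iov array is modified in place. Returns 0 on success, -1 on error. */
static int writev_all(int fd, struct iovec *iov, int iovcnt)
{
    while (iovcnt > 0) {
        ssize_t n = writev(fd, iov, iovcnt);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        /* Drop entries that were written completely. */
        while (iovcnt > 0 && (size_t)n >= iov->iov_len) {
            n -= (ssize_t)iov->iov_len;
            iov++;
            iovcnt--;
        }
        /* Trim the entry that was written only partially, if any. */
        if (iovcnt > 0) {
            iov->iov_base = (char *)iov->iov_base + n;
            iov->iov_len -= (size_t)n;
        }
    }
    return 0;
}

/* Stage 1: size prefix and key bytes go out in one writev call per key. */
int send_key_writev(int fd, const void *key, uint32_t key_size)
{
    struct iovec iov[2];
    iov[0].iov_base = &key_size;
    iov[0].iov_len = sizeof key_size;
    iov[1].iov_base = (void *)key;
    iov[1].iov_len = key_size;
    return writev_all(fd, iov, 2);
}

/* Stage 2: all size prefixes and keys of a request go out in one writev. */
int send_keys_single_writev(int fd, const char *const *keys,
                            const uint32_t *sizes, int count)
{
    struct iovec iov[2 * MAX_BATCH];
    if (count < 0 || count > MAX_BATCH)
        return -1;
    for (int i = 0; i < count; i++) {
        iov[2 * i].iov_base = (void *)&sizes[i];
        iov[2 * i].iov_len = sizeof sizes[i];
        iov[2 * i + 1].iov_base = (void *)keys[i];
        iov[2 * i + 1].iov_len = sizes[i];
    }
    return writev_all(fd, iov, 2 * count);
}
```

With N keys per request, the current code issues 1 + 2N write calls. send_key_writev would bring that to 1 + N, and send_keys_single_writev to 2 (or 1 if the command prefix is added to the same iovec). Whether that closes the gap still needs the profiling mentioned above.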
We're seeing an issue with crakoon multiget performance (against a single-node arakoon cluster, using the arakoon 1.6.0 deb from arakoon.org). The test case (code attached; it's C++ but uses the plain crakoon API) does 4096 multigets with a batch size of 1 and a value size of 4096 bytes:
./ara_multi_get --cluster test --nodes test_0,127.0.0.1,12345
keys: 4096, value_size: 4096, batch_size: 1
set took 4.94317 seconds -> 3.23679 MiB/s / 828.618 IOPS
get took 0.243652 seconds -> 65.6674 MiB/s / 16810.9 IOPS
multiget took 163.901 seconds -> 0.0976198 MiB/s / 24.9907 IOPS
This is roughly a factor of 100 slower than the Python client (from the arakoon git repo, branch 1.6):
In [3]: import ara_multi_get
In [4]: client = ara_multi_get.make_client()
In [5]: ara_multi_get.test_multigets??
Type:       function
Base Class: <type 'function'>
String Form:<function test_multigets at 0x26406e0>
Namespace:  Interactive
File:       /home/arne/Projects/scrapyard/ara_multi_get.py
Definition: ara_multi_get.test_multigets(client, items, batchsize)
Source:
def test_multigets(client, items, batchsize):
    keys = [ struct.pack('Q', k) for k in xrange(items) ]
In [6]: ara_multi_get.test_multigets(client, 4096, 1)
2013-09-20 10:27:40,816 starting multigets
2013-09-20 10:27:42,524 multigets with batchsize 1 took 1.707 sec: 2399.18 IOPS
The gap can also be observed with bigger batches (batchsize 64 -> C: ~1000 IOPS, Python: ~3500 IOPS).
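To put the batch size 1 numbers in per-request terms, here is a small C sketch of the arithmetic. The helper names are made up for illustration; the inputs are the request counts and timings quoted in this issue.

```c
#include <stdio.h>

/* Requests per second for `requests` operations completed in `seconds`. */
double iops(unsigned requests, double seconds)
{
    return requests / seconds;
}

/* Average time spent per request, in milliseconds. */
double latency_ms(unsigned requests, double seconds)
{
    return 1000.0 * seconds / requests;
}

/* Throughput in MiB/s when every request moves `value_size` bytes. */
double mib_per_s(unsigned requests, unsigned value_size, double seconds)
{
    return ((double)requests * value_size) / (1024.0 * 1024.0) / seconds;
}

/* Print one summary line for a run (hypothetical helper). */
void print_summary(const char *label, unsigned requests, double seconds)
{
    printf("%s: %.2f IOPS, %.3f ms per request\n",
           label, iops(requests, seconds), latency_ms(requests, seconds));
}
```

Calling print_summary("crakoon multiget", 4096, 163.901) and print_summary("python multiget", 4096, 1.707) works out to about 40 ms per request for crakoon versus about 0.42 ms for the Python client, a ratio of roughly 96.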