Riak 2.0: datatypes operations are very slow

basho / riak

Riak is a decentralized datastore from Basho Technologies.

http://docs.basho.com

Apache License 2.0

Riak 2.0: datatypes operations are very slow #483

Closed oleksiyk closed 10 years ago

oleksiyk commented 10 years ago

I just ran a few tests with the 'set' datatype using a JavaScript PBC client (my 2.0 fork of https://github.com/nlf/riakpbc). I inserted several thousand small unique entries (10 bytes each) into the set, one after another, with code similar to this:

client.updateDtype({
    bucket: bucket,
    key: 'test',
    type: bucketType,
    op: {
        set_op: {
            adds: 'someSetValue' + i
        }
    }
}, function (err) {
    // ... on completion: i += 1, then issue the next update
});

and the results were:

first thousand entries inserted in 54,706 ms
second thousand came after 354,044 ms

I expected it to be faster than the 'get-merge-put' logic I used in application code with Riak 1.4. Are the new datatypes (sets in particular) designed to hold a large number of values? What can be done to speed this up?

russelldb commented 10 years ago

We'll look into it. We're not finished making 2.0, and we're only feature complete on CRDTs (not yet optimised at all.)

As for size: anything you wouldn't store in a riak_object you can't store in a Set, since a Set is just stored in a riak_object. There is also some metadata overhead. If you want to add many values to a set, why not use the add_all operation instead? I think you'll find it much, much faster.
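The batching idea behind add_all can be sketched on the client side as follows. This is a minimal sketch, not the riakpbc API: the helper names `buildBatchedSetUpdate` and `chunk`, the bucket type `'sets'`, and the bucket name `'test-bucket'` are hypothetical, and it assumes the client accepts an array for `adds` (the `adds` field of the Riak protocol buffers SetOp message is a repeated field). The request shape is copied from the snippet in the original report.

```javascript
// Hypothetical helper: builds ONE update request that adds many values at once.
// Assumption: the client accepts an array for `adds`.
function buildBatchedSetUpdate(bucketType, bucket, key, values) {
    return {
        bucket: bucket,
        key: key,
        type: bucketType,
        op: {
            set_op: {
                adds: values.slice() // copy so later mutation of `values` is harmless
            }
        }
    };
}

// Split a long list of values into batches of at most `size` elements.
function chunk(values, size) {
    if (size < 1) throw new RangeError('size must be >= 1');
    const batches = [];
    for (let i = 0; i < values.length; i += size) {
        batches.push(values.slice(i, i + size));
    }
    return batches;
}

// Example: 2500 values become 3 requests instead of 2500.
const values = [];
for (let i = 0; i < 2500; i++) values.push('someSetValue' + i);

const requests = chunk(values, 1000).map(function (batch) {
    return buildBatchedSetUpdate('sets', 'test-bucket', 'test', batch);
});

console.log(requests.length); // 3
```

Each element of `requests` would then be passed to the client's update call in turn. Batching only helps when several values are known at the same time; it does not change the cost of a single update.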

russelldb commented 10 years ago

Please post your timings for doing this in a tight loop with riak_object (fetch, update, put) for the same number of entries.

oleksiyk commented 10 years ago

@russelldb in my use case I'm adding values to the set gradually, not all at once. The problem is that a single insert (DtUpdateReq request) gets much slower with each subsequent insert.

With Riak 1.4 I used a JSON array: for each insert I would first fetch the object, resolve any siblings by taking their set union (allow_mult=true), then add the new value to the array and put the object back. This approach also gets slower as the array grows, but not nearly as much:
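The merge step of that get-merge-put cycle can be sketched as below. This is an illustration under assumptions, not the code used in the test: the function names `unionSiblings` and `mergeAndAdd` are hypothetical, siblings are modelled as plain arrays of strings (as if already parsed from JSON), and the fetch and put calls are left out.

```javascript
// Resolve siblings by set union: every value seen in any sibling survives.
function unionSiblings(siblings) {
    const merged = new Set();
    siblings.forEach(function (sibling) {
        sibling.forEach(function (value) {
            merged.add(value);
        });
    });
    return merged;
}

// One "insert" step: merge the siblings, add the new value,
// and return the array that would be written back.
function mergeAndAdd(siblings, newValue) {
    const merged = unionSiblings(siblings);
    merged.add(newValue);
    return Array.from(merged).sort(); // sorted only to make the output deterministic
}

// Two siblings, as might result from concurrent writes.
const siblings = [['a', 'b'], ['b', 'c']];
console.log(JSON.stringify(mergeAndAdd(siblings, 'd'))); // ["a","b","c","d"]
```

A plain union like this only suits add-only sets: a value removed in one sibling reappears if another sibling still contains it.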

1000: 2392ms // first thousand inserted in 2,392ms
2000: 3065ms // second thousand inserted in 3,065ms
3000: 3868ms // third thousand inserted in 3,868ms
4000: 4642ms // ... 
5000: 5326ms
6000: 5694ms
7000: 6380ms

Using 2.0.0-pre11

russelldb commented 10 years ago

Yeah, I'm just pushing a different version of this. riak_dt_orswot:to_binary/1 is the biggest culprit. Try dropping in the https://github.com/basho/riak_dt/tree/rdb/orswot-opt branch of riak_dt as a replacement and test again. We're actually starting on the perf stuff next week; we've known that there are issues (same thing with maps). We'll keep working on it. Thanks for testing the data types.
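One way to see why serialization cost matters for a set that grows one element at a time is a toy model in which the whole set is serialized again on every insert. This is an assumption for illustration only, not a description of riak_dt internals, and `totalSerializedElements` is a hypothetical name. Under that model the work per insert grows with the set size, so each successive thousand inserts costs more than the previous one.

```javascript
// Toy model: count how many elements get serialized in total when a set
// grows one element at a time and is fully re-serialized on every insert.
function totalSerializedElements(inserts) {
    let total = 0;
    for (let size = 1; size <= inserts; size++) {
        total += size; // the whole set of `size` elements is serialized again
    }
    return total;
}

const firstThousand = totalSerializedElements(1000);
const secondThousand = totalSerializedElements(2000) - firstThousand;

console.log(firstThousand);  // 500500
console.log(secondThousand); // 1500500
```

In this model the second thousand inserts serialize about three times as many elements as the first thousand. The measured timings in this thread depend on much more than element counts, so the model shows only the direction of the effect, not its size.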

oleksiyk commented 10 years ago

Yes, it makes a difference!

1000: 1554ms
2000: 2425ms
3000: 3733ms
4000: 4774ms
5000: 6215ms
6000: 6764ms
7000: 7900ms

russelldb commented 10 years ago

Cool. Bear with us and we'll keep improving this.

seancribbs commented 10 years ago

This seems stale. We've already addressed the latency issue via smaller contexts and t2b compression, and there are existing issues/tasks for benchmarking. Closing.