basho / riak_cs

Riak CS is simple, available cloud storage built on Riak.
http://docs.basho.com/riakcs/latest/
Apache License 2.0

Suppress and/or warning for bloated manifests #(sibling), value size, #({uuid, manifest}) #882

Closed shino closed 10 years ago

shino commented 10 years ago

In multipart upload with high concurrency (> 30 or > 100) and a small part size (~5 MB), the number of siblings (in the riak_object sense) for the manifest grows to O(1000), and in consequence the object size grows to nearly 1 GB. I guess a Riak sibling explosion happened [1].

Riak (riak_kv) also logs many siblings and/or a large object size, but it would be helpful to log warnings/errors on the Riak CS side as well, because bucket names in riak_kv's log are hashed.
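To make the logging idea concrete, here is a minimal sketch of such a check, written in Python rather than Riak CS's Erlang. The function name `check_manifest` and both threshold values are hypothetical; the thread does not specify any of them. It measures the raw sibling values before deserializing anything, so a bloated object is reported with its readable bucket and key.

```python
import logging

logger = logging.getLogger("manifest_check")

# Hypothetical thresholds; the issue does not state concrete values.
SIBLING_WARN = 20
SIZE_WARN_BYTES = 5 * 1024 * 1024


def check_manifest(bucket, key, sibling_values):
    """Log and return warnings for an over-threshold manifest object.

    sibling_values is a list of raw, still-serialized sibling values (bytes).
    """
    warnings = []
    count = len(sibling_values)
    total_bytes = sum(len(v) for v in sibling_values)
    if count > SIBLING_WARN:
        warnings.append(
            f"many siblings: bucket={bucket} key={key} siblings={count}")
    if total_bytes > SIZE_WARN_BYTES:
        warnings.append(
            f"large object: bucket={bucket} key={key} bytes={total_bytes}")
    for message in warnings:
        logger.warning(message)
    return warnings
```

Because the check only needs byte lengths, it can run before the (potentially expensive) deserialization step.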

Some backpressure to such sibling explosion is still under discussion. Ideas so far are:

  1. If #(siblings) exceeds a threshold, respond with 503 and hope clients sleep and retry.
  2. If #(siblings) exceeds a threshold, sleep for, say, 3 seconds (or is a random delay between 1 and 5 seconds better?), then fetch the manifest again, resolve it, and write it back.
  3. [Please put your idea here]
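The first two ideas could be sketched roughly as follows. This is a Python illustration only, not Riak CS code: `status_for`, `backoff_and_resolve`, the `fetch`/`resolve`/`store` callbacks, and the default threshold are all hypothetical names and values introduced for the example.

```python
import random
import time

SIBLING_THRESHOLD = 50  # hypothetical value, not taken from the issue


def status_for(sibling_count, threshold=SIBLING_THRESHOLD):
    """Idea 1: answer 503 when the manifest has too many siblings."""
    return 503 if sibling_count > threshold else 200


def backoff_and_resolve(fetch, resolve, store,
                        threshold=SIBLING_THRESHOLD, sleep=time.sleep):
    """Idea 2: on too many siblings, wait, re-fetch, resolve, write back.

    fetch() returns the current list of siblings, resolve(siblings) merges
    them into one value, and store(value) writes the merged value back.
    Returns True if the back-off path was taken.
    """
    if len(fetch()) <= threshold:
        return False
    sleep(random.uniform(1.0, 5.0))  # random delay between 1 and 5 seconds
    store(resolve(fetch()))
    return True
```

Passing `sleep` as a parameter keeps the delay easy to replace in tests. A random delay has the side benefit of spreading out concurrent writers, which a fixed 3-second sleep would not do.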

As for value size, deserializing a 1 GB binary consumes considerable memory, so it may be worth checking the size as well.

[1] https://github.com/basho/riak_test/pull/383

reiddraper commented 10 years ago

Is this to have Riak CS emit a lager:warning/error when a manifest reaches a certain size?

shino commented 10 years ago

I had to start from background :)

The rest of this comment was moved to the description.

reiddraper commented 10 years ago

Hmm, any idea why/how we're creating so many siblings? I would've hoped they would be being resolved as they were being created.

shino commented 10 years ago

> any idea why/how we're creating so many siblings?

From Riak's point of view, [1] demonstrates that explosion occurs when gets and puts are interleaved by 2 concurrent clients, even if they resolve siblings properly in read-modify-write. One idea in this direction is some backpressure on clients.

My understanding is that this is because the vclock (in Riak 1.4) is kept per riak_object, not per riak_content (= {metadata, value} pair).
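A toy model may help show why an object-wide clock lets siblings pile up. This is a deliberately simplified Python sketch, not Riak's actual merge logic: it assumes a single key and a single coordinating node, so a clock collapses to one integer counter. The class `ToyStore` and the function `interleave` are hypothetical names. With one clock for the whole object, a put carrying a stale context cannot tell which siblings the writer has already seen, so it keeps them all. When each sibling carries its own version, the already-seen siblings can be dropped.

```python
class ToyStore:
    """One key, one coordinating node: a clock is just an integer counter."""

    def __init__(self, per_sibling_versions):
        self.per_sibling_versions = per_sibling_versions
        self.clock = 0
        self.siblings = []  # list of (version, value) pairs

    def get(self):
        """Return the context (current clock) and the sibling values."""
        return self.clock, [value for _, value in self.siblings]

    def put(self, context, value):
        previous_clock = self.clock
        self.clock += 1
        if self.per_sibling_versions:
            # Drop exactly the siblings the writer had already seen.
            kept = [(v, x) for v, x in self.siblings if v > context]
        elif context >= previous_clock:
            kept = []  # context covers the stored clock: plain overwrite
        else:
            kept = list(self.siblings)  # stale context: nothing can be dropped
        self.siblings = kept + [(self.clock, value)]


def interleave(store, rounds):
    """Two clients whose gets and puts interleave so each put is stale."""
    ctx_a, _ = store.get()
    ctx_b, _ = store.get()
    for i in range(rounds):
        store.put(ctx_a, f"a{i}")
        ctx_a, _ = store.get()
        store.put(ctx_b, f"b{i}")
        ctx_b, _ = store.get()
    return len(store.siblings)
```

In this model, `interleave(ToyStore(False), 10)` ends with 20 siblings (two more per round), while `interleave(ToyStore(True), 10)` stays at 2.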

For Riak CS as a Riak application, it should shorten the interval between get and put as far as it can, to decrease the chance of interleaving.

Once Riak 2.0 is available with Riak CS, a DVV-like approach [2] [1] might help, or using a CRDT for the manifest would be most promising, because parts management in an MP manifest is a G-set (grow-only set). (Both require typed buckets, IIRC.) I hope to have more ideas once I know 2.0 better :smile:

[1] https://github.com/basho/riak_test/pull/383
[2] https://github.com/basho/riak_kv/pull/746
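For readers unfamiliar with G-sets, the merge rule is simply set union, which is why concurrent part additions never conflict. The sketch below is a generic Python illustration of that property, not the Riak 2.0 data type API. The function `merge_parts` and the use of part numbers as set elements are assumptions made for the example.

```python
def merge_parts(*replicas):
    """G-set merge: the union of all replicas; elements are never removed."""
    merged = frozenset()
    for replica in replicas:
        merged |= frozenset(replica)
    return merged


# Two writers each record the parts they uploaded, starting from part 1.
seen_by_a = {1, 2, 5}
seen_by_b = {1, 3, 4}

merged = merge_parts(seen_by_a, seen_by_b)
```

Union is commutative, associative, and idempotent, so replicas converge to the same part set regardless of the order or number of merges.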

reiddraper commented 10 years ago

> From Riak's point of view, [1] demonstrates that explosion occurs when gets and puts are interleaved by 2 concurrent clients, even if they resolve siblings properly in read-modify-write. One idea in this direction is some backpressure on clients.

Aha, thanks.

shino commented 10 years ago

Added logging at #915.

kuenishi commented 10 years ago

The warning log emits two lines per over-threshold object, but fixing that is separate work, deferred to 1.5.1 or later.

kuenishi commented 10 years ago

Another attempt to control sibling (history) explosion is #905.

shino commented 10 years ago

Changed the title to include sibling suppression in this issue's scope. Reopened and moved to the 1.5.1 milestone.