basho / yokozuna

Riak + Solr

colocated replicas squash each other #1

Closed rzezeski closed 12 years ago

rzezeski commented 12 years ago

Why is it bad?

Collocated replicas cause incorrect and non-deterministic query results.

When can it happen?

Any time the number of active nodes is less than target_n_val. Given the default value of 4, this could happen with a cluster of 3 or fewer nodes. It could happen in a 5 node cluster when 2 nodes go down and new or existing data is written. It could happen in a 10 node cluster where a network partition occurs, leaving two clusters of 7 and 3. The 7 node side won't have collocated replicas because it has more than target_n_val nodes, but the 3 node side will.
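The condition above can be checked with a few lines of Python. This is only a sketch of the rule as stated in this issue (active nodes below target_n_val); the function name `may_collocate` is hypothetical and is not part of Riak or Yokozuna.

```python
# Default target_n_val as stated in this issue.
DEFAULT_TARGET_N_VAL = 4


def may_collocate(active_nodes: int, target_n_val: int = DEFAULT_TARGET_N_VAL) -> bool:
    """Return True when replicas may end up on the same node.

    Hypothetical helper: it only encodes the rule from this issue,
    not Riak's actual claim algorithm.
    """
    return active_nodes < target_n_val


# The scenarios described above, expressed as active node counts.
scenarios = {
    "3 node cluster": 3,
    "5 node cluster with 2 nodes down": 5 - 2,
    "7 node side of a split 10 node cluster": 7,
    "3 node side of a split 10 node cluster": 3,
}

for name, active in scenarios.items():
    print(f"{name}: at risk = {may_collocate(active)}")
```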

What is the cause?

The fundamental cause is the assumption that each partition has a dedicated backend instance. Normally, every partition has its own instance of bitcask, leveldb, etc. Given a preflist where two replicas are on the same node, there are still two distinct copies of that object on two different instances of the backend. Partitions do not share backend instances.

Yokozuna DOES share one Solr instance and one Core instance for all partitions on the node. By default, each Solr document must have a unique identifier. When a document is written with an existing id then it overwrites the old one. Yokozuna writes documents using the Riak key as the unique id and a field that lists the single owning partition. The following is an example.

<doc>
  <id>riak_key</id>
  <_pn>2</_pn>
  <text>I'll take bolth, please.</text>
</doc>

Querying will work as long as this is the only replica of key riak_key on this node. But if another replica is written then it will squash the previous value for _pn. This means that queries which filter by partition 2 will come up empty when they shouldn't.
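The squashing can be reproduced with a toy model. The sketch below stands in for a Solr core with a plain dictionary keyed by the unique id; `FakeSolrCore` and its methods are hypothetical names, and partition 5 is an assumed second partition chosen only for illustration.

```python
class FakeSolrCore:
    """Toy stand-in for one shared Solr core: one document per unique id."""

    def __init__(self):
        self.docs = {}

    def add(self, doc):
        # A document written with an existing id replaces the old one.
        self.docs[doc["id"]] = doc

    def query_partition(self, pn):
        # Mimics a query filtered on the _pn field.
        return [d for d in self.docs.values() if d["_pn"] == pn]


core = FakeSolrCore()

# Two replicas of the same Riak key land on the same node.
core.add({"id": "riak_key", "_pn": 2, "text": "I'll take bolth, please."})
core.add({"id": "riak_key", "_pn": 5, "text": "I'll take bolth, please."})  # assumed partition

hits_for_2 = core.query_partition(2)
hits_for_5 = core.query_partition(5)

print(len(core.docs), hits_for_2, len(hits_for_5))
```

Only one document survives, and the filter on partition 2 finds nothing even though a replica for partition 2 was written.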

rzezeski commented 12 years ago

What is the fix?

The crux of the issue is that the Riak key is being used as a unique key in Solr, but it's not unique among partitions. This seemed like the most straightforward way to key documents in Solr, but it causes the issues described above.

The easiest fix is to use a concatenation of the Riak key and partition number as the unique key. In order to still have access to the Riak key without doing string manipulation, a new field, _rk (Riak Key), will be added. Now the doc will look like the following.

<doc>
  <id>riak_key_2</id>
  <_rk>riak_key</_rk>
  <_pn>2</_pn>
  <text>I'll take bolth, please.</text>
</doc>

This will prevent collocated replicas from stomping on each other while still working with the current query filter.
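A minimal sketch of the composite key, using a dictionary as a stand-in for the Solr core. The helper `make_doc` is a hypothetical name, the underscore separator follows the example document above, and partition 5 is an assumed second partition.

```python
def make_doc(riak_key, partition, text):
    """Build a document keyed by Riak key plus partition number (hypothetical helper)."""
    return {
        "id": f"{riak_key}_{partition}",  # unique per (key, partition) pair
        "_rk": riak_key,                  # Riak key kept as its own field
        "_pn": partition,
        "text": text,
    }


index = {}  # toy stand-in for the shared Solr core, keyed by unique id

for partition in (2, 5):  # two collocated replicas; 5 is an assumed partition
    doc = make_doc("riak_key", partition, "I'll take bolth, please.")
    index[doc["id"]] = doc

# The existing partition filter still works, and both replicas survive.
hits_for_2 = [d for d in index.values() if d["_pn"] == 2]
hits_for_5 = [d for d in index.values() if d["_pn"] == 5]

print(sorted(index), hits_for_2[0]["_rk"], hits_for_5[0]["_rk"])
```

Each replica now has its own id, so neither write overwrites the other, and the Riak key is read from _rk rather than parsed out of the id.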