VCNC / haeinsa

Haeinsa is a linearly scalable multi-row, multi-table transaction library for HBase
Apache License 2.0

High Concurrency on the same 3 rows acting strange #44

Open ehudl opened 9 years ago

ehudl commented 9 years ago

We are running some load/performance tests on Haeinsa. To check recovery time, our scenario runs X transactions with 2 threads. Every transaction touches the same 3 rows. If some transactions fail, we retry them with only one thread.

We expected the first single-threaded retry to commit all of them, one by one, but this is not always the case. Moreover, when X >= 100 we sometimes get into a long loop of "this row is unstable and not expired yet".

So 3 questions:

  1. If a row gets into an unstable state, does that mean I need to wait until the expiration time is over?
  2. When a commit starts but later fails for any reason, do we get into an unstable state in all cases?
  3. We see a special case when X is 100 transactions or more. Do you see a reason this issue would appear exactly when the number of items is equal to or greater than 100?
ehudl commented 9 years ago

So I have some answers:

  1. I see that in some cases the row is "not stable", and therefore we need to wait for the expiration time.
  2. When a commit starts and then fails, it recovers from "pre write" to "abort". But I don't understand why the abort state is needed, since it seems that a new transaction can write only if the row is stable.
  3. It was also reproduced with 50 items, so it seems to be related to the deadlock issue: https://github.com/VCNC/haeinsa/issues/41
eincs commented 9 years ago

I agree that this case is the same as #41. Please read my comment on that issue, and give me feedback if you still think this is weird behavior of Haeinsa.

In addition, here are my answers to your questions:

  1. In most cases you don't need to wait until the concurrent transaction expires. Whether a transaction succeeds or fails, it immediately makes all participant rows stable (HaeinsaTable.java#L539, HaeinsaTable.java#L600), and then other transactions can access those rows. Note that most transactions complete long before they expire.
  2. When a commit fails for any reason, the transaction tries to recover the prewritten rows immediately (HaeinsaTable.java#L539, HaeinsaTable.java#L600). So rows remain in the unstable state only for the short time that commit() is executing. A row that stays unstable until expiration can be caused by a sudden client machine blackout or an unexpectedly long-running transaction. The ABORT state is needed to mark the transaction as aborting.
  3. Running many concurrent transactions that access the same rows is a high-conflict environment. This causes many ConflictExceptions with the "this row is unstable and not expired yet" message. This is also described on the wiki (see: https://github.com/VCNC/haeinsa/wiki/How-to-Use#important-information). It is inevitable, because Haeinsa uses optimistic concurrency control. Other transaction libraries on top of NoSQL stores, such as Percolator, Themis, and Omid, have the same problem. We recommend using Haeinsa in a low-conflict environment.
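
The stable/unstable behavior described in these three answers can be sketched with a toy model. This is not Haeinsa's implementation: `RowLockModel`, `tryPrewrite`, and `makeStable` are hypothetical names, the clock is passed in explicitly, and the model assumes that a lock past its expiry can simply be taken over by another transaction. It only illustrates that a row-granularity lock blocks other transactions until the owner makes its rows stable, or until the lock expires if the owner never returns.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; all names here are hypothetical and not part of Haeinsa.
class RowLockModel {
    /** An unstable (prewritten) row: which transaction holds it and when the lock expires. */
    private static final class Lock {
        final long txId;
        final long expiresAtMillis;

        Lock(long txId, long expiresAtMillis) {
            this.txId = txId;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    // Row key -> lock. A row that is absent from the map is stable.
    private final Map<String, Lock> unstableRows = new HashMap<>();

    /**
     * Tries to prewrite every row for txId. Returns false (a conflict) if any
     * row is held by another transaction whose lock has not expired yet.
     */
    synchronized boolean tryPrewrite(long txId, List<String> rows, long nowMillis, long ttlMillis) {
        for (String row : rows) {
            Lock held = unstableRows.get(row);
            if (held != null && held.txId != txId && nowMillis < held.expiresAtMillis) {
                return false; // "this row is unstable and not expired yet"
            }
        }
        for (String row : rows) {
            unstableRows.put(row, new Lock(txId, nowMillis + ttlMillis));
        }
        return true;
    }

    /** Called after a commit succeeds or after a failed commit is recovered. */
    synchronized void makeStable(long txId) {
        unstableRows.values().removeIf(lock -> lock.txId == txId);
    }
}
```

In this model a transaction that calls `makeStable` right after its commit attempt unblocks the others at once, while a transaction that disappears without calling it blocks them until `expiresAtMillis`, which corresponds to the client blackout case in answer 2.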

Note: The lock granularity in Haeinsa is an entire row, while most other transaction libraries lock a single cell. So conflicts can occur more often in Haeinsa in specific cases. In exchange for this trade-off, Haeinsa can provide high performance (it can even be faster than the raw HBase API in specific cases). See: https://github.com/VCNC/haeinsa/wiki/Performance
If you use Haeinsa in a low-conflict environment, the conflict rate is not high. In our practical use case, the rate of failures caused by conflicts is just 0.0003% to 0.0010%. See: https://github.com/VCNC/haeinsa/wiki/Performance#conflict-rates

This picture illustrates your situation: [image: highconccurencytx]

Executing a commit takes much longer than illustrated, so "unstable row" conflicts will occur much more often than illustrated.
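
One common way for a client to cope with conflicts under optimistic concurrency control is to retry the whole transaction after a randomized, growing pause, so that contending threads stop colliding in lockstep. The sketch below is only an illustration of that idea, not something the thread or the Haeinsa wiki prescribes. `ConflictRetry`, `runWithRetry`, and the nested `ConflictException` are hypothetical stand-ins defined here so that the example compiles on its own; in real code the callable would wrap one complete transaction attempt and the catch clause would name the library's own conflict exception.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper; not part of Haeinsa.
class ConflictRetry {
    /** Stand-in for a conflict error, defined locally so the sketch is self-contained. */
    static class ConflictException extends Exception {
        ConflictException(String message) {
            super(message);
        }
    }

    /**
     * Runs one complete transaction attempt via txBody and retries it when it
     * conflicts, sleeping a random time between 0 and an exponentially growing
     * cap before each retry. Rethrows the last conflict after maxAttempts.
     */
    static <T> T runWithRetry(Callable<T> txBody, int maxAttempts, long baseDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ConflictException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return txBody.call();
            } catch (ConflictException e) {
                last = e;
                if (attempt == maxAttempts - 1) {
                    break; // out of attempts, report the conflict to the caller
                }
                // The cap doubles per attempt; the shift is bounded to avoid overflow.
                long cap = baseDelayMillis << Math.min(attempt, 10);
                long sleepMillis = cap <= 0 ? 0 : ThreadLocalRandom.current().nextLong(cap + 1);
                Thread.sleep(sleepMillis);
            }
        }
        throw last;
    }
}
```

A call such as `ConflictRetry.runWithRetry(() -> doOneTransaction(), 5, 20)` would try a hypothetical `doOneTransaction()` up to five times. Exceptions other than the conflict type are not retried and propagate immediately.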