AnDongLi closed this issue 7 years ago.
Short answer: validation is never done at the page level, irrespective of whether the master node uses page locks or row locks; each row is validated independently. This is a more efficient way to implement OCC than in page-based MVCC systems.
Details:
Page lock on replicant versus master node:
Please refer to https://github.com/bloomberg/comdb2/issues/373 for a discussion of the transactions involved in a SQL write operation. The page WRITE locks are acquired in the pessimistic locking phase (the second transaction, which applies the bplog on the master node).
At this point the row version is checked (not the page LSN), and if it does not match, a verify error is sent back to the replicant node. As you can see, even in page-lock mode, OCC validation is still done on a per-row basis. OCC validation is only required for the SQL processing phase, which is optimistic and only acquires temporary, non-distributed (i.e. single-node) page locks.
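A minimal sketch of the per-row validation described above, in Python. All names (`apply_bplog`, the `(version, value)` row tuples) are illustrative, not comdb2's actual API; the point is only that each row carries its own version and validation never consults a page LSN:

```python
# Hedged sketch: per-row OCC validation on the master, as described above.
# Row versions are checked individually; page LSNs are not consulted.

VERIFY_ERROR = "verify error"

def apply_bplog(rows, writes):
    """Apply buffered writes on the master. Each write carries the row
    version the replicant observed during the optimistic SQL phase.
    rows: {rowid: (version, value)}; writes: [(rowid, read_version, new_value)]"""
    # Validate every row independently before touching anything.
    for rowid, read_version, _ in writes:
        current_version, _ = rows[rowid]
        if current_version != read_version:  # row changed since it was read
            return VERIFY_ERROR              # replicant must retry the txn
    # All rows validated: apply the writes and bump each row's version.
    for rowid, read_version, new_value in writes:
        version, _ = rows[rowid]
        rows[rowid] = (version + 1, new_value)
    return "committed"
```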
Rowlocks Goal:
The goal of rowlock mode is to eliminate fake contention, i.e. two concurrent transactions contending on the same page while trying to update different rows. It does not change the outcome when the same rows are written by multiple transactions.
Rowlocks Implementation:
A rowlock transaction breaks the various lower-level btree writes (i.e. a single index-btree update, a single data-btree update) into independent physical transactions, which commit independently. This way, every page lock is held for a shorter period of time, reducing contention on the master node. Each individual row is still protected by a pessimistic rowlock for the duration of the transaction, so two writes touching the same rows will contend as before; otherwise, false sharing only lasts for the duration of a small physical transaction (small in terms of the number of page locks acquired) that most probably doesn't deadlock and commits very fast.
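The split described above can be sketched as follows. This is a toy model, not comdb2's implementation: the class names and the `write_row`/`commit` shape are assumptions made for illustration. The key property it shows is that each btree write commits as its own physical transaction while the row lock outlives them all:

```python
# Hedged sketch: a rowlock-mode logical transaction whose lower-level
# btree writes each commit as an independent physical transaction.

class LogicalTxn:
    def __init__(self, txnid):
        self.txnid = txnid
        self.row_locks = set()      # pessimistic row locks, held to logical commit
        self.physical_commits = []  # each btree write commits independently

    def write_row(self, rowid, btree_ops):
        """btree_ops: e.g. one data-btree update plus one update per index."""
        self.row_locks.add(rowid)   # row stays protected across physical txns
        for op in btree_ops:
            # Each op would briefly acquire page locks, commit, and release
            # them; here we just record the independent physical commit.
            self.physical_commits.append((self.txnid, op))

    def commit(self):
        self.row_locks.clear()      # row locks released only at logical commit
        return self.physical_commits
```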
Thanks for the reply. I have some further questions:
Since the smallest unit flushed to disk is the page: when two transactions change different rows on the same page, must the second transaction somehow reload the page before flushing its own change to disk? Otherwise it would lose the first transaction's change. Right?
If the logical transaction has been divided into multiple physical transactions, how is consistency between the data file and the index files maintained? OK, I guess this case is no different from a transaction that spans multiple tables. Am I right?
Does the transaction id in a WAL log record come from the logical transaction rather than the physical transaction?
For the first question, thinking about it again: since the mpool is the one that owns the page, each transaction just persists the WAL and updates its row in the page, and the mpool decides when to flush to disk. So there is no need for each transaction to reload the page itself, since the mpool takes care of it.
An outstanding transaction will hold a lock which prevents any other transaction from modifying the page or row. The lock is held for the duration of the transaction whether or not the page is evicted from the bufferpool.
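The point that a lock survives page eviction can be made concrete with a toy lock table (again an illustration, not comdb2's actual locking code): locks are keyed by object id in a structure separate from the buffer pool, so evicting the page image releases nothing:

```python
# Hedged sketch: a lock table that lives outside the buffer pool, so a
# transaction's lock persists even if the locked page is evicted.

class LockTable:
    def __init__(self):
        self.owners = {}                     # object id -> owning txn id

    def acquire(self, txn, obj):
        """Grant the lock, or fail if another txn holds it (re-entrant for
        the same txn). A real lock manager would block instead of raising."""
        if self.owners.get(obj, txn) != txn:
            raise RuntimeError("%s is held by %s" % (obj, self.owners[obj]))
        self.owners[obj] = txn

    def release_all(self, txn):
        """Release every lock held by txn, at end of transaction."""
        self.owners = {k: v for k, v in self.owners.items() if v != txn}
```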
We require that all sql sessions be snapshot isolation or higher in rowlocks mode. Although redo logs are applied on the replicants, these changes are hidden until the logical transaction commits.
Both. Logical transactions are comprised of several physical transactions. The commit record for each physical segment will contain the segment's physical transaction id as well as the transaction's logical transaction id.
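The commit-record layout described above can be sketched like this (field names are assumptions for illustration; the actual log record format is comdb2's/BerkeleyDB's own). Carrying both ids lets recovery or replication group the physical segments back into their logical transaction:

```python
# Hedged sketch: each physical segment's commit record carries both its
# own physical txn id and the enclosing logical txn id.
from dataclasses import dataclass

@dataclass
class CommitRecord:
    physical_txnid: int   # id of this physical segment
    logical_txnid: int    # id of the enclosing logical transaction

def segments_of(log, logical_txnid):
    """Recover all physical segments belonging to one logical transaction."""
    return [r for r in log if r.logical_txnid == logical_txnid]
```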
1. To make my question clearer: in row-lock mode, the lock protects a specific row; so when two rows on the same page are changed, how does the system make sure none of the changes get lost in the data file and index files? My confusion came from thinking that each transaction processor has the page and calls the flush, but I think that's not the case; the transaction processor only has the row, and it changes the row in the page owned by the mpool, so none of the changes get lost.
A row lock protects the row against other transactions touching that row. Lower-level transactions still take page locks, but these are short-lived and cover one btree only. Indeed, the data and the index are committed in separate transactions, but no other transaction will write the row in question; they are only free to move components of it around as they update those pages. Even when the index or data component of the row (already committed) is no longer protected by a page lock, no other transaction will write that row, so NONE of the changes are lost. Neither are dirty reads possible.
Not sure if I follow the question. Records are protected by locks. There are different types of locks: row locks, page locks, table locks, etc. Different lock types have different granularities, but they all serve the same purpose: to ensure that the concurrent execution of transactions generates the same results as if the transactions were executed sequentially.
Here is an example.
Transaction A wants to change Row I from 1 to 3. Transaction B wants to change Row II from 2 to 4.
In page-lock mode, A and B serialize on the page lock:

| Step | Page lock | Row I | Row II |
|---|---|---|---|
| Initial state | free | 1 | 2 |
| A acquires the page lock | held by A | 1 | 2 |
| A updates Row I | held by A | 3 | 2 |
| A commits and releases the page lock | free | 3 | 2 |
| B acquires the page lock | held by B | 3 | 2 |
| B updates Row II | held by B | 3 | 4 |
| B commits and releases the page lock | free | 3 | 4 |

In row-lock mode, A and B hold their row locks concurrently:

| Step | Row lock I | Row lock II | Row I | Row II |
|---|---|---|---|---|
| A and B acquire their row locks | held by A | held by B | 1 | 2 |
| A and B update their rows | held by A | held by B | 3 | 4 |
| A and B commit and release their locks | free | free | 3 | 4 |
Obviously, both locking levels generate exactly the same result.
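The same example can be run as a small simulation. This is a toy model of the two locking granularities, not comdb2 code: in page-lock mode one lock covers the whole page, while in row-lock mode each row has its own lock, yet both end in the same state:

```python
# Hedged simulation of the example: A sets Row I to 3, B sets Row II to 4.
import threading

def run(make_lock_for_row):
    """Run A and B concurrently; make_lock_for_row maps a row to its lock."""
    page = {"I": 1, "II": 2}
    def txn(row, value):
        with make_lock_for_row(row):
            page[row] = value
    a = threading.Thread(target=txn, args=("I", 3))
    b = threading.Thread(target=txn, args=("II", 4))
    a.start(); b.start(); a.join(); b.join()
    return page

def page_lock_mode():
    page_lock = threading.Lock()            # one lock for the whole page
    return run(lambda row: page_lock)

def row_lock_mode():
    row_locks = {"I": threading.Lock(), "II": threading.Lock()}
    return run(lambda row: row_locks[row])  # only the touched row is locked
```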
@Rivers: The complexities are the result of having a hybrid system: row locks on top of a page-granularity locking system. @AnDongLi: Please bear in mind that comdb2 doesn't implement page-based MVCC. Therefore, there is only one page version at any time on the master node, and each transaction takes control over it through page locks. Each page is changed through a physical transaction. Each row is changed through a logical transaction, which is basically a set of independent physical transactions (one per tree) plus the associated row lock. The secret sauce of the rowlocks scheme is reducing the amount of time each page lock is held during a transaction and reducing deadlocks by grouping a smaller number of locks into each physical transaction, while maintaining row write atomicity with respect to concurrent logical transactions.
Thanks for all the replies which help to clarify lots of things. Please close this one.
In page-lock mode, I'd guess the transaction that gets the page lock first would update the row, and the other transaction would give up after it sees that the page LSN is bigger than it was when the transaction started.
What happens in row-lock mode? Is the expected result that both transactions commit successfully, since they touch different rows and row-lock mode is in effect? If yes, how is it implemented? Does the second transaction detect that the page LSN has been updated, reload the page, and try again?