kolbitsch-lastline opened this issue 4 years ago
We definitely use Ghostferry from a read-only replica internally and have never encountered any issue. In fact, `FOR UPDATE` is necessary even for a read replica, as there is still write traffic due to replication.
`RowLock` is definitely needed for data correctness, not just for verification. As I recall, the proof of why removing it causes data corruption is not intuitive. Since this is a common question asked by many people, I can try to find a "prose proof" that I wrote a long time ago in our documents about Ghostferry and paste it here in the nearish future.
I just reran TLC against the TLA+ specification after making the following change and regenerating the TLA+ from PlusCal:
```diff
--- a/tlaplus/ghostferry.tla
+++ b/tlaplus/ghostferry.tla
@@ -285,8 +285,8 @@ PossibleRecords == Records \cup {NoRecordHere}
 \* but this seems cumbersome and prone to implementation level error.
 \* TODO: model this with TLA+ and validate its correctness.
 tblit_loop: while (lastSuccessfulPK < MaxPrimaryKey) {
-  tblit_rw: currentRow := SourceTable[lastSuccessfulPK + 1];
-  if (currentRow # NoRecordHere) {
+  tblit_r: currentRow := SourceTable[lastSuccessfulPK + 1];
+  tblit_w: if (currentRow # NoRecordHere) {
     TargetTable[lastSuccessfulPK + 1] := currentRow;
   };
   tblit_upkey: lastSuccessfulPK := lastSuccessfulPK + 1;
```
Each label in PlusCal is an atomic step. If the read and write during the data copy are not in a single atomic step, they end up in two labels, as above. This simulates the scenario where `FOR UPDATE` is turned off. As a result, TLC tells me I have an invariant violation. It takes 26 steps in the model to reach this error condition, with a runtime of about 2 minutes on my computer.
At one point I had a prose explanation of why this is the case but it's been many years and I have since forgotten. Unfortunately the TLC trace showing how the errors can be reproduced is a bit daunting and difficult to parse. I don't have time for it for now so I'm going to paste it into here verbatim. In the future I might post a better explanation of it by parsing the TLC output.
The key lines are at the 26th step, where the Target and Source tables have different entries:
```
/\ pc = ( Ferry :> "Done" @@
          Application :> "Done" @@
          BinlogStreamer :> "Done" @@
          TableIterator :> "Done" @@
          BinlogWriter :> "Done" )
....
/\ TargetTable = <<r0, r1, NoRecordHere>>
....
/\ SourceTable = <<r0, r0, NoRecordHere>>
```
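The divergence TLC finds can also be reproduced with a tiny exhaustive interleaving search. The model below is my own simplification (it is not the Ghostferry code or the full TLA+ spec): one row, one table-iterator, one application write, and one binlog apply. When the iterator's read and write are split into two atomic steps, some interleaving ends with the source and target disagreeing; with a single atomic read-then-write step, no interleaving does.

```python
import copy

# Toy model of the race (a simplification, not Ghostferry itself).
# Each process is a list of (guard, action) atomic steps; `explore`
# enumerates every interleaving, like a miniature model checker.

def explore(state, procs, finals):
    progressed = False
    for i, steps in enumerate(procs):
        if not steps:
            continue
        guard, action = steps[0]
        if not guard(state):
            continue  # step not enabled yet (e.g. binlog still empty)
        progressed = True
        s = copy.deepcopy(state)
        action(s)
        explore(s, procs[:i] + [steps[1:]] + procs[i + 1:], finals)
    if not progressed:
        finals.append(state)

def violation_possible(split_read_write):
    always = lambda s: True
    # The application already INSERTed r1 on the source before the copy.
    init = {"source": "r1", "target": None, "binlog": [], "cur": None}

    def read_row(s):
        s["cur"] = s["source"]
    def write_row(s):
        if s["cur"] is not None:
            s["target"] = s["cur"]          # INSERT IGNORE of what was read
    def copy_row(s):                        # atomic, as with FOR UPDATE
        read_row(s)
        write_row(s)
    def app_update(s):                      # application UPDATEs r1 -> r0
        s["binlog"].append((s["source"], "r0"))
        s["source"] = "r0"
    def binlog_apply(s):                    # UPDATE matches the old value only
        old, new = s["binlog"].pop(0)
        if s["target"] == old:
            s["target"] = new

    iterator = ([(always, read_row), (always, write_row)]
                if split_read_write else [(always, copy_row)])
    procs = [iterator,
             [(always, app_update)],
             [(lambda s: bool(s["binlog"]), binlog_apply)]]
    finals = []
    explore(init, procs, finals)
    return any(s["source"] != s["target"] for s in finals)

print(violation_possible(True))    # split read/write: corruption is reachable
print(violation_possible(False))   # atomic read+write: always consistent
```

The violating interleaving it finds is the same shape as the TLC trace: read the row, let the application update it, let the binlog apply no-op, then write the stale value.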
> We definitely use Ghostferry from a read-only replica internally and have never encountered any issue. In fact, FOR UPDATE is necessary even for a read replica, as there is still write traffic due to replication.

That is very interesting. The `SELECT ... FOR UPDATE` definitely raises exceptions in our environment.
Could it be that you are connecting to the source DB using a user with the `SUPER` privilege? Such users should always have the ability to lock for write (and actually even write) to the DB.
I just checked and yes we have SUPER.
ok, thanks for confirming.
I'll have to think about this for a bit. I dislike giving our replication user SUPER on the source.
The permissions you need are: `SELECT`, `SUPER`, `REPLICATION SLAVE`, `REPLICATION CLIENT`.
Good to know; this would be good to include in the docs somewhere.
NOTE: For me this poses a problem, as we're using GCP, and Cloud SQL does not support SUPER: https://cloud.google.com/sql/faq

> Cloud SQL does not support SUPER privileges, which means that GRANT ALL PRIVILEGES statements will not work. As an alternative, you can use GRANT ALL ON `%`.*.

I'll see what I can do in our environment.
> My assumption is that this is to keep the data consistent between the reading and the row verification. Beyond verification, it should be safe to operate without the FOR UPDATE (which also improves performance, as we don't require the round-trips for the transaction, which is always rolled back anyways), because any modifications on the source DB overlapping with a row read should be "fixed" by the binlog-writer anyways. Can you confirm my assumption is correct?
As @shuhaowu pointed out above, `SELECT ... FOR UPDATE` is needed for maintaining data correctness, not just for row verification. Let's take a look at a possible sequence of actions that introduces data corruption with a plain `SELECT`. We will use a notation similar to the one introduced in another ghostferry issue. We denote the database row values with r0 and r1, where r0 represents one set of values and r1 represents a different set of values. We define a "payload" as either an argument that an action is given (e.g. `INSERT v1 INTO ...`) or a result that an action receives (e.g. `SELECT FROM ... -> v1`).
Step | Actor | Action | Payload | Source | Target |
---|---|---|---|---|---|
1 | Application | INSERT (SOURCE) | r1 | r1 | nil |
2 | DataIterator | BEGIN | N/A | r1 | nil |
3 | DataIterator | SELECT | r1 | r1 | nil |
4 | Application | UPDATE (SOURCE) | r0 | r0 | nil |
5 | BinlogWriter | UPDATE | r1->r0 | r0 | nil |
6 | BatchWriter | BEGIN | N/A | r0 | nil |
7 | BatchWriter | INSERT IGNORE | r1 | r0 | r1 |
8 | BatchWriter | ROLLBACK | N/A | r0 | r1 |
9 | DataIterator | ROLLBACK | N/A | r0 | r1 |
As we can see in step 4 above, the application is allowed to update the row selected in step 3 because `FOR UPDATE` is not enforced. A binlog UPDATE event is then created based on step 4. However, step 5 silently fails to update the target DB, because it must perform r1->r0 and the old row value r1 doesn't exist in the target DB yet. Step 7 then writes the old, stale row value r1 (obtained in step 3) to the target DB. At this point, the target DB contains r1, which differs from what is in the source DB, namely r0. If we have a verifier enabled, we will detect the data inconsistency and fail the ghostferry run. Hope this helps, and please feel free to ask if you have any questions.
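The nine steps above can also be replayed mechanically. This is a hypothetical sketch using plain Python dicts for the two tables (the identifiers are mine, not Ghostferry's); it ends in exactly the divergent state described:

```python
# Replay the table above for one row with primary key 1.
source, target, binlog = {}, {}, []

source[1] = "r1"                  # 1. Application INSERTs r1 (source)
selected = source[1]              # 2-3. DataIterator BEGINs and SELECTs r1
                                  #      (no FOR UPDATE, so nothing is locked)
binlog.append((1, "r1", "r0"))    # 4. Application UPDATEs the row to r0,
source[1] = "r0"                  #    which also emits a binlog event

for key, old, new in binlog:      # 5. BinlogWriter applies UPDATE r1 -> r0;
    if target.get(key) == old:    #    the target row doesn't exist yet, so
        target[key] = new         #    the statement matches nothing (no-op)

if 1 not in target:               # 6-7. BatchWriter INSERT IGNOREs the stale
    target[1] = selected          #      value read back in step 3

# 8-9. Both transactions end; no further event will ever touch this row.
print(source[1], target[1])       # -> r0 r1 : source and target diverge
```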
cc @Shopify/pods
thanks for the detailed explanation! Indeed not the most straightforward sequence of events, or something one would think of immediately :-)
I agree that locking is the correct strategy, no doubt there. The problem in my scenario is that I cannot use locking, because of the environment in which I deploy (namely reading from a read-only GCP cloud-SQL slave DB). Not even a root/SUPER user has the privileges to lock on such a system.
From your explanation above, it seems to be related to transactions and rollbacks. I'm not entirely sure, but I have a strong feeling that a rolled-back transaction would never make it onto the slave, so the scenario would not occur (because the master reverts and thus never commits the transaction's statements to the binlog). Does that sound reasonable?
Also note that our application/use-case guarantees that replicating from a slave is safe. Fail-overs will not occur unless it's guaranteed that the slave has received all data.
I think there's some misunderstanding here about the race condition @kolbitsch-lastline.
The fundamental issue is that the BatchWriter (aka DataIterator / TableIterator) requires the source data to remain unchanged while it commits its writes to the target database. Breaking the atomicity of "read data from source then write data to target" is what causes data corruption.
If there was no atomicity in the read-then-write operation, this is the race condition you'd encounter: the `b1` binlog event is thrown out the window and never played onto the target. If all processes (binlog streamer, application, and table iterator) stop at this point, it's clear that the source table has r0 but the target table has a value of r1.
I hope the last reply clears up the race condition for you.
Now, onto your issue, which is the lack of permissions to hold a transaction open on a read slave.
First of all, in the example explained above, the "application" can be equated to the MySQL replication thread on the read slave. Hence, the race condition still holds whether you're running ghostferry off of a read slave (that is guarded from your "real" application) or a writer.
Secondly, with the current ghostferry implementation I strongly suspect you're out of luck, which makes us sad too!
Finally, I don't think all hope is lost, because internally we have discussed that mutually excluding the BinlogWriter from the DataIterator might be an alternative to holding the `SELECT ... FOR UPDATE` transaction. That is, move the implementation of the atomicity up to the Ghostferry level. But again, sadly, we are not motivated to undertake this work because (1) we don't have an immediate use case for it and (2) it'll lift a lot of complexity up into the Ghostferry code.

In any case, if you want to explore this, we can support you, but merging it upstream would require re-validation of the idea in TLA+ (link directs you to the current model for our "read-then-write atomicity").
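To make the alternative concrete, here is a rough sketch (entirely hypothetical; the names and structure are not Ghostferry's) of what "moving the atomicity up to the Ghostferry level" could look like: a mutex shared by the DataIterator and the BinlogWriter, so that the read-then-write of a batch can never interleave with a binlog apply, even without `SELECT ... FOR UPDATE`:

```python
import threading

# Hypothetical sketch only; not Ghostferry's API.
copy_lock = threading.Lock()      # shared by DataIterator and BinlogWriter

source = {1: "r1"}
target = {}
binlog = []

def data_iterator_copy(key):
    # Read-then-write under the mutex: the BinlogWriter cannot run in
    # between, restoring the atomicity that SELECT ... FOR UPDATE provided.
    with copy_lock:
        row = source.get(key)
        if row is not None and key not in target:   # INSERT IGNORE
            target[key] = row

def binlog_writer_apply():
    with copy_lock:
        while binlog:
            key, old, new = binlog.pop(0)
            if target.get(key) == old:   # UPDATE matches the old value only
                target[key] = new

def application_update(key, new):
    # The application (or the replication thread on a read slave) does not
    # take the lock; it only needs to emit a binlog event.
    binlog.append((key, source[key], new))
    source[key] = new
```

Whichever order the copy and the apply run in, the row converges: either the iterator copies r1 and the binlog apply rewrites it to r0, or the iterator copies r0 and the apply is a no-op. Whether this is acceptable in practice depends on the serialization cost it imposes between the two components, which is exactly the complexity concern raised above.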
Let us know your thoughts, and thanks again for all the discussion!
FYI, I'm convinced I had responded to your explanation above, but I can't find it anywhere... not sure if it was deleted, if I posted on the wrong ticket, or if I was being stupid and just never hit Comment. Either way: thanks a ton for your explanation.
I finally get the problem and agree that there has to be a lock. I find that putting the lock into the ghostferry application would be slightly cleaner, as it's less likely to interfere with the application/source DB.

I'm currently working on testing a patch; I would love to see what you guys (and TLA+ ;-) ) have to say about it.
I agree that the application-level lock is cleaner. I'm also interested to see it modeled, since I'm quite sure there is a correct variation.
The current implementation has been validated extensively in production, and this does add some risk to change, but it is clear the change will benefit some environments.
The current ghostferry-copydb default behavior is to set the `RowLock` property on the data iterator when copying rows from the source to the target DB. Unfortunately, the reason why is not documented very well (that I could find).

This default behavior means that it is not possible to replicate from a MySQL slave DB, because such a server is typically run in `READ-ONLY` mode, preventing the `SELECT ... FOR UPDATE` used by ghostferry.

My assumption is that this is to keep the data consistent between the reading and the row verification. Beyond verification, it should be safe to operate without the `FOR UPDATE` (which also improves performance, as we don't require the round-trips for the transaction, which is always rolled back anyways), because any modifications on the source DB overlapping with a row read should be "fixed" by the binlog-writer anyways. Can you confirm my assumption is correct?

If so, would it make sense to disable `RowLock = true` if no verifier is enabled, or at least allow the `DataIterator` constructor to support disabling the row lock?