lowWater increase to cause repeat read get different value

Scenario A:

txn	ts1	(12s delay)	ts2	ts3	ts4	ts5
txnA	put(a)			put(b)	commit
txnB			get(b)			get(b)

txnB first get(b) will get null, but second get(b) will get a value by txnA's put(b). The reason for this is that at ts2, txnB's get(b) will trigger timestamp cache to increase lowWater to ts1, but txnA's put(b) in ts3 don't push the timestamp, so key 'b' still has timestamp ts1, so at ts5, txnB will get the value txnA put. Why txnA's put(b) don't push the timestamp is the statement in range.go line 536 (inside addReadWriteCmd method)

                if !wTS.Less(header.Timestamp) && header.Txn != nil {
                        err := &proto.WriteTooOldError{Timestamp: header.Timestamp, ExistingTimestamp: wTS}
                        reply.Header().SetGoError(err)
                } else if !wTS.Less(header.Timestamp) || !rTS.Less(header.Timestamp) {
                        // Otherwise, make sure we advance the request's timestamp.
                        ...
                }

header.Timestamp in second put(b) is ts1, and wTS is also ts1 which is same as lowWater. so timestamp will not be pushed.

Scenario B:

txn	ts1	ts2	ts3	ts4	ts5	ts6
txnA	put(a)			put(c)	commit
txnB		get(c)				get(c)
txnC			splitRange(b)

txnB first get(c) will get null, but second get(c) will get a value by txnA's put(c). The reason for this is that at ts3, txnC's splitRange by key b will cause the split the initial range to 2 ranges. This new range will have a new lowWater ts3, but txnA's put(c) in ts4 don't push the timestamp, so key 'c' still has timestamp ts1, so at ts6, txnB will get the value txnA put. The reason for txnA's second put don't push the timestamp is the same statement as above: so header.Timestamp in second put(c) is ts1, and wTS is also ts3 which is same as lowWater in the new range. so timestamp will not be pushed.

One possible solution is we should always push the timestamp, so the change is:

                if !wTS.Less(header.Timestamp) && header.Txn != nil {
                        err := &proto.WriteTooOldError{Timestamp: header.Timestamp, ExistingTimestamp: wTS}
                        reply.Header().SetGoError(err)
-               } else if !wTS.Less(header.Timestamp) || !rTS.Less(header.Timestamp) {
+               }
+               if !wTS.Less(header.Timestamp) || !rTS.Less(header.Timestamp) {
                        // Otherwise, make sure we advance the request's timestamp.
                }

But we found that the test case TestRangeSplitsWithConcurrentTxns in kv/split_test.go failed as it expect the txn will not retry even the range split. After the above change, concurrentTxn may retry because range split and the timestamp of put will be pushed, but the SSI txn expect timestamp in EndTransaction header is same as OrigTImestamp (statement in range.go EndTransaction method), but this may not be true because timestamp push, so the test case failed. So we should try the above change and make changes to the failed test case, or any another way? Please comments.

Test case Scenario A:

// When the gap between two writes in a txn is bigger than 10 seconds, lowWater
// in timestamp cache can increase to the timestamp of the first write as
// side effect of read in another txn, but when the txn committed, the write's
// timestamp is not increased. so another txn two reads before and after
// the second write will get different value.
func TestTxnRepeatGetDifferentValue(t *testing.T) {
        db, _, _, mClock, _, stopper, err := createTestDB()
        if err != nil {
                t.Fatal(err)
        } 
        defer stopper.Stop()

        keyA := proto.Key("a")
        keyB := proto.Key("b")
        ch := make(chan struct{})
        // Use snapshot isolation
        txnAOpts := &client.TransactionOptions{
                Name:      "txnA",
                Isolation: proto.SNAPSHOT,
        }
        txnBOpts := &client.TransactionOptions{
                Name:      "txnB",
                Isolation: proto.SNAPSHOT,
        }
        go func() {
                err = db.RunTransaction(txnAOpts, func(txn *client.KV) error {
                        // Put transactional value.
                        if err := txn.Call(proto.Put, proto.PutArgs(keyA, []byte("value1")), &proto.PutResponse{}); err != nil {
                                return err 
                        }
                        // Notify txnB do 1st get(b)
                        ch <- struct{}{}
                        // Wait for txnB notify us to put(b)
                        <-ch
                        // Write now to keyB
                        if err := txn.Call(proto.Put, proto.PutArgs(keyB, []byte("value2")), &proto.PutResponse{}); err != nil {
                                return err
                        }
                        return nil
                })
                // Notify txnB do 2nd get(b)
                ch <- struct{}{}
        }()

        // Wait till txnA finish put(a)
        <-ch
        // Delay 12 seconds
        mClock.Set(time.Duration(12 * time.Second).Nanoseconds())
        err = db.RunTransaction(txnBOpts, func(txn *client.KV) error {
                gr1 := &proto.GetResponse{}
                gr2 := &proto.GetResponse{}

                // Attempt to get first keyB
                if err := txn.Call(proto.Get, proto.GetArgs(keyB), gr1); err != nil {
                        return err
                }
                // Notify txnA put(b)
                ch <- struct{}{}
                // Wait for txnA finish commit
                <-ch
                // get(b) again
                if err := txn.Call(proto.Get, proto.GetArgs(keyB), gr2); err != nil {
                        return err
                }

                if gr1.Value == nil && gr2.Value != nil {
                        t.Fatalf("Repeat read same key in same txn but get different value gr1 nil gr2 %v", gr2.Value)
                }
                return nil
        })
        if err != nil {
                t.Fatal(err)
        }
}

Scenario B:

// When two writes in a txn is interleaved by a range split, the second write
// may redirect to the new range, this new range will have a new lowWater
// in it's timestamp cache. But the second write don't push its timestamp.
// So another txn two reads before and after the range split will get different value.
func TestTxnRepeatGetWithRangeSplit(t *testing.T) {
        db, _, _, mClock, _, stopper, err := createTestDB()
        if err != nil { 
                t.Fatal(err)
        }
        defer stopper.Stop()

        keyA := proto.Key("a")
        keyC := proto.Key("c")
        splitKey := proto.Key("b")
        ch := make(chan struct{})
        // Use snapshot isolation
        txnAOpts := &client.TransactionOptions{
                Name:      "txnA", 
                Isolation: proto.SNAPSHOT, 
        }
        txnBOpts := &client.TransactionOptions{                                                                                           
                Name:      "txnB",                                                                                                        
                Isolation: proto.SNAPSHOT,                                                                                                
        }                                                                                                                                 
        go func() {                                                                                                                       
                err = db.RunTransaction(txnAOpts, func(txn *client.KV) error {                                                            
                        // Put transactional value.                                                                                       
                        if err := txn.Call(proto.Put, proto.PutArgs(keyA, []byte("value1")), &proto.PutResponse{}); err != nil {          
                                return err                                                                                                
                        }                                                                                                                 
                        // Notify txnB do 1st get(c)                                                                                      
                        ch <- struct{}{}                                                                                                  
                        // Wait for txnB notify us to put(c)                                                                              
                        <-ch                                                                                                              
                        // Write now to keyC, which will keep timestamp.                                                                   
                        if err := txn.Call(proto.Put, proto.PutArgs(keyC, []byte("value2")), &proto.PutResponse{}); err != nil {          
                                return err                                                                                                
                        }                                                                                                                 
                        return nil
                  })                                                                                                                        
                // Notify txnB do 2nd get(c)                                                                                              
                ch <- struct{}{}                                                                                                          
        }()                                                                                                                               

        // Wait till txnA finish put(a)                                                                                                   
        <-ch                                                                                                                              

        err = db.RunTransaction(txnBOpts, func(txn *client.KV) error {                                                                    
                gr1 := &proto.GetResponse{}                                                                                               
                gr2 := &proto.GetResponse{}                                                                                               

                // First get keyC, value will be nil                                                                                      
                if err := txn.Call(proto.Get, proto.GetArgs(keyC), gr1); err != nil {                                                     
                        return err                                                                                                        
                }                                                                                                                         
                mClock.Set(time.Duration(1 * time.Second).Nanoseconds())                                                                  
                // Split range by keyB                                                                                                    
                req := &proto.AdminSplitRequest{RequestHeader: proto.RequestHeader{Key: splitKey}, SplitKey: splitKey}                    
                resp := &proto.AdminSplitResponse{}                                                                                       
                if err := db.Call(proto.AdminSplit, req, resp); err != nil {                                                              
                        t.Fatal(err)                                                                                                      
                }                                                                                                                         
                // Wait till split complete                                                                                               
                // Check that we split 1 times in allotted time.                                                                          
                if err := util.IsTrueWithin(func() bool {                                                                                 
                        // Scan the txn records.                                                                                          
                        resp := &proto.ScanResponse{}                                                                                     
                        if err := db.Call(proto.Scan, proto.ScanArgs(engine.KeyMeta2Prefix, engine.KeyMetaMax, 0), resp); err != nil {    
                                t.Fatalf("failed to scan meta2 keys: %s", err)                                                            
                        }                                                                                                                 
                        return len(resp.Rows) >= 2                                                                                        
                }, 6*time.Second); err != nil {                                                                                           
                        t.Errorf("failed to split 1 times: %s", err)                                                                      
                }                                                                                                                                                
                // Notify txnA put(c)                                                                                                     
                ch <- struct{}{}                                                                                                          
                // Wait for txnA finish commit                                                                                            
                <-ch                                                                                                                      
                // Get(c) again                                                                                                           
                if err := txn.Call(proto.Get, proto.GetArgs(keyC), gr2); err != nil {                                                     
                        return err                                                                                                        
                }                                                                                                                         

                if gr1.Value == nil && gr2.Value != nil {                                                                                 
                        t.Fatalf("Repeat read same key in same txn but get different value gr1 nil gr2 %v", gr2.Value)                    
                }                                                                                                                         
                return nil                                                                                                                
        })                                                                                                                                
        if err != nil {                                                                                                                   
                t.Fatal(err)                                                                                                              
        }                                                                                                                                 
}

cockroachdb / cockroach

lowWater increase to cause repeat read get different value #676