cockroachdb / activerecord-cockroachdb-adapter

CockroachDB adapter for ActiveRecord.
Apache License 2.0
103 stars 51 forks source link

activerecord: can produce a dirty read #237

Open otan opened 3 years ago

otan commented 3 years ago

Refs: cockroachdb/cockroach#68363 (ConcurrentTransactionTest#test_transaction_isolation__read_committed, first hit Aug 12)

Setup a table:

CREATE TABLE "developers" ("id" bigserial primary key, "name" character varying, "first_name" character varying, "salary" int DEFAULT '70000', "firm_id" bigint, "mentor_id" int, "legacy_created_at" timestamp, "legacy_updated_at" timestamp, "legacy_created_on" timestamp, "legacy_updated_on" timestamp);
 insert into developers (id) values (1);

In shell 1, type:

BEGIN;
SELECT "developers"."id", "developers"."name", "developers"."salary", "developers"."firm_id", "developers"."mentor_id", "developers"."legacy_created_at", "developers"."legacy_updated_at", "developers"."legacy_created_on", "developers"."legacy_updated_on" FROM "developers" WHERE "developers"."id" = 1 LIMIT 1;
UPDATE "developers" SET "salary" = 200000 WHERE "developers"."id" = 1;

In shell 2, run:

BEGIN;

SELECT "developers"."id", "developers"."name", "developers"."salary", "developers"."firm_id", "developers"."mentor_id", "developers"."legacy_created_at", "developers"."legacy_updated_at", "developers"."legacy_created_on", "developers"."legacy_updated_on" FROM "developers" WHERE "developers"."id" = 1 LIMIT 1;

Back to shell 1:

SELECT "developers"."id", "developers"."name", "developers"."salary", "developers"."firm_id", "developers"."mentor_id", "developers"."legacy_created_at", "developers"."legacy_updated_at", "developers"."legacy_created_on", "developers"."legacy_updated_on" FROM "developers" WHERE "developers"."id" = 1 LIMIT 1;

UPDATE "developers" SET "salary" = 80000 WHERE "developers"."id" = 1;

SELECT "developers"."id", "developers"."name", "developers"."salary", "developers"."firm_id", "developers"."mentor_id", "developers"."legacy_created_at", "developers"."legacy_updated_at", "developers"."legacy_created_on", "developers"."legacy_updated_on" FROM "developers" WHERE "developers"."id" = 1 LIMIT 1;

COMMIT;

Now in shell 2, we see a salary 200000, which is a dirty read (we never committed a salary of 200000):

SELECT "developers"."id", "developers"."name", "developers"."salary", "developers"."firm_id", "developers"."mentor_id", "developers"."legacy_created_at", "developers"."legacy_updated_at", "developers"."legacy_created_on", "developers"."legacy_updated_on" FROM "developers" WHERE "developers"."id" = 1 LIMIT 1;
;
  id | name | salary | firm_id | mentor_id | legacy_created_at | legacy_updated_at | legacy_created_on | legacy_updated_on
-----+------+--------+---------+-----------+-------------------+-------------------+-------------------+--------------------
   1 | NULL | 200000 |    NULL |      NULL | NULL              | NULL              | NULL              | NULL
(1 row)

This can take a bit of time to reproduce (i've had to do this a few times each reproduction), but an external ORM, the contributor to the ORM and I have hit it multiple times, so it is not a fluke.

blathers-crl[bot] commented 3 years ago

Hi @otan, please add branch-* labels to identify which branch(es) this release-blocker affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

otan commented 3 years ago

removing blocker for now as i haven't been able to reproduce a third time and questioning my sanity

otan commented 3 years ago

ruby code:

    # Test for dirty reads among simultaneous transactions.
    def test_transaction_isolation__read_committed
      # Should be invariant.
      original_salary = Developer.find(1).salary
      temporary_salary = 200000

      assert_nothing_raised do
        threads = (1..3).map do
          Thread.new do
            Developer.transaction do
              # Expect original salary.
              dev = Developer.find(1)
              assert_equal original_salary, dev.salary

              dev.salary = temporary_salary
              dev.save!

              # Expect temporary salary.
              dev = Developer.find(1)
              assert_equal temporary_salary, dev.salary

              dev.salary = original_salary
              dev.save!

              # Expect original salary.
              dev = Developer.find(1)
              assert_equal original_salary, dev.salary
            end
            Developer.connection.close
          end
        end

        # Keep our eyes peeled.
        threads << Thread.new do
          10.times do
            sleep 0.05
            Developer.transaction do
              # Always expect original salary.
              assert_equal original_salary, Developer.find(1).salary
            end
          end
          Developer.connection.close
        end

        threads.each(&:join)
      end

      assert_equal original_salary, Developer.find(1).salary
    end

have not been able to reproduce with a go equivalent, you can try https://github.com/otan-cockroach/repro71652 if you'd like

keithdoggett commented 3 years ago

Within the test, it happens on about 1/3 of the times I've run it and we've seen it in the CI. I was able to get it to work in a shell on the first try, but I just had a single node running instead of a cluster, so maybe that makes it easier to reproduce?

keithdoggett commented 3 years ago

I was able to reproduce on version 21.2.0-beta.4

nvanbenschoten commented 3 years ago

@keithdoggett have you been able to reproduce while running just this single test, or does it require running the entire test suite? Have you been able to reproduce while running the test locally (not in CI)?

keithdoggett commented 3 years ago

@nvanbenschoten I can reproduce it as a single test both locally and in the CI.

blathers-crl[bot] commented 3 years ago

Hi @dt, please add branch-* labels to identify which branch(es) this release-blocker affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

blathers-crl[bot] commented 3 years ago

Hi @nvanbenschoten, please add branch-* labels to identify which branch(es) this release-blocker affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

nvanbenschoten commented 3 years ago

Thanks @keithdoggett, that's very helpful information! @rafiss has a Ruby env set up, so he's going to see how easily he can get this to fail when stressing the single test. If the cycle time is short enough, he's going to help bisect this back to a single commit.

keithdoggett commented 3 years ago

That sounds good let me know if you need any help with it. It was failing often enough when running locally so hopefully it won't take too long to find the commit.

rafiss commented 3 years ago

Update: still working on this, but my local ruby environment had broken since the last time I used it. I should have things working locally soon.

nvanbenschoten commented 3 years ago

I was able to reproduce this in 1 of 40 runs of the activerecord roachtest running on 7e38d69aadb0c294bef8e3f3746ac5b1362a1041. That commit includes the fix in https://github.com/cockroachdb/cockroach/pull/70744, which could have been one explanation of this.

However, I have been unable to reproduce in a shell or while running with @otan's Go script: https://github.com/otan-cockroach/repro71652. I'm wondering whether that script performs txn retries in the same way as activerecord.

otan commented 3 years ago

if you need to speed up the AR roach test, run TEST_FILES_AR="test/cases/transactions_test.rb" TESTOPTS='-n=/ConcurrentTransactionTest#test_transaction_isolation__read_committed/' bundle exec rake test

Apparently activerecord rarely hits the ProtoRefresh error :\

nvanbenschoten commented 3 years ago

I'm wondering whether that script performs txn retries in the same way as activerecord.

Even when using https://pkg.go.dev/github.com/cockroachdb/cockroach-go/v2/crdb, I'm still not able to reproduce in Go.

rafiss commented 3 years ago

I have the tests running locally. I used the test file below then ran with RUBYOPT="-W0" TESTOPTS="-v" TEST_FILES=test/cases/transactions_test.rb bundle exec rake test. I have reproduced so far:

Now that I have at least some sort of baseline I'll try to get this running in a script.

Code to repro:

``` require "cases/helper" require "models/topic" require "models/reply" require "models/developer" require "models/computer" class TransactionTest < ActiveRecord::TestCase self.use_transactional_tests = false fixtures :topics, :developers def setup @first, @second = Topic.find(1, 2).sort_by(&:id) end end if ActiveRecord::Base.connection.supports_transaction_isolation? && !current_adapter?(:SQLite3Adapter) class ConcurrentTransactionTest < TransactionTest # This will cause transactions to overlap and fail unless they are performed on # separate database connections. def test_transaction_per_thread threads = 3.times.map do Thread.new do Topic.transaction do topic = Topic.find(1) topic.approved = !topic.approved? assert topic.save! topic.approved = !topic.approved? assert topic.save! end Topic.connection.close end end threads.each(&:join) end # Test for dirty reads among simultaneous transactions. def test_transaction_isolation__read_committed # Should be invariant. original_salary = Developer.find(1).salary temporary_salary = 200000 assert_nothing_raised do threads = (1..3).map do Thread.new do Developer.transaction do # Expect original salary. dev = Developer.find(1) assert_equal original_salary, dev.salary dev.salary = temporary_salary dev.save! # Expect temporary salary. dev = Developer.find(1) assert_equal temporary_salary, dev.salary dev.salary = original_salary dev.save! # Expect original salary. dev = Developer.find(1) assert_equal original_salary, dev.salary end Developer.connection.close end end # Keep our eyes peeled. threads << Thread.new do 10.times do sleep 0.05 Developer.transaction do # Always expect original salary. assert_equal original_salary, Developer.find(1).salary end end Developer.connection.close end threads.each(&:join) end assert_equal original_salary, Developer.find(1).salary end end end ```
rafiss commented 3 years ago

RIP i wish i knew about the TESTOPTS='-n=/ConcurrentTransactionTest#test_transaction_isolation__read_committed/' trick before

nvanbenschoten commented 3 years ago

@rafiss if you can get a reproduction with a crdb binary that includes https://github.com/cockroachdb/cockroach/commit/247332bc191a6d7bbcb043bf45509dc9517f78cf and is running with --vmodule=replica_evaluate=2 then I can try to piece together a timeline of how this dirty read materializes from the CRDB logs.

otan commented 3 years ago

also one thing i didn't check is whether it's a dirty read, or we actually commit 200000 to be the result (which is also bad)

rafiss commented 3 years ago

~Here's a debug.zip from a repro on 247332b running with --vmodule=replica_evaluate=2. I'm working on the bisect as well.~ debug.zip

Whoops.... I ran against the wrong commit

rafiss commented 3 years ago

OK, here's the debug.zip from https://github.com/cockroachdb/cockroach/commit/247332bc191a6d7bbcb043bf45509dc9517f78cf

debug.zip

nvanbenschoten commented 3 years ago

I see something strange in those logs. Searching for locking Get operations on what I believe to be this key, I see the following logs:

I211018 21:27:28.872583 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141885 evaluated Get command ‹header:<key:"\357\211\211\210" > key_locking:Exclusive›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835576000,0 min=1634592448.835576000,0 seq=0} lock=true stat=PENDING rts=1634592448.835576000,0 wto=false gul=1634592449.335576000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"\331\315\033\035\n&\005David#\200\342\t8\254\232\357\226\014\260\255\210\255\005\030\254\232\357\226\014\260\255\210\255\005\030\254\232\357\226\014\260\255\210\255\005\030\254\232\357\226\014\260\255\...›, err=<nil>
I211018 21:27:28.932882 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141897 evaluated Get command ‹header:<key:"\357\211\211\210" sequence:1 > key_locking:Exclusive›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=1} lock=true stat=PENDING rts=1634592448.835576000,0 wto=false gul=1634592449.335576000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"\013\261\230\300\n&\005David#\200\265\0308\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\340\237\362\276\006\030\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\340\23...›, err=<nil>
I211018 21:27:29.079332 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141953 evaluated Get command ‹header:<key:"\357\211\211\210" > key_locking:Exclusive›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.952449000,0 min=1634592448.952449000,0 seq=0} lock=true stat=PENDING rts=1634592448.952449000,0 wto=false gul=1634592449.452449000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"k\236ZU\n&\005David#\200\342\t8\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\200\361\332\370\006\030\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\200\361\332\370\0...›, err=<nil>
I211018 21:27:29.143491 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141976 evaluated Get command ‹header:<key:"\357\211\211\210" sequence:1 > key_locking:Exclusive›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=1} lock=true stat=PENDING rts=1634592448.952449000,0 wto=false gul=1634592449.452449000,0 : resp=‹header:<num_keys:1 num_bytes:58 > value:<raw_bytes:"\204v8[\n&\005David#\200\265\0308\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\260\313\321J\030\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\260\313\321J" timestam...›, err=<nil>
I211018 21:27:29.382816 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 142022 evaluated Get command ‹header:<key:"\357\211\211\210" > key_locking:Exclusive›, txn="sql txn" meta={id=d43a2fcf pri=0.00201333 epo=0 ts=1634592449.381285000,0 min=1634592449.381285000,0 seq=0} lock=true stat=PENDING rts=1634592449.381285000,0 wto=false gul=1634592449.881285000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"m#6$\n&\005David#\200\342\t8\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\220\315\214\207\001\030\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\220\315\214\207\001"...›, err=<nil>

These logs include 3 transactions. The first two read the key twice, which aligns with the two UPDATE statements. The third only reads the key once, which is surprising.

Searching then for the outcome of these three txns, we see:

# first txn
I211018 21:27:28.851557 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141878 evaluated Get command ‹header:<key:"\357\211\211\210" >›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835576000,0 min=1634592448.835576000,0 seq=0} lock=false stat=PENDING rts=1634592448.835576000,0 wto=false gul=1634592449.335576000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"\331\315\033\035\n&\005David#\200\342\t8\254\232\357\226\014\260\255\210\255\005\030\254\232\357\226\014\260\255\210\255\005\030\254\232\357\226\014\260\255\210\255\005\030\254\232\357\226\014\260\255\...›, err=<nil>
I211018 21:27:28.872583 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141885 evaluated Get command ‹header:<key:"\357\211\211\210" > key_locking:Exclusive›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835576000,0 min=1634592448.835576000,0 seq=0} lock=true stat=PENDING rts=1634592448.835576000,0 wto=false gul=1634592449.335576000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"\331\315\033\035\n&\005David#\200\342\t8\254\232\357\226\014\260\255\210\255\005\030\254\232\357\226\014\260\255\210\255\005\030\254\232\357\226\014\260\255\210\255\005\030\254\232\357\226\014\260\255\...›, err=<nil>
I211018 21:27:28.872810 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141886 evaluated Put command ‹header:<key:"\357\211\211\210" sequence:1 > value:<raw_bytes:"\013\261\230\300\n&\005David#\200\265\0308\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\340\237\362\276\006\030\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\...›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=1} lock=true stat=PENDING rts=1634592448.835576000,0 wto=false gul=1634592449.335576000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:28.912699 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141892 evaluated QueryIntent command ‹header:<key:"\357\211\211\210" > txn:<id:5b957646-f4ae-43f7-abdc-eb6c3b948a55 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448835827000 logical:1 > min_timestamp:<wall_time:1634592448835576000 > priority:995875 sequence:1 > error_if_missing:...›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=0} lock=true stat=PENDING rts=1634592448.835576000,0 wto=false gul=1634592449.335576000,0 : resp=‹header:<> found_intent:true›, err=<nil>
I211018 21:27:28.912777 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141893 evaluated Get command ‹header:<key:"\357\211\211\210" sequence:1 >›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=1} lock=true stat=PENDING rts=1634592448.835576000,0 wto=false gul=1634592449.335576000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"\013\261\230\300\n&\005David#\200\265\0308\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\340\237\362\276\006\030\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\340\23...›, err=<nil>
I211018 21:27:28.932882 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141897 evaluated Get command ‹header:<key:"\357\211\211\210" sequence:1 > key_locking:Exclusive›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=1} lock=true stat=PENDING rts=1634592448.835576000,0 wto=false gul=1634592449.335576000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"\013\261\230\300\n&\005David#\200\265\0308\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\340\237\362\276\006\030\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\340\23...›, err=<nil>
I211018 21:27:28.933107 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141898 evaluated Put command ‹header:<key:"\357\211\211\210" sequence:2 > value:<raw_bytes:"k\236ZU\n&\005David#\200\342\t8\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\200\361\332\370\006\030\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\200\361...›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=2} lock=true stat=PENDING rts=1634592448.835576000,0 wto=false gul=1634592449.335576000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:28.993963 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141919 evaluated QueryIntent command ‹header:<key:"\357\211\211\210" > txn:<id:5b957646-f4ae-43f7-abdc-eb6c3b948a55 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448835827000 logical:1 > min_timestamp:<wall_time:1634592448835576000 > priority:995875 sequence:2 > error_if_missing:...›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=0} lock=true stat=PENDING rts=1634592448.835576000,0 wto=false gul=1634592449.335576000,0 : resp=‹header:<> found_intent:true›, err=<nil>
I211018 21:27:28.994155 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141920 evaluated Get command ‹header:<key:"\357\211\211\210" sequence:2 >›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=2} lock=true stat=PENDING rts=1634592448.835576000,0 wto=false gul=1634592449.335576000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"k\236ZU\n&\005David#\200\342\t8\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\200\361\332\370\006\030\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\200\361\332\370\0...›, err=<nil>
I211018 21:27:28.994699 100349 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141921 evaluated PushTxn command ‹header:<key:"\357\211\211\210" > pusher_txn:<meta:<id:21feff13-cf87-4301-ad4d-717c687d7a35 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448835827000 > min_timestamp:<wall_time:1634592448835827000 > priority:117111 > name:"sql txn" last_heart...›, txn=<nil> : resp=‹header:<> pushee_txn:<meta:<id:5b957646-f4ae-43f7-abdc-eb6c3b948a55 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448835827000 logical:1 > min_timestamp:<wall_time:1634592448835576000 > priority:995875 sequence:1 > last_heartbeat:<wall_time:1...›, err=‹failed to push meta={id=5b957646 key=/Table/103/1/1/0 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=1} lock=true stat=PENDING rts=0,0 wto=false gul=0,0›
I211018 21:27:28.995103 100349 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141922 evaluated QueryTxn command ‹header:<key:"\357\211\211\210" > txn:<id:5b957646-f4ae-43f7-abdc-eb6c3b948a55 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448835827000 logical:1 > min_timestamp:<wall_time:1634592448835576000 > priority:995875 sequence:1 >›, txn=<nil> : resp=‹header:<> queried_txn:<meta:<id:5b957646-f4ae-43f7-abdc-eb6c3b948a55 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448835827000 logical:1 > min_timestamp:<wall_time:1634592448835576000 > priority:995875 sequence:1 > last_heartbeat:<wall_time:...›, err=<nil>
I211018 21:27:29.015427 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141924 evaluated Refresh command ‹header:<key:"\357\211\211\210" > refresh_from:<wall_time:1634592448835576000 >›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=0} lock=true stat=PENDING rts=1634592448.835827000,1 wto=false gul=1634592449.335576000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:29.015554 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141925 evaluated Refresh command ‹header:<key:"\357\211\211\210" > refresh_from:<wall_time:1634592448835576000 >›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=0} lock=true stat=PENDING rts=1634592448.835827000,1 wto=false gul=1634592449.335576000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:29.015647 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141926 evaluated Refresh command ‹header:<key:"\357\211\211\210" > refresh_from:<wall_time:1634592448835576000 >›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=0} lock=true stat=PENDING rts=1634592448.835827000,1 wto=false gul=1634592449.335576000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:29.015730 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141928 evaluated Refresh command ‹header:<key:"\357\211\211\210" > refresh_from:<wall_time:1634592448835576000 >›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=0} lock=true stat=PENDING rts=1634592448.835827000,1 wto=false gul=1634592449.335576000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:29.015815 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141929 evaluated Refresh command ‹header:<key:"\357\211\211\210" > refresh_from:<wall_time:1634592448835576000 >›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=0} lock=true stat=PENDING rts=1634592448.835827000,1 wto=false gul=1634592449.335576000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:29.016115 100401 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141930 evaluated EndTxn command ‹header:<key:"\357\211\211\210" sequence:3 > commit:true deadline:<wall_time:1634592710654402320 > lock_spans:<key:"\357\211\211\210" > txn_heartbeating:true›, txn="sql txn" meta={id=5b957646 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=3} lock=true stat=PENDING rts=1634592448.835827000,1 wto=false gul=1634592449.335576000,0 : resp=‹header:<txn:<meta:<id:5b957646-f4ae-43f7-abdc-eb6c3b948a55 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448835827000 logical:1 > min_timestamp:<wall_time:1634592448835576000 > priority:995875 sequence:3 > name:"sql txn" status:COMMITTED last...›, err=<nil>
I211018 21:27:29.063683 121145 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141942 evaluated ResolveIntent command ‹header:<key:"\357\211\211\210" > intent_txn:<id:5b957646-f4ae-43f7-abdc-eb6c3b948a55 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448835827000 logical:1 > min_timestamp:<wall_time:1634592448835576000 > priority:995875 sequence:3 > status:COM...›, txn=<nil> : resp=‹header:<>›, err=<nil>

# second txn
I211018 21:27:29.057736 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141940 evaluated Get command ‹header:<key:"\357\211\211\210" >›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.952449000,0 min=1634592448.952449000,0 seq=0} lock=false stat=PENDING rts=1634592448.952449000,0 wto=false gul=1634592449.452449000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"k\236ZU\n&\005David#\200\342\t8\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\200\361\332\370\006\030\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\200\361\332\370\0...›, err=<nil>
I211018 21:27:29.079332 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141953 evaluated Get command ‹header:<key:"\357\211\211\210" > key_locking:Exclusive›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.952449000,0 min=1634592448.952449000,0 seq=0} lock=true stat=PENDING rts=1634592448.952449000,0 wto=false gul=1634592449.452449000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"k\236ZU\n&\005David#\200\342\t8\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\200\361\332\370\006\030\254\232\357\226\014\260\255\210\255\005\030\200\233\357\226\014\200\361\332\370\0...›, err=<nil>
I211018 21:27:29.079557 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141954 evaluated Put command ‹header:<key:"\357\211\211\210" sequence:1 > value:<raw_bytes:"\204v8[\n&\005David#\200\265\0308\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\260\313\321J\030\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\260\313\321J...›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=1} lock=true stat=PENDING rts=1634592448.952449000,0 wto=false gul=1634592449.452449000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:29.120400 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141965 evaluated QueryIntent command ‹header:<key:"\357\211\211\210" > txn:<id:c6e5306d-a19f-4bcf-8a41-4fa41b81aa36 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448973936000 logical:1 > min_timestamp:<wall_time:1634592448952449000 > priority:73935 sequence:1 > error_if_missing:t...›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=0} lock=true stat=PENDING rts=1634592448.952449000,0 wto=false gul=1634592449.452449000,0 : resp=‹header:<> found_intent:true›, err=<nil>
I211018 21:27:29.120640 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141967 evaluated Get command ‹header:<key:"\357\211\211\210" sequence:1 >›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=1} lock=true stat=PENDING rts=1634592448.952449000,0 wto=false gul=1634592449.452449000,0 : resp=‹header:<num_keys:1 num_bytes:58 > value:<raw_bytes:"\204v8[\n&\005David#\200\265\0308\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\260\313\321J\030\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\260\313\321J" timestam...›, err=<nil>
I211018 21:27:29.143491 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141976 evaluated Get command ‹header:<key:"\357\211\211\210" sequence:1 > key_locking:Exclusive›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=1} lock=true stat=PENDING rts=1634592448.952449000,0 wto=false gul=1634592449.452449000,0 : resp=‹header:<num_keys:1 num_bytes:58 > value:<raw_bytes:"\204v8[\n&\005David#\200\265\0308\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\260\313\321J\030\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\260\313\321J" timestam...›, err=<nil>
I211018 21:27:29.143878 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141977 evaluated Put command ‹header:<key:"\357\211\211\210" sequence:2 > value:<raw_bytes:"m#6$\n&\005David#\200\342\t8\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\220\315\214\207\001\030\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\220\315\21...›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=2} lock=true stat=PENDING rts=1634592448.952449000,0 wto=false gul=1634592449.452449000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:29.188955 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141985 evaluated QueryIntent command ‹header:<key:"\357\211\211\210" > txn:<id:c6e5306d-a19f-4bcf-8a41-4fa41b81aa36 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448973936000 logical:1 > min_timestamp:<wall_time:1634592448952449000 > priority:73935 sequence:2 > error_if_missing:t...›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=0} lock=true stat=PENDING rts=1634592448.952449000,0 wto=false gul=1634592449.452449000,0 : resp=‹header:<> found_intent:true›, err=<nil>
I211018 21:27:29.189175 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141986 evaluated Get command ‹header:<key:"\357\211\211\210" sequence:2 >›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=2} lock=true stat=PENDING rts=1634592448.952449000,0 wto=false gul=1634592449.452449000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"m#6$\n&\005David#\200\342\t8\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\220\315\214\207\001\030\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\220\315\214\207\001"...›, err=<nil>
I211018 21:27:29.210480 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141990 evaluated Refresh command ‹header:<key:"\357\211\211\210" > refresh_from:<wall_time:1634592448952449000 >›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=0} lock=true stat=PENDING rts=1634592448.973936000,1 wto=false gul=1634592449.452449000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:29.210593 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141991 evaluated Refresh command ‹header:<key:"\357\211\211\210" > refresh_from:<wall_time:1634592448952449000 >›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=0} lock=true stat=PENDING rts=1634592448.973936000,1 wto=false gul=1634592449.452449000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:29.210694 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141992 evaluated Refresh command ‹header:<key:"\357\211\211\210" > refresh_from:<wall_time:1634592448952449000 >›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=0} lock=true stat=PENDING rts=1634592448.973936000,1 wto=false gul=1634592449.452449000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:29.210781 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141993 evaluated Refresh command ‹header:<key:"\357\211\211\210" > refresh_from:<wall_time:1634592448952449000 >›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=0} lock=true stat=PENDING rts=1634592448.973936000,1 wto=false gul=1634592449.452449000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:29.210866 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141994 evaluated Refresh command ‹header:<key:"\357\211\211\210" > refresh_from:<wall_time:1634592448952449000 >›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=0} lock=true stat=PENDING rts=1634592448.973936000,1 wto=false gul=1634592449.452449000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:29.211149 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141995 evaluated EndTxn command ‹header:<key:"\357\211\211\210" sequence:3 > commit:true deadline:<wall_time:1634592710654402320 > lock_spans:<key:"\357\211\211\210" > txn_heartbeating:true›, txn="sql txn" meta={id=c6e5306d pri=0.00344287 epo=0 ts=1634592448.973936000,1 min=1634592448.952449000,0 seq=3} lock=true stat=PENDING rts=1634592448.973936000,1 wto=false gul=1634592449.452449000,0 : resp=‹header:<txn:<meta:<id:c6e5306d-a19f-4bcf-8a41-4fa41b81aa36 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448973936000 logical:1 > min_timestamp:<wall_time:1634592448952449000 > priority:73935 sequence:3 > name:"sql txn" status:COMMITTED last_...›, err=<nil>
I211018 21:27:29.275773 121221 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141999 evaluated PushTxn command ‹header:<key:"\357\211\211\210" > pusher_txn:<meta:<id:62ac322d-5edb-496f-9be4-909a47f61540 write_timestamp:<wall_time:1634592449154443000 > min_timestamp:<wall_time:1634592449154443000 > priority:139364 > name:"sql txn" last_heartbeat:<wall_time:1634592...›, txn=<nil> : resp=‹header:<> pushee_txn:<meta:<id:c6e5306d-a19f-4bcf-8a41-4fa41b81aa36 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448973936000 logical:1 > min_timestamp:<wall_time:1634592448952449000 > priority:73935 sequence:2 > status:ABORTED last_heartbea...›, err=<nil>
I211018 21:27:29.282255 121225 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 142000 evaluated ResolveIntent command ‹header:<key:"\357\211\211\210" > intent_txn:<id:c6e5306d-a19f-4bcf-8a41-4fa41b81aa36 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448973936000 logical:1 > min_timestamp:<wall_time:1634592448952449000 > priority:73935 sequence:2 > status:ABOR...›, txn=<nil> : resp=‹header:<>›, err=<nil>

# third txn
I211018 21:27:29.382816 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 142022 evaluated Get command ‹header:<key:"\357\211\211\210" > key_locking:Exclusive›, txn="sql txn" meta={id=d43a2fcf pri=0.00201333 epo=0 ts=1634592449.381285000,0 min=1634592449.381285000,0 seq=0} lock=true stat=PENDING rts=1634592449.381285000,0 wto=false gul=1634592449.881285000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"m#6$\n&\005David#\200\342\t8\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\220\315\214\207\001\030\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\220\315\214\207\001"...›, err=<nil>
I211018 21:27:29.383091 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 142024 evaluated Put command ‹header:<key:"\357\211\211\210" sequence:1 > value:<raw_bytes:"\213O\325\270\n&\005David#\200\265\0308\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014\320\300\352\352\002\030\254\232\357\226\014\260\255\210\255\005\030\202\233\357\226\014...›, txn="sql txn" meta={id=d43a2fcf pri=0.00201333 epo=0 ts=1634592449.381285000,0 min=1634592449.381285000,0 seq=1} lock=true stat=PENDING rts=1634592449.381285000,0 wto=false gul=1634592449.881285000,0 : resp=‹header:<>›, err=<nil>
I211018 21:27:29.384127 120256 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 142026 evaluated EndTxn command ‹header:<key:"\357\211\211\210" sequence:2 > commit:true deadline:<wall_time:1634592710654402320 > lock_spans:<key:"\357\211\211\210" > in_flight_writes:<key:"\357\211\211\210" sequence:1 > txn_heartbeating:true›, txn="sql txn" meta={id=d43a2fcf pri=0.00201333 epo=0 ts=1634592449.381285000,0 min=1634592449.381285000,0 seq=2} lock=true stat=PENDING rts=1634592449.381285000,0 wto=false gul=1634592449.881285000,0 : resp=‹header:<txn:<meta:<id:d43a2fcf-e218-4cb2-83fb-d36296127c97 key:"\357\211\211\210" write_timestamp:<wall_time:1634592449381285000 > min_timestamp:<wall_time:1634592449381285000 > priority:43236 sequence:2 > name:"sql txn" status:STAGING last_heartbeat:<w...›, err=<nil>
I211018 21:27:29.404330 119563 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 142029 evaluated QueryIntent command ‹header:<key:"\357\211\211\210" > txn:<id:d43a2fcf-e218-4cb2-83fb-d36296127c97 key:"\357\211\211\210" write_timestamp:<wall_time:1634592449381285000 > min_timestamp:<wall_time:1634592449381285000 > priority:43236 sequence:1 > error_if_missing:true›, txn="sql txn" meta={id=d43a2fcf pri=0.00201333 epo=0 ts=1634592449.381285000,0 min=1634592449.381285000,0 seq=0} lock=true stat=PENDING rts=1634592449.381285000,0 wto=false gul=1634592449.881285000,0 : resp=‹header:<> found_intent:true›, err=<nil>
I211018 21:27:29.444995 121227 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 142031 evaluated EndTxn command ‹header:<key:"\357\211\211\210" > commit:true lock_spans:<key:"\357\211\211\210" >›, txn="sql txn" meta={id=d43a2fcf pri=0.00201333 epo=0 ts=1634592449.381285000,0 min=1634592449.381285000,0 seq=0} lock=true stat=STAGING rts=1634592449.381285000,0 wto=false gul=1634592449.881285000,0 int=1 ifw=1 : resp=‹header:<txn:<meta:<id:d43a2fcf-e218-4cb2-83fb-d36296127c97 key:"\357\211\211\210" write_timestamp:<wall_time:1634592449381285000 > min_timestamp:<wall_time:1634592449381285000 > priority:43236 sequence:2 > status:COMMITTED last_heartbeat:<wall_time:1634...›, err=<nil>
I211018 21:27:29.491470 121289 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 142035 evaluated ResolveIntent command ‹header:<key:"\357\211\211\210" > intent_txn:<id:d43a2fcf-e218-4cb2-83fb-d36296127c97 key:"\357\211\211\210" write_timestamp:<wall_time:1634592449381285000 > min_timestamp:<wall_time:1634592449381285000 > priority:43236 sequence:2 > status:COMMITTED›, txn=<nil> : resp=‹header:<>›, err=<nil>

The first two txns look like we would expect. Notice that they commit with seq num 3, meaning that they performed 2 writes (plus the EndTxn's seq num). The third txn commits with a seq num of 2, so it only performed a single write! That indicates that something strange is going on. Are we sure we're issuing two UPDATE statements for this third txn?

And there's more. Notice that the third txn has two EndTxn requests while the first two only have one. That's because the third commit is performing a parallel commit, because its Put is still in flight by the time the txn commits. This is not the case for the first two txns because they read their write (# Expect original salary) and create a pipeline stall. So this indicates that the txn in question never even got to the # Expect original salary step, and was instead committed immediately after its first UPDATE.

nvanbenschoten commented 3 years ago

Would someone mind adding in a sanity check that we're actually reaching and then making it past the second dev.save! call?

nvanbenschoten commented 3 years ago

Also, as another sanity check, when people are saying that they reproduced in the shell, are they seeing a salary of 200000 or of anything that is not 80000? I ask because the first time these steps are run, they are expected to return a salary of 70000 if the read is serialized before the write, because 70000 is the default value for the column.

keithdoggett commented 3 years ago

I saw 200000 when I reproduced it in the shell and all of the test failures I saw had it as 200000.

nvanbenschoten commented 3 years ago

I think I found the first half of this transaction:

I211018 21:27:28.851620 100349 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141879 evaluated Get command ‹header:<key:"\357\211\211\210" >›, txn="sql txn" meta={id=21feff13 pri=0.00545341 epo=0 ts=1634592448.835827000,0 min=1634592448.835827000,0 seq=0} lock=false stat=PENDING rts=1634592448.835827000,0 wto=false gul=1634592449.335827000,0 : resp=‹header:<num_keys:1 num_bytes:60 > value:<raw_bytes:"\331\315\033\035\n&\005David#\200\342\t8\254\232\357\226\014\260\255\210\255\005\030\254\232\357\226\014\260\255\210\255\005\030\254\232\357\226\014\260\255\210\255\005\030\254\232\357\226\014\260\255\...›, err=<nil>
I211018 21:27:28.911843 100349 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141890 evaluated Get command ‹header:<key:"\357\211\211\210" > key_locking:Exclusive›, txn="sql txn" meta={id=21feff13 pri=0.00545341 epo=0 ts=1634592448.835827000,0 min=1634592448.835827000,0 seq=0} lock=true stat=PENDING rts=1634592448.835827000,0 wto=false gul=1634592449.335827000,0 : resp=‹header:<>›, err=‹conflicting intents on /Table/103/1/1/0›
I211018 21:27:28.994699 100349 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141921 evaluated PushTxn command ‹header:<key:"\357\211\211\210" > pusher_txn:<meta:<id:21feff13-cf87-4301-ad4d-717c687d7a35 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448835827000 > min_timestamp:<wall_time:1634592448835827000 > priority:117111 > name:"sql txn" last_heart...›, txn=<nil> : resp=‹header:<> pushee_txn:<meta:<id:5b957646-f4ae-43f7-abdc-eb6c3b948a55 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448835827000 logical:1 > min_timestamp:<wall_time:1634592448835576000 > priority:995875 sequence:1 > last_heartbeat:<wall_time:1...›, err=‹failed to push meta={id=5b957646 key=/Table/103/1/1/0 pri=0.04637404 epo=0 ts=1634592448.835827000,1 min=1634592448.835576000,0 seq=1} lock=true stat=PENDING rts=0,0 wto=false gul=0,0›
I211018 21:27:29.045930 121104 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141938 evaluated QueryTxn command ‹header:<key:"\357\211\211\210" > txn:<id:21feff13-cf87-4301-ad4d-717c687d7a35 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448835827000 > min_timestamp:<wall_time:1634592448835827000 > priority:117111 > wait_for_update:true›, txn=<nil> : resp=‹header:<> queried_txn:<meta:<id:21feff13-cf87-4301-ad4d-717c687d7a35 key:"\357\211\211\210" write_timestamp:<wall_time:1634592448835827000 > min_timestamp:<wall_time:1634592448835827000 > priority:117111 > last_heartbeat:<wall_time:1634592448835827000 >...›, err=<nil>
I211018 21:27:29.063822 100349 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141943 evaluated Get command ‹header:<key:"\357\211\211\210" > key_locking:Exclusive›, txn="sql txn" meta={id=21feff13 pri=0.00545341 epo=0 ts=1634592448.835827000,0 min=1634592448.835827000,0 seq=0} lock=true stat=PENDING rts=1634592448.835827000,0 wto=false gul=1634592449.335827000,0 : resp=‹header:<>›, err=‹WriteTooOldError: write at timestamp 1634592448.835827000,0 too old; wrote at 1634592448.835827000,2›
I211018 21:27:29.064094 100349 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141945 evaluated Refresh command ‹header:<key:"\357\211\211\210" > refresh_from:<wall_time:1634592448835827000 >›, txn="sql txn" meta={id=21feff13 pri=0.00545341 epo=0 ts=1634592448.835827000,2 min=1634592448.835827000,0 seq=0} lock=true stat=PENDING rts=1634592448.835827000,2 wto=false gul=1634592449.335827000,0 : resp=‹header:<>›, err=encountered recently written key ‹/Table/103/1/1/0› @1634592448.835827000,1
I211018 21:27:29.121109 121146 kv/kvserver/replica_evaluate.go:521 ⋮ [n1,s1,r140/1:‹/Table/10{3-4}›] 141971 evaluated EndTxn command ‹header:<key:"\357\211\211\210" sequence:1 > lock_spans:<key:"\357\211\211\210" > txn_heartbeating:true›, txn="sql txn" meta={id=21feff13 pri=0.01917826 epo=1 ts=1634592448.835827000,2 min=1634592448.835827000,0 seq=1} lock=true stat=PENDING rts=1634592448.835827000,2 wto=false gul=1634592449.335827000,0 : resp=‹header:<txn:<meta:<id:21feff13-cf87-4301-ad4d-717c687d7a35 key:"\357\211\211\210" epoch:1 write_timestamp:<wall_time:1634592448835827000 logical:2 > min_timestamp:<wall_time:1634592448835827000 > priority:411850 sequence:1 > name:"sql txn" status:ABORTE...›, err=<nil>

Notice that it hits a WriteTooOld error on the first UPDATE statement's locking Get, can't refresh, and then gets rolled back. I wonder if this somehow (on the client or server) broke the txn context and then allowed the UPDATE statement to retry and commit in a separate txn.

nvanbenschoten commented 3 years ago

This timeline appears to be consistent with the first debug.zip @rafiss sent over.

otan commented 3 years ago

@keithdoggett i'm having trouble reproducing this on shell again. i did see 200000

can you try, and paste the output of the shell as you are running the commands?

otan commented 3 years ago

huh, i misspoke.

looks like this is a "problem" on 21.1 as well. https://github.com/cockroachdb/cockroach/issues/69080 Aug 24, master first hit Aug 12.

rafiss commented 3 years ago

Sharing something strange I see in my repro from running ActiveRecord tests locally.

I'm using RUBYOPT="-W0" TESTOPTS="-v" TEST_FILES=test/cases/transactions_test.rb bundle exec rake test

The full logs are long and pasted below, but the strange part is I see:

ConcurrentTransactionTest#test_transaction_per_thread = 0.01 s = F

Failure:
ConcurrentTransactionTest#test_transaction_per_thread [/Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:76]:
Expected: 80000
  Actual: 200000

as well as

ConcurrentTransactionTest#test_transaction_isolation__read_committed = 31.59 s = F

Failure:
ConcurrentTransactionTest#test_transaction_isolation__read_committed [/Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:50]:
Expected: 80000
  Actual: 200000

However, I'm using the code I shared in https://github.com/cockroachdb/cockroach/issues/71652#issuecomment-946118074 -- and both line 50 and line 76 are from test_transaction_isolation__read_committed. Yet somehow the test suite is saying that line 76 caused test_transaction_per_thread to fail. That's weird and makes me wonder if the concurrency is confusing the test suite itself.

``` ❯ RUBYOPT="-W0" TESTOPTS="-v" TEST_FILES=test/cases/transactions_test.rb bundle exec rake test Using cockroachdb Run options: -v --seed 41743 # Running: # terminated with exception (report_on_exception is true): Traceback (most recent call last): 59: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:47:in `block (3 levels) in test_transaction_isolation__read_committed' 58: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/transactions.rb:209:in `transaction' 57: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:320:in `transaction' 56: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/lib/active_record/connection_adapters/cockroachdb/transaction_manager.rb:12:in `within_new_transaction' 55: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb:317:in `within_new_transaction' 54: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize' 53: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt' 52: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize' 51: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt' 50: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:26:in `block (2 levels) in synchronize' 49: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb:319:in `block in within_new_transaction' 48: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:320:in `block in transaction' 47: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:53:in `block (4 levels) in test_transaction_isolation__read_committed' 46: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/suppressor.rb:48:in `save!' 45: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/transactions.rb:302:in `save!' 44: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/transactions.rb:350:in `with_transaction_returning_status' 43: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:318:in `transaction' 42: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/transactions.rb:354:in `block in with_transaction_returning_status' 41: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/transactions.rb:302:in `block in save!' 40: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/validations.rb:53:in `save!' 39: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/persistence.rb:507:in `save!' 38: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/timestamp.rb:126:in `create_or_update' 37: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/callbacks.rb:457:in `create_or_update' 36: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/callbacks.rb:824:in `_run_save_callbacks' 35: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/callbacks.rb:137:in `run_callbacks' 34: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/callbacks.rb:126:in `block in run_callbacks' 33: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/autosave_association.rb:385:in `around_save_collection_association' 32: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/callbacks.rb:117:in `block in run_callbacks' 31: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/callbacks.rb:457:in `block in create_or_update' 30: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/persistence.rb:900:in `create_or_update' 29: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/timestamp.rb:121:in `_update_record' 28: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/callbacks.rb:465:in `_update_record' 27: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/callbacks.rb:824:in `_run_update_callbacks' 26: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/callbacks.rb:106:in `run_callbacks' 25: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/callbacks.rb:465:in `block in _update_record' 24: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/attribute_methods/dirty.rb:195:in `_update_record' 23: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/persistence.rb:913:in `_update_record' 22: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/locking/optimistic.rb:88:in `_update_row' 21: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/persistence.rb:891:in `_update_row' 20: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/persistence.rb:385:in `_update_record' 19: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/query_cache.rb:22:in `update' 18: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:179:in `update' 17: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb:70:in `exec_delete' 16: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:649:in `execute_and_clear' 15: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:670:in `exec_no_cache' 14: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb:687:in `log' 13: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/notifications/instrumenter.rb:24:in `instrument' 12: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb:695:in `block in log' 11: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize' 10: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt' 9: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize' 8: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt' 7: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:26:in `block (2 levels) in synchronize' 6: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb:696:in `block (2 levels) in log' 5: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:671:in `block in exec_no_cache' 4: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/dependencies/interlock.rb:47:in `permit_concurrent_loads' 3: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares' 2: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/dependencies/interlock.rb:48:in `block in permit_concurrent_loads' 1: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `block (2 levels) in exec_no_cache' /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `exec_params': ERROR: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1634604180.165727000,0 too old; wrote at 1634604180.166592000,2: "sql txn" meta={id=79057b8e pri=0.00185245 epo=0 ts=1634604180.166592000,2 min=1634604180.165727000,0 seq=0} lock=true stat=PENDING rts=1634604180.165727000,0 wto=false gul=1634604180.665727000,0 (PG::TRSerializationFailure) HINT: See: https://www.cockroachlabs.com/docs/v21.2/transaction-retry-error-reference.html 59: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:47:in `block (3 levels) in test_transaction_isolation__read_committed' 58: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/transactions.rb:209:in `transaction' 57: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:320:in `transaction' 56: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/lib/active_record/connection_adapters/cockroachdb/transaction_manager.rb:12:in `within_new_transaction' 55: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb:317:in `within_new_transaction' 54: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize' 53: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt' 52: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize' 51: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt' 50: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:26:in `block (2 levels) in synchronize' 49: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb:319:in `block in within_new_transaction' 48: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:320:in `block in transaction' 47: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:53:in `block (4 levels) in test_transaction_isolation__read_committed' 46: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/suppressor.rb:48:in `save!' 45: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/transactions.rb:302:in `save!' 44: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/transactions.rb:350:in `with_transaction_returning_status' 43: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:318:in `transaction' 42: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/transactions.rb:354:in `block in with_transaction_returning_status' 41: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/transactions.rb:302:in `block in save!' 40: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/validations.rb:53:in `save!' 39: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/persistence.rb:507:in `save!' 38: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/timestamp.rb:126:in `create_or_update' 37: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/callbacks.rb:457:in `create_or_update' 36: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/callbacks.rb:824:in `_run_save_callbacks' 35: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/callbacks.rb:137:in `run_callbacks' 34: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/callbacks.rb:126:in `block in run_callbacks' 33: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/autosave_association.rb:385:in `around_save_collection_association' 32: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/callbacks.rb:117:in `block in run_callbacks' 31: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/callbacks.rb:457:in `block in create_or_update' 30: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/persistence.rb:900:in `create_or_update' 29: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/timestamp.rb:121:in `_update_record' 28: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/callbacks.rb:465:in `_update_record' 27: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/callbacks.rb:824:in `_run_update_callbacks' 26: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/callbacks.rb:106:in `run_callbacks' 25: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/callbacks.rb:465:in `block in _update_record' 24: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/attribute_methods/dirty.rb:195:in `_update_record' 23: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/persistence.rb:913:in `_update_record' 22: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/locking/optimistic.rb:88:in `_update_row' 21: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/persistence.rb:891:in `_update_row' 20: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/persistence.rb:385:in `_update_record' 19: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/query_cache.rb:22:in `update' 18: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:179:in `update' 17: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/postgresql/database_statements.rb:70:in `exec_delete' 16: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:649:in `execute_and_clear' 15: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:670:in `exec_no_cache' 14: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb:687:in `log' 13: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/notifications/instrumenter.rb:24:in `instrument' 12: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb:695:in `block in log' 11: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize' 10: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt' 9: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize' 8: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt' 7: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:26:in `block (2 levels) in synchronize' 6: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb:696:in `block (2 levels) in log' 5: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:671:in `block in exec_no_cache' 4: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/dependencies/interlock.rb:47:in `permit_concurrent_loads' 3: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/share_lock.rb:187:in `yield_shares' 2: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/dependencies/interlock.rb:48:in `block in permit_concurrent_loads' 1: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `block (2 levels) in exec_no_cache' /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/postgresql_adapter.rb:672:in `exec_params': PG::TRSerializationFailure: ERROR: restart transaction: TransactionRetryWithProtoRefreshError: WriteTooOldError: write at timestamp 1634604180.165727000,0 too old; wrote at 1634604180.166592000,2: "sql txn" meta={id=79057b8e pri=0.00185245 epo=0 ts=1634604180.166592000,2 min=1634604180.165727000,0 seq=0} lock=true stat=PENDING rts=1634604180.165727000,0 wto=false gul=1634604180.665727000,0 (ActiveRecord::SerializationFailure) HINT: See: https://www.cockroachlabs.com/docs/v21.2/transaction-retry-error-reference.html 17: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:47:in `block (3 levels) in test_transaction_isolation__read_committed' 16: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/transactions.rb:209:in `transaction' 15: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:320:in `transaction' 14: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/lib/active_record/connection_adapters/cockroachdb/transaction_manager.rb:11:in `within_new_transaction' 13: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/lib/active_record/connection_adapters/cockroachdb/transaction_manager.rb:20:in `rescue in within_new_transaction' 12: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/lib/active_record/connection_adapters/cockroachdb/transaction_manager.rb:12:in `within_new_transaction' 11: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb:317:in `within_new_transaction' 10: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize' 9: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt' 8: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize' 7: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt' 6: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:26:in `block (2 levels) in synchronize' 5: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb:319:in `block in within_new_transaction' 4: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/lib/active_record/connection_adapters/cockroachdb/transaction_manager.rb:20:in `block in within_new_transaction' 3: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:320:in `block in transaction' 2: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:50:in `block (4 levels) in test_transaction_isolation__read_committed' 1: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/minitest-5.14.0/lib/minitest/assertions.rb:218:in `assert_equal' /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/minitest-5.14.0/lib/minitest/assertions.rb:183:in `assert': Expected: 80000 (Minitest::Assertion) Actual: 200000 ConcurrentTransactionTest#test_transaction_isolation__read_committed = 31.59 s = F Failure: ConcurrentTransactionTest#test_transaction_isolation__read_committed [/Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:50]: Expected: 80000 Actual: 200000 rails test Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:39 # terminated with exception (report_on_exception is true): Traceback (most recent call last): 16: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:72:in `block (2 levels) in test_transaction_isolation__read_committed' 15: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:72:in `times' 14: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:74:in `block (3 levels) in test_transaction_isolation__read_committed' 13: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/transactions.rb:209:in `transaction' 12: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:320:in `transaction' 11: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/lib/active_record/connection_adapters/cockroachdb/transaction_manager.rb:12:in `within_new_transaction' 10: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb:317:in `within_new_transaction' 9: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `synchronize' 8: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:21:in `handle_interrupt' 7: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `block in synchronize' 6: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:25:in `handle_interrupt' 5: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activesupport/lib/active_support/concurrency/load_interlock_aware_monitor.rb:26:in `block (2 levels) in synchronize' 4: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb:319:in `block in within_new_transaction' 3: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/bundler/gems/rails-0863d22660c2/activerecord/lib/active_record/connection_adapters/abstract/database_statements.rb:320:in `block in transaction' 2: from /Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:76:in `block (4 levels) in test_transaction_isolation__read_committed' 1: from /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/minitest-5.14.0/lib/minitest/assertions.rb:218:in `assert_equal' /Users/rafiss/.rbenv/versions/2.7.4/lib/ruby/gems/2.7.0/gems/minitest-5.14.0/lib/minitest/assertions.rb:183:in `assert': Expected: 80000 (Minitest::Assertion) Actual: 200000 ConcurrentTransactionTest#test_transaction_per_thread = 0.01 s = F Failure: ConcurrentTransactionTest#test_transaction_per_thread [/Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:76]: Expected: 80000 Actual: 200000 rails test Users/rafiss/scratch/activerecord-cockroachdb-adapter/test/cases/transactions_test.rb:21 Finished in 31.595065s, 0.0633 runs/s, 0.4115 assertions/s. 2 runs, 13 assertions, 2 failures, 0 errors, 0 skips rake aborted! Command failed with status (1) /Users/rafiss/.rbenv/versions/2.7.4/bin/bundle:23:in `load' /Users/rafiss/.rbenv/versions/2.7.4/bin/bundle:23:in `
' Tasks: TOP => test => test:cockroachdb ```
rafiss commented 3 years ago

After @otan pointed out that release-21.1 is affected too, I tried against v21.1.7 and reproduced the failure. v21.1.7 was released on August 9, before this test first failed on master on August 13.

(And an update on my bisecting attempts: they've been inconclusive since as per above, this issue goes back further than the commits I've been bisecting.)

So I feel comfortable calling this not a release-blocker. But let's keep the issue open as a tracking issue for SQL Experience so we can keep debugging then ultimately decide if there is a CRDB bug or issue in the test infra.

update: repro'd on v21.1.0 also

bdarnell commented 3 years ago

The third txn commits with a seq num of 2, so it only performed a single write! That indicates that something strange is going on. Are we sure we're issuing two UPDATE statements for this third txn?

This reminds me of a bug I saw in another life (10 years ago, in a python/mysql app). Some confused exception handling (in this case relating to the differences between plain except: and except Exception:) ended up putting connections that experienced certain errors back into the connection pool still in the "pending transaction" state. Their dirty writes could then end up committed if the next user of the connection from the pool performed a commit.

I started this comment in response to Nathan's analysis, and then Rafi's comment showing cross-test effects also suggests the connection pool as a possible vector (assuming the ActiveRecord test framework doesn't clear the connection pool between every test).

nvanbenschoten commented 3 years ago

One theory I floated in Slack that could explain these "shell reproductions" is that they were run on the same DB instance that had previously been corrupted (i.e. salary = 200000) by an activerecord test suite run. The shell commands listed above will often lead to the write txn getting pushed, so the read txn will read below it and observe the preceding state. If the previous state was the one corrupted by activerecord, the shell could show 200000 on the first try.

I'm curious whether this explanation matches @otan and @keithdoggett's reproductions. Has anyone ever reproduced this in a shell on a fresh instance where they actually had to run the CREATE TABLE statement themselves? Or ever reproduced this on any attempt but the first in a shell?

otan commented 3 years ago

Or ever reproduced this on any attempt but the first in a shell?

I first thought I got it on a shell (AR was not setup), but I am really wondering whether my manual testing accidentally made a commit instead. Curious how @keithdoggett got it on his end.

As they say, pics or it didn't happen (and I don't have pics).

otan commented 3 years ago

I guess it is also suspicious in my repro using Go, I ended up with TxnProtoRefreshError frequently, but activerecord never hits that error. The AR test should, in theory, hit it as regularly as I did (3 write threads, 5 reads threads).

keithdoggett commented 3 years ago

That's a good point @nvanbenschoten, when I reproduced it in a shell, I had connected to the same database that the AR tests are run on. It was after the tests ran, but if it's some sort of data corruption issue that would make sense why I was able to see it on the first try.

To @bdarnell's point, the connection pool is not reset between tests and we've seen issues in the past where a connection will have an issue during a transaction and then every following test would raise a TxnProtoRefreshError. So it's possible that it's related to that, although there's usually an error that gets raised when executing in an invalid state which we're not seeing here.

rafiss commented 3 years ago

This upstream change seems very relevant to what's being discussed. @keithdoggett can we think of a way to make sure the test is using this logic correctly?

https://github.com/rails/rails/commit/a2c11689e635604147682463630da1777dfe2112

rafiss commented 3 years ago

Ah nvm rails/rails@a2c1168 is only in rails v7.0.0.alpha1. @keithdoggett can you determine if the rails project will get that change backported?

Based on the discussion in https://github.com/rails/rails/pull/40541 I feel pretty confident that this is the problem.

Also, let's make sure the adapter's retry logic is still correct: https://github.com/cockroachdb/activerecord-cockroachdb-adapter/blob/f4925929d25b5afc4b80b96e0043dfb60c8b8702/lib/active_record/connection_adapters/cockroachdb/transaction_manager.rb#L11-L29

keithdoggett commented 3 years ago

@rafiss this is on the 6-1-stable branch, so presumably will be rolled into the next release: https://github.com/rails/rails/blob/711b241dfd45a56323c647fc82ad7e574e3282c3/activerecord/lib/active_record/connection_adapters/abstract/transaction.rb#L330-L336

I'll take a look at the retry logic. Might have to add a check to confirm the connection hasn't been disconnected in super.

rafiss commented 3 years ago

Ah thanks for pointing me at that backport - I didn't see that it was merged into 6-1-stable

Though it looks like that commit has already been released:

❯ git tag --contains d6757d5d8fc25628e6e591ba3e74c9c9a34fe4ca
v6.1.0
v6.1.0.rc2
v6.1.1
v6.1.2
v6.1.2.1
v6.1.3
v6.1.3.1
v6.1.3.2
v6.1.4
v6.1.4.1

So I think it likely means one of two things:

A next step is for us to write an activerecord-cockroachdb-adapter test that intentionally causes transaction retry errors. If we're running against v22.1 we can use the inject_retry_errors_enabled session variable that we added in https://github.com/cockroachdb/cockroach/pull/71357

keithdoggett commented 3 years ago

Gotcha so we've been running against this for a little while. The within_new_transaction method wasn't updated for our 6.1 update, so I'm guessing our version is out of date now.

We're not currently running against 22.1, but I could make a separate branch to test that or try to come up with some code that would trigger a transaction retry error.

rafiss commented 3 years ago

well I wrote this simple test and tried locally with v22.1, and it seems to work correctly

    def test_txn_retry
      ActiveRecord::Base.connection.execute("SET inject_retry_errors_enabled = true")
      i = 0
      ActiveRecord::Base.transaction do
        i += 1
        puts "i is #{i}"
        if i >= 2
          ActiveRecord::Base.connection.execute("SET inject_retry_errors_enabled = false")
        end
        ActiveRecord::Base.connection.execute("SELECT 1")
      end
      assert i == 2
    end

I think we can probably hold off on merging in a test to activerecord-cockroachdb-adapter right now.

I guess I no longer have a great theory for where to look next. Maybe there are other issues that could happen with the retry logic. I'll try out other tests.

otan commented 3 years ago

and it seems to work correctly

it could be working correctly but still using a poisoned connection holding an uncommitted txn though? would a fairer test be:

CREATE TABLE t (pk INT PRIMARY KEY, b int);
INSERT INTO t VALUES (1, 1);

for each pool conn + 1 infinite loop {
     i = 0
       ActiveRecord::Base.transaction do
        i += 1
        puts "i is #{i}"
        if i >= 2
          ActiveRecord::Base.connection.execute("SET inject_retry_errors_enabled = false")        
          ActiveRecord::Base.connection.execute("UPDATE t SET b = 1 WHERE pk = 1")
        else
            ActiveRecord::Base.connection.execute("UPDATE t SET b = 2 WHERE pk = 1")
        end
      end
      refetch i and assert i == 1
}
rafiss commented 2 years ago

I've moved this issue to the activerecord-cockroachdb-adapter repo. It seems like an issue specific to the adapter.

@keithdoggett when you have time to get into the backlog, I think it would be valuable to look into this more.