cockroachdb / cockroach


ccl/backupccl: TestRestoreAsOfSystemTimeGCBounds failed #130491

Open cockroach-teamcity opened 1 week ago

cockroach-teamcity commented 1 week ago

ccl/backupccl.TestRestoreAsOfSystemTimeGCBounds failed on release-24.1 @ bfc681c36fff728bc396c2d00f84a8c55a7c17af:

=== RUN   TestRestoreAsOfSystemTimeGCBounds
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestRestoreAsOfSystemTimeGCBounds3390737136
    test_log_scope.go:81: use -show-logs to present logs inline
    test_server_shim.go:157: automatically injected a shared process virtual cluster under test; see comment at top of test_server_shim.go for details.
    backup_test.go:4021: error executing query="BACKUP data.bank TO $1 WITH revision_history" args=["nodelocal://1//tbl-after-gc"]: pq: failed to insert lease {[128] 3 1 1 {{552644000 63861648486 <nil>}} [1 1 128 4 38 248 229 85 82 69 241 181 111 55 193 204 189 40 234]}: unexpected value: raw_bytes:"\351\276\277\327\n3\002" timestamp:<wall_time:1726051397830154474 > 
    testutils.go:290: no Invalid Descriptors
    panic.go:626: -- test log scope end --
test logs left over in: outputs.zip/logTestRestoreAsOfSystemTimeGCBounds3390737136
--- FAIL: TestRestoreAsOfSystemTimeGCBounds (13.22s)

Parameters:

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/disaster-recovery

This test on roachdash | Improve this report!

Jira issue: CRDB-42089

cockroach-teamcity commented 5 days ago

ccl/backupccl.TestRestoreAsOfSystemTimeGCBounds failed on release-24.1 @ ab04f621a4c834c9f0cc7fd65187c531553ce384:

Fatal error:

panic: test timed out after 14m57s
running tests:
    TestRestoreAsOfSystemTimeGCBounds (14m57s)

Stack:

goroutine 499425 [running]:
testing.(*M).startAlarm.func1()
    GOROOT/src/testing/testing.go:2366 +0x385
created by time.goFunc
    GOROOT/src/time/sleep.go:177 +0x2d

Log preceding fatal error

```
=== RUN   TestRestoreAsOfSystemTimeGCBounds
    test_log_scope.go:170: test logs captured to: outputs.zip/logTestRestoreAsOfSystemTimeGCBounds3692411992
    test_log_scope.go:81: use -show-logs to present logs inline
    test_server_shim.go:157: automatically injected a shared process virtual cluster under test; see comment at top of test_server_shim.go for details.
```

Parameters:

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

stevendanna commented 1 day ago

@cockroachdb/sql-foundations The first error here looks a lot like https://github.com/cockroachdb/cockroach/issues/129421

I poked at this a bit and came across https://github.com/cockroachdb/cockroach/pull/79511 which added a retry for something similar when writing via SQL. We could update that to look for ConditionFailedError, but it seems to me that this might be happening more frequently than we assumed was possible, so perhaps there is more to investigate.

rafiss commented 19 hours ago

For the first error:

    backup_test.go:4021: error executing query="BACKUP data.bank TO $1 WITH revision_history" args=["nodelocal://1//tbl-after-gc"]: pq: failed to insert lease {[128] 3 1 1 {{552644000 63861648486 <nil>}} [1 1 128 4 38 248 229 85 82 69 241 181 111 55 193 204 189 40 234]}: unexpected value: raw_bytes:"\351\276\277\327\n3\002" timestamp:<wall_time:1726051397830154474 > 

~I tried decoding that value (I had to mess around with the backslash escaping):~

root@localhost:26257/defaultdb> select crdb_internal.pretty_value(decode('\351\276\277\327\n3\002', 'escape'));
ERROR: decode(): invalid bytea escape sequence

root@localhost:26257/defaultdb> select crdb_internal.pretty_value(decode('\351\276\277\327\\n3\002', 'escape'));
  crdb_internal.pretty_value
------------------------------
  /<err: unknown tag: 92>

~That looks like something coming out of grpc or protobuf code? not sure.~

Ignore the above; I just hadn't figured out how to decode that value correctly.

Here's the actual value (with \n replaced by its octal representation, \012):

root@localhost:26257/defaultdb> SELECT crdb_internal.pretty_value(decode('\351\276\277\327\0123\002', 'escape'));;
  crdb_internal.pretty_value
------------------------------
  /TUPLE/3:3:Int/1
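
The manual rewrite of `\n` to `\012` can also be done mechanically. The following is a rough Go sketch, assuming the `raw_bytes` field in the error message uses C-style escapes that `strconv.Unquote` understands (three-digit octal, `\n`, and so on). The helper names `toBytes` and `toByteaEscape` are made up for this example. It does not handle `\'`, and a single quote in the output would still need doubling inside a SQL string literal.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toBytes interprets a C-style escaped string, as printed in the error
// message, and returns the raw bytes it stands for.
func toBytes(escaped string) ([]byte, error) {
	s, err := strconv.Unquote(`"` + escaped + `"`)
	if err != nil {
		return nil, err
	}
	return []byte(s), nil
}

// toByteaEscape renders raw bytes in a form suitable for
// decode(..., 'escape'): printable ASCII is kept as-is, a backslash is
// doubled, and every other byte becomes a three-digit octal escape.
func toByteaEscape(raw []byte) string {
	var sb strings.Builder
	for _, b := range raw {
		switch {
		case b == '\\':
			sb.WriteString(`\\`)
		case b >= 0x20 && b <= 0x7e:
			sb.WriteByte(b)
		default:
			fmt.Fprintf(&sb, `\%03o`, b)
		}
	}
	return sb.String()
}

func main() {
	raw, err := toBytes(`\351\276\277\327\n3\002`)
	if err != nil {
		panic(err)
	}
	fmt.Println(toByteaEscape(raw)) // prints \351\276\277\327\0123\002
}
```

The printed string is the same one passed to `decode(..., 'escape')` in the query above.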

stevendanna commented 8 hours ago

Yeah, that makes sense: the leases table holds most of its data in the primary key, so I think the only column in the value is the instance ID.