cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.01k stars 3.79k forks source link

roachtest: disagg-rebalance/aws/n4cpu4 failed #122483

Closed cockroach-teamcity closed 5 months ago

cockroach-teamcity commented 6 months ago

roachtest.disagg-rebalance/aws/n4cpu4 failed with artifacts on release-24.1 @ 83796af55775f4641055129caa7dba562d0ed8c9:

(disagg_rebalance.go:139).func1: unexpected snapshot received bytes 5526407 > bytes in all replicas on n4 5198716, did not do a disaggregated rebalance?
test artifacts and logs in: /artifacts/disagg-rebalance/aws/n4cpu4/cpu_arch=arm64/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

/cc @cockroachdb/storage

This test on roachdash | Improve this report!

Jira issue: CRDB-37905

sumeerbhola commented 6 months ago

@itsbilal this looks similar to https://github.com/cockroachdb/cockroach/issues/121399 which you investigated and closed. Thoughts?

sumeerbhola commented 6 months ago

The ranges are evenly balanced

06:01:29 allocator.go:326: replica count stddev = 1.290994, max allowed stddev = 10.000000
06:01:29 allocator.go:202: cluster took 00:01:20.604196 to rebalance
06:01:29 allocator.go:213: 714 range events
06:01:29 allocator.go:224: stdDev(replica count) = 1.29
06:01:29 allocator.go:239: s1 has 75 ranges
06:01:29 allocator.go:239: s2 has 72 ranges
06:01:29 allocator.go:239: s3 has 74 ranges
06:01:29 allocator.go:239: s4 has 73 ranges
06:01:29 disagg_rebalance.go:137: [w9] got snapshot received bytes = 5.3 MiB, logical bytes in ranges = 5.0 MiB
06:01:29 test_impl.go:414: test failure #1: full stack retained in failure_1.log: (disagg_rebalance.go:139).func1: unexpected snapshot received bytes 5526407 > bytes in all replicas on n4 5198716, did not do a disaggregated rebalance?
sumeerbhola commented 6 months ago

Following looks reasonable

I240417 05:59:49.269181 720 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r68/4:‹/Table/{66-106}›] 77  applied snapshot 920f49fc from (n2,s2):2 at applied index 43 (total=428ms data=1.0 KiB excise=true ingestion=6@272ms)
I240417 05:59:49.906588 779 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r116/4:‹/{Table/114/1/…-Max}›] 87  applied snapshot 2ee089e4 from (n3,s3):2 at applied index 178 (total=833ms data=51 KiB shared=7 sharedSize=19 MiB excise=true ingestion=6@641ms)
I240417 05:59:50.319853 858 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r74/4:‹/Table/10{6/1-7/1}›] 92  applied snapshot 1fffb302 from (n1,s1):1 at applied index 1725 (total=373ms data=7.2 KiB shared=1 sharedSize=36 KiB excise=true ingestion=6@319ms)
I240417 05:59:50.571092 1297 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r20/4:‹/Table/1{7-8}›] 93  applied snapshot 856bb1a1 from (n2,s2):2 at applied index 32 (total=490ms data=908 B excise=true ingestion=6@365ms)
I240417 05:59:51.504425 1643 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r53/4:‹/Table/5{1-2}›] 96  applied snapshot 1c69eeb3 from (n3,s3):2 at applied index 33 (total=388ms data=908 B excise=true ingestion=6@325ms)
I240417 05:59:51.774811 1683 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r16/4:‹/Table/1{3-4}›] 97  applied snapshot 65d177f3 from (n1,s1):1 at applied index 475 (total=632ms data=13 KiB shared=1 sharedSize=84 KiB excise=true ingestion=6@523ms)
I240417 05:59:52.806333 2037 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r105/4:‹/Table/113/1/{9/30635-13/333…}›] 101  applied snapshot b257cf27 from (n1,s1):1 at applied index 322 (total=678ms data=105 KiB shared=10 sharedSize=26 MiB excise=true ingestion=6@613ms)
I240417 05:59:53.872115 2241 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r33/4:‹/NamespaceTable/{30-Max}›] 103  applied snapshot 966a4084 from (n1,s1):1 at applied index 49 (total=500ms data=893 B shared=1 sharedSize=15 KiB excise=true ingestion=6@443ms)
I240417 05:59:55.130278 2731 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r37/4:‹/Table/3{4-5}›] 106  applied snapshot 99118ca9 from (n2,s2):2 at applied index 33 (total=292ms data=908 B excise=true ingestion=6@239ms)
I240417 05:59:55.357526 2606 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r88/4:‹/Table/108/{1-2}›] 108  applied snapshot ee123ec0 from (n1,s1):1 at applied index 1208 (total=1205ms data=161 KiB shared=23 sharedSize=64 MiB excise=true ingestion=6@1154ms)
I240417 05:59:55.788694 2991 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r54/4:‹/Table/5{2-3}›] 109  applied snapshot 37bd4e94 from (n1,s1):1 at applied index 29 (total=397ms data=878 B excise=true ingestion=6@302ms)
I240417 05:59:57.713343 3507 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r83/4:‹/Table/1{09/1-10/1}›] 111  applied snapshot cc897b14 from (n1,s1):1 at applied index 1101 (total=482ms data=11 KiB shared=4 sharedSize=20 MiB excise=true ingestion=6@406ms)
I240417 05:59:58.590720 3702 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r39/4:‹/Table/3{6-7}›] 112  applied snapshot 46ddeacc from (n1,s1):1 at applied index 28 (total=297ms data=878 B excise=true ingestion=6@241ms)
I240417 05:59:58.853819 3692 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r35/4:‹/Table/3{2-3}›] 113  applied snapshot 543644de from (n3,s3):3 at applied index 33 (total=501ms data=908 B excise=true ingestion=6@347ms)
I240417 05:59:59.384685 3980 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r99/4:‹/Table/113/1/1{4/653…-5/973…}›] 116  applied snapshot 6ab62a61 from (n2,s2):2 at applied index 137 (total=526ms data=65 KiB shared=4 sharedSize=8.4 MiB excise=true ingestion=6@472ms)
I240417 05:59:59.522838 4107 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r22/4:‹/Table/{19-20}›] 118  applied snapshot 7a0cbd45 from (n3,s3):2 at applied index 32 (total=324ms data=908 B excise=true ingestion=6@228ms)
I240417 05:59:59.989758 4110 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r87/4:‹/Table/110/{1-2}›] 119  applied snapshot 819235b8 from (n1,s1):1 at applied index 1161 (total=602ms data=12 KiB shared=4 sharedSize=11 MiB excise=true ingestion=6@542ms)
I240417 06:00:00.780106 4462 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r78/4:‹/Table/114/1{-/0/1/-…}›] 122  applied snapshot 9682e995 from (n1,s1):1 at applied index 35 (total=529ms data=1.4 KiB shared=1 sharedSize=22 KiB excise=true ingestion=6@345ms)
I240417 06:00:02.578030 4878 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r57/4:‹/Table/5{5-6}›] 123  applied snapshot 41992b96 from (n2,s2):2 at applied index 34 (total=448ms data=908 B excise=true ingestion=6@355ms)
I240417 06:00:02.816370 4975 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r5/4:‹/{Systemtse-Table/0}›] 127  applied snapshot 8e8b23b7 from (n3,s3):3 at applied index 33 (total=656ms data=1.0 KiB excise=true ingestion=6@564ms)
I240417 06:00:03.796154 5344 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r113/4:‹/Table/114/1/1{0/7/-…-3/4/-…}›] 129  applied snapshot 4a0f8513 from (n3,s3):2 at applied index 238 (total=578ms data=2.4 KiB shared=5 sharedSize=19 MiB excise=true ingestion=6@520ms)
I240417 06:00:04.265275 5615 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r13/4:‹/Table/{9-11}›] 131  applied snapshot 7c2cfd51 from (n2,s2):2 at applied index 34 (total=297ms data=908 B excise=true ingestion=6@246ms)
I240417 06:00:04.812995 5797 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r90/4:‹/Table/10{8/2-9/1}›] 134  applied snapshot d96a1869 from (n1,s1):1 at applied index 32 (total=453ms data=906 B shared=2 sharedSize=12 MiB excise=true ingestion=6@389ms)
I240417 06:00:06.571935 6304 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r93/4:‹/Table/113/1/{0/1-1/31991}›] 136  applied snapshot 57e8f5a0 from (n3,s3):2 at applied index 162 (total=424ms data=11 KiB shared=4 sharedSize=8.5 MiB excise=true ingestion=6@371ms)
I240417 06:00:07.223762 6612 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r27/4:‹/Table/2{4-5}›] 140  applied snapshot b777d382 from (n2,s2):3 at applied index 33 (total=301ms data=908 B excise=true ingestion=6@221ms)
I240417 06:00:08.519730 6855 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r38/4:‹/Table/3{5-6}›] 143  applied snapshot bbb10956 from (n3,s3):2 at applied index 35 (total=364ms data=908 B excise=true ingestion=6@309ms)
I240417 06:00:10.530790 7399 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r42/4:‹/Table/{39-40}›] 147  applied snapshot 0dc0c4d4 from (n1,s1):1 at applied index 137 (total=375ms data=1.2 KiB shared=1 sharedSize=16 KiB excise=true ingestion=6@321ms)
I240417 06:00:11.451796 7708 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r40/4:‹/Table/3{7-8}›] 150  applied snapshot 07169590 from (n3,s3):2 at applied index 40 (total=348ms data=932 B shared=1 sharedSize=1.2 KiB excise=true ingestion=6@249ms)
I240417 06:00:12.228695 7992 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r19/4:‹/Table/1{6-7}›] 151  applied snapshot c7081f74 from (n2,s2):3 at applied index 32 (total=284ms data=908 B excise=true ingestion=6@225ms)
I240417 06:00:13.477296 8309 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r70/4:‹/Table/11{2/1-3/1}›] 152  applied snapshot c8c27528 from (n1,s1):1 at applied index 35 (total=312ms data=992 B shared=1 sharedSize=3.6 MiB excise=true ingestion=6@251ms)
I240417 06:00:14.297147 8646 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r26/4:‹/Table/2{3-4}›] 156  applied snapshot b574b780 from (n2,s2):2 at applied index 39 (total=420ms data=923 B shared=2 sharedSize=20 KiB excise=true ingestion=6@321ms)
I240417 06:00:15.485929 9067 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r63/4:‹/Table/6{1-2}›] 158  applied snapshot 13cc1b88 from (n3,s3):2 at applied index 32 (total=317ms data=908 B excise=true ingestion=6@210ms)
I240417 06:00:16.495787 9457 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r111/4:‹/Table/114/1/{6/7/-3…-8/7/-1…}›] 160  applied snapshot c0154171 from (n2,s2):2 at applied index 185 (total=578ms data=4.2 KiB shared=6 sharedSize=15 MiB excise=true ingestion=6@519ms)
I240417 06:00:16.961837 9602 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r108/4:‹/Table/114/1/{2/1/-2…-3/10/-…}›] 164  applied snapshot b742cb5a from (n1,s1):1 at applied index 154 (total=463ms data=3.6 KiB shared=4 sharedSize=14 MiB excise=true ingestion=6@366ms)
I240417 06:00:17.459108 10120 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r66/4:‹/Table/6{4-5}›] 165  applied snapshot 92c4dfab from (n2,s2):2 at applied index 34 (total=362ms data=908 B excise=true ingestion=6@265ms)
I240417 06:00:19.576057 10843 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r9/4:‹/Table/{5-6}›] 176  applied snapshot c46946de from (n1,s1):1 at applied index 43 (total=330ms data=893 B shared=1 sharedSize=14 KiB excise=true ingestion=6@276ms)
I240417 06:00:20.491982 11164 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r12/4:‹/Table/{8-9}›] 177  applied snapshot 47f7e4ab from (n3,s3):2 at applied index 33 (total=347ms data=920 B shared=1 sharedSize=899 B excise=true ingestion=6@286ms)
I240417 06:00:24.650035 12628 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r15/4:‹/Table/1{2-3}›] 181  applied snapshot 7c69823f from (n1,s1):1 at applied index 199 (total=367ms data=900 B shared=1 sharedSize=27 KiB excise=true ingestion=6@276ms)
I240417 06:00:27.737223 13612 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r14/4:‹/Table/1{1-2}›] 183  applied snapshot d9b7ce46 from (n3,s3):2 at applied index 751 (total=514ms data=1013 B shared=2 sharedSize=28 KiB excise=true ingestion=6@425ms)
I240417 06:00:28.319247 13954 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r46/4:‹/Table/4{3-4}›] 184  applied snapshot 3e50b391 from (n2,s2):2 at applied index 33 (total=366ms data=908 B excise=true ingestion=6@270ms)
I240417 06:00:29.528278 14520 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r75/4:‹/Table/113/1{-/0/1}›] 186  applied snapshot d9cbddc2 from (n2,s2):2 at applied index 37 (total=323ms data=1.1 KiB excise=true ingestion=6@270ms)
I240417 06:00:29.645830 14535 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r32/4:‹/{Table/29-NamespaceTab…}›] 187  applied snapshot 755fc3dd from (n3,s3):2 at applied index 31 (total=384ms data=908 B excise=true ingestion=6@296ms)
I240417 06:00:30.591718 14977 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r25/4:‹/Table/2{2-3}›] 189  applied snapshot 6dcc8ab9 from (n1,s1):1 at applied index 28 (total=353ms data=878 B excise=true ingestion=6@298ms)
I240417 06:00:31.337225 15360 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r49/4:‹/Table/4{6-7}›] 192  applied snapshot 75e259eb from (n2,s2):2 at applied index 38 (total=330ms data=922 B shared=2 sharedSize=14 KiB excise=true ingestion=6@267ms)
I240417 06:00:31.575128 15373 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r17/4:‹/Table/1{4-5}›] 193  applied snapshot 29fec193 from (n1,s1):1 at applied index 28 (total=372ms data=878 B excise=true ingestion=6@272ms)
I240417 06:00:32.521123 16018 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r58/4:‹/Table/5{6-7}›] 195  applied snapshot 7fa3edd1 from (n1,s1):1 at applied index 28 (total=309ms data=878 B excise=true ingestion=6@254ms)
I240417 06:00:34.492498 16729 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r55/4:‹/Table/5{3-4}›] 198  applied snapshot 33f21cd7 from (n1,s1):1 at applied index 29 (total=310ms data=879 B excise=true ingestion=6@249ms)
I240417 06:00:35.564192 17131 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r18/4:‹/Table/1{5-6}›] 199  applied snapshot 71d1bfcf from (n3,s3):2 at applied index 757 (total=421ms data=5.0 KiB shared=2 sharedSize=34 KiB excise=true ingestion=6@362ms)
I240417 06:00:37.756596 17728 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r92/4:‹/Table/11{1/1-2/1}›] 200  applied snapshot d129f63d from (n1,s1):1 at applied index 1599 (total=586ms data=4.3 KiB shared=4 sharedSize=2.2 MiB excise=true ingestion=6@527ms)
I240417 06:00:38.526456 18189 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r60/4:‹/Table/5{8-9}›] 202  applied snapshot eda9fd4b from (n1,s1):1 at applied index 28 (total=376ms data=878 B excise=true ingestion=6@281ms)
I240417 06:00:42.283109 19519 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r56/4:‹/Table/5{4-5}›] 205  applied snapshot dcf2568f from (n2,s2):2 at applied index 1732 (total=400ms data=940 B shared=2 sharedSize=132 KiB excise=true ingestion=6@301ms)
I240417 06:00:42.870366 19615 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r50/4:‹/Table/4{7-8}›] 206  applied snapshot 3a81f9d3 from (n3,s3):2 at applied index 39 (total=535ms data=932 B shared=1 sharedSize=6.7 KiB excise=true ingestion=6@435ms)
I240417 06:00:43.742633 20097 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r7/4:‹/Table/{3-4}›] 207  applied snapshot 54277e83 from (n1,s1):1 at applied index 192 (total=536ms data=906 B shared=1 sharedSize=30 KiB excise=true ingestion=6@423ms)
I240417 06:00:44.384158 20388 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r67/4:‹/Table/6{5-6}›] 208  applied snapshot a3c1266a from (n2,s2):2 at applied index 31 (total=503ms data=908 B excise=true ingestion=6@381ms)
I240417 06:00:46.169463 20977 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r45/4:‹/Table/4{2-3}›] 211  applied snapshot de4adf09 from (n2,s2):2 at applied index 33 (total=306ms data=908 B excise=true ingestion=6@243ms)
I240417 06:00:46.528796 21144 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r52/4:‹/Table/5{0-1}›] 212  applied snapshot f26535bd from (n1,s1):1 at applied index 28 (total=374ms data=878 B excise=true ingestion=6@304ms)
I240417 06:00:46.587531 21062 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r62/4:‹/Table/6{0-1}›] 213  applied snapshot 57d78de2 from (n3,s3):2 at applied index 32 (total=455ms data=908 B excise=true ingestion=6@344ms)
I240417 06:00:47.616067 21563 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r96/4:‹/Table/113/1/1{3/333…-4/653…}›] 216  applied snapshot c8615cdb from (n3,s3):2 at applied index 176 (total=479ms data=10 KiB shared=4 sharedSize=8.5 MiB excise=true ingestion=6@426ms)
I240417 06:00:48.570933 21910 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r3/4:‹/System/{NodeLive…-tsd}›] 218  applied snapshot ac280c73 from (n2,s2):3 at applied index 190 (total=394ms data=526 KiB excise=true ingestion=6@329ms)
I240417 06:00:48.975465 21916 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r24/4:‹/Table/2{1-2}›] 220  applied snapshot d54c1a02 from (n3,s3):3 at applied index 37 (total=369ms data=923 B shared=1 sharedSize=1.0 KiB excise=true ingestion=6@262ms)
I240417 06:00:49.562059 22806 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r36/4:‹/Table/3{3-4}›] 221  applied snapshot a691aac4 from (n1,s1):1 at applied index 30 (total=354ms data=878 B excise=true ingestion=6@301ms)
I240417 06:00:49.748120 22784 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r1/4:‹/{Min-System/NodeL…}›] 224  applied snapshot 356b3fc0 from (n3,s3):3 at applied index 1282 (total=548ms data=42 KiB excise=true ingestion=6@458ms)
I240417 06:00:53.591841 24078 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r79/4:‹/Table/10{7/1-8/1}›] 227  applied snapshot 7201a424 from (n1,s1):1 at applied index 3965 (total=413ms data=28 KiB shared=1 sharedSize=57 KiB excise=true ingestion=6@317ms)
I240417 06:00:54.536548 24673 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r28/4:‹/Table/2{5-6}›] 230  applied snapshot c9df446a from (n3,s3):3 at applied index 32 (total=320ms data=908 B excise=true ingestion=6@271ms)
I240417 06:00:58.524005 25793 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r29/4:‹/Table/2{6-7}›] 233  applied snapshot f858559a from (n1,s1):1 at applied index 44 (total=346ms data=893 B shared=1 sharedSize=18 KiB excise=true ingestion=6@246ms)
I240417 06:00:59.336681 26346 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r10/4:‹/Table/{6-7}›] 234  applied snapshot 4da841f5 from (n2,s2):2 at applied index 38 (total=344ms data=932 B shared=1 sharedSize=1.4 KiB excise=true ingestion=6@286ms)
I240417 06:00:59.715536 26442 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r31/4:‹/Table/2{8-9}›] 235  applied snapshot b2f281b9 from (n3,s3):2 at applied index 60 (total=341ms data=930 B shared=1 sharedSize=17 KiB excise=true ingestion=6@281ms)
I240417 06:01:01.671719 26961 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r102/4:‹/Table/11{3/1/15/9…-4/1}›] 236  applied snapshot de2de823 from (n2,s2):2 at applied index 406 (total=711ms data=4.3 KiB shared=10 sharedSize=26 MiB excise=true ingestion=6@659ms)
I240417 06:01:02.499970 27367 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r107/4:‹/Table/114/1/{0/1/-3…-2/1/-2…}›] 241  applied snapshot 5fdbd16b from (n2,s2):2 at applied index 230 (total=545ms data=2.5 KiB shared=4 sharedSize=14 MiB excise=true ingestion=6@436ms)
I240417 06:01:04.372653 28379 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r64/4:‹/Table/6{2-3}›] 244  applied snapshot cdcb47b6 from (n2,s2):2 at applied index 33 (total=375ms data=908 B excise=true ingestion=6@282ms)
I240417 06:01:05.598792 28994 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r51/4:‹/Table/{48-50}›] 246  applied snapshot c5cb8f9c from (n1,s1):1 at applied index 29 (total=344ms data=899 B shared=1 sharedSize=14 KiB excise=true ingestion=6@285ms)
I240417 06:01:07.789746 29760 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r4/4:‹/System{/tsd-tse}›] 247  applied snapshot 6b17c080 from (n1,s1):1 at applied index 189 (total=388ms data=4.0 MiB excise=true ingestion=6@328ms)
I240417 06:01:08.972985 30240 kv/kvserver/replica_raftstorage.go:607 ⋮ [T1,Vsystem,n4,s4,r106/4:‹/Table/113/1/{2/63981-6/66667}›] 248  applied snapshot 671a616c from (n3,s3):2 at applied index 469 (total=841ms data=44 KiB shared=10 sharedSize=26 MiB excise=true ingestion=6@786ms)
sumeerbhola commented 6 months ago

The range.snapshots.rcvd-bytes metric is looking at the KV batch bytes.

recordBytesReceived(int64(len(req.KVBatch)))

and we are comparing this with crdb_internal.kv_store_status.used which is StoreCapacity.Used, which is bytes in Pebble. This is a flawed check since the latter is compressed.

There is also something odd about StoreCapacity.Used being 5MB given

I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +      |                             |       |       |   ingested   |     moved    |    written   |       |    amp
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +level | tables  size val-bl vtables | score |   in  | tables  size | tables  size | tables  size |  read |   r   w
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +------+-----------------------------+-------+-------+--------------+--------------+--------------+-------+---------
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +    0 |     5   35KB   186B       3 |  3.03 | 1.1MB |    37  164KB |     0     0B |    17  301KB |    0B |   3  0.3
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +    1 |     0     0B     0B       0 |  0.00 |    0B |     0     0B |     0     0B |     0     0B |    0B |   0  0.0
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +    2 |     0     0B     0B       0 |  0.00 |    0B |     0     0B |     0     0B |     0     0B |    0B |   0  0.0
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +    3 |     0     0B     0B       0 |  0.00 |    0B |     0     0B |     0     0B |     0     0B |    0B |   0  0.0
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +    4 |     0     0B     0B       0 |  0.00 |    0B |     0     0B |     0     0B |     0     0B |    0B |   0  0.0
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +    5 |    14   21MB   35KB       4 |  0.33 | 567KB |    19   21MB |     3  5.5KB |    60   62MB |  62MB |   1  113
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +    6 |   111  167MB  3.7KB      51 |     - |  93KB |   148  167MB |     0     0B |    39   45KB |  93KB |   1  0.5
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +total |   130  188MB   39KB      58 |     - | 189MB |   204  188MB |     3  5.5KB |   116  252MB |  62MB |   5  1.3
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +-------------------------------------------------------------------------------------------------------------------
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +WAL: 1 files (54KB)  in: 1.1MB  written: 1.2MB (3% overhead)
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +Flushes: 17
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +Compactions: 129  estimated debt: 21MB  in progress: 1 (8.2MB)
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +             default: 48  delete: 1  elision: 77  move: 3  read: 0  rewrite: 0  multi-level: 0
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +MemTables: 1 (512KB)  zombie: 1 (512KB)
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +Zombie tables: 8 (9.0MB, local: 29KB)
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +Backing tables: 56 (178MB)
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +Virtual tables: 58 (174MB)
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +Local tables size: 67KB
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +Block cache: 2.1K entries (53MB)  hit rate: 61.2%
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +Table cache: 130 entries (98KB)  hit rate: 97.5%
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +Secondary cache: 0 entries (0B)  hit rate: 0.0%
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +Snapshots: 0  earliest seq num: 0
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +Table iters: 12
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +Filter utility: 94.9%
I240417 06:00:08.360799 251 kv/kvserver/store.go:3543 â‹® [T1,Vsystem,n4] 142 +Ingestions: 25  as flushable: 0 (0B in 0 tables)