erikgrinaker opened this issue 1 year ago
cc @cockroachdb/replication
Rediscovered this issue on the 23.2 scale test cluster. Thread with details is here.
While trying to initialize the bank workload on our internal infrastructure, I hit what I believe is a bug with the crdb workload. Let me open a separate issue. cc @pav-kv
(base) eyang@HQ-C02FC4D6MD6T scripts % cockroach workload init bank --rows 1000000000 --ranges 350000 --payload-bytes 10000 --data-loader IMPORT "{URL}"
I240318 23:01:38.484242 1 ccl/workloadccl/fixture.go:342 [-] 1 starting import of 1 tables
Error: importing fixture: importing table bank: pq: at or near "(": syntax error
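For context on the command shape, here is a minimal sketch of the same invocation with a placeholder connection string; the URL below is an assumption standing in for the redacted internal one:

```sh
# Same init invocation as above; the connection URL is a placeholder assumption,
# not the actual internal cluster URL from this run.
cockroach workload init bank \
  --rows 1000000000 \
  --ranges 350000 \
  --payload-bytes 10000 \
  --data-loader IMPORT \
  'postgresql://root@localhost:26257?sslmode=disable'
```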
Noting some other observations from pausing a node during testing: the workload was interrupted with this message when I stopped a node.
Error: pq: result is ambiguous: error=ba: Put [/Table/308/1/87236004/0], EndTxn(parallel commit) [/Table/308/1/87236004/0], [txn: 1234c4f3] RPC error: grpc: error reading from server: read tcp 10.138.46.12:49619->10.142.36.22:26856: use of closed network connection [code 14/Unavailable] [propagate] (last error: transaction 1234c4f3-f121-48ac-949d-d4aeb22b0a13 with sequence 1 prevented from changing write timestamp from 1710956484.440837718,0 to 1710956489.408392875,2 due to ambiguous replay protection
The smallest test cluster I could find is a 30-node multi-region cluster (3 DCs, 10 nodes per DC). During testing I took out two nodes and rejoined them after 20 minutes. Once they rejoined, their memory usage was significantly higher than the rest of the nodes (Go allocated 18 GB vs 2 GB), and the cluster slowed down afterward. I captured some memory profiles and will share them on a separate channel.
After digging through the /heap_profiler folder, we found memory quickly climbing from 1.5 GB in the first profile to 10 GB in the last profile within 50 seconds; the 10.62 kB allocations from raftpb unmarshaling are what's really growing. I have also observed during testing that this memory spike flattens back down to the normal ~1 GB level fairly quickly.
The 10 GB allocation:
First profile: memprof.2024-03-20T19_16_03.306.9703810216.pprof.gz
Last profile: memprof.2024-03-20T19_16_53.251.19656000968.pprof.gz
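For anyone digging into the attached profiles, a minimal sketch of inspecting them with standard Go pprof tooling (assuming they are ordinary Go heap profiles; the port in the second command is arbitrary):

```sh
# Top allocation sites by in-use space; raftpb unmarshaling should dominate
# if this matches the issue described above.
go tool pprof -top -sample_index=inuse_space memprof.2024-03-20T19_16_03.306.9703810216.pprof.gz

# Or browse a profile interactively in a web UI.
go tool pprof -http=:8080 memprof.2024-03-20T19_16_53.251.19656000968.pprof.gz
```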
Thanks @lyang24. The last profile does look like a repro of this issue. Did you observe other effects this has on the cluster? For example, higher tail latencies, Go scheduling latency, etc.
While working on #103288, I spun up a 3-node n2-highcpu-8 cluster (8 vCPUs, 8 GB memory) with a bank workload writing 10 KB rows across 35k ranges. After some time, I took down one of the nodes for about 30 minutes. When I reintroduced it to the cluster, it continually OOMed on startup, with heap profiles showing all memory usage came from Raft request decoding (likely MsgApps kept around in the unstable log). Increasing memory from 8 GB to 32 GB was not sufficient to resolve the OOMs.

Rough repro:
- Let the workload run for 20 minutes. Stop one of the nodes, keep it down for 20 minutes, then restart it.
- 35k ranges is probably excessive here; try e.g. 20k ranges. kv0 with large rows/batches probably does the trick too. The initial import here will take 5 hours; a smaller initial dataset probably works too.
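For convenience, here is a hedged sketch of those steps as shell commands. The row count, node addresses, restart helper, and the use of SIGKILL are assumptions for illustration, not part of the original repro:

```sh
# Hedged repro sketch, not the exact commands from the thread. The row count,
# node addresses, and the stop/start mechanism are assumptions.
NODE1=10.0.0.1   # any live node's address (placeholder)
NODE3=10.0.0.3   # the node that will be taken down (placeholder)

# Initialize the bank workload with 10 KB rows; ~20k ranges per the note above.
cockroach workload init bank \
  --rows 10000000 --ranges 20000 --payload-bytes 10000 \
  "postgresql://root@${NODE1}:26257?sslmode=disable"

# Run the workload in the background for the duration of the experiment.
cockroach workload run bank \
  --payload-bytes 10000 --duration 90m \
  "postgresql://root@${NODE1}:26257?sslmode=disable" &

sleep 1200                                # let the workload run for 20 minutes
ssh "${NODE3}" 'pkill -9 cockroach'       # stop one node (assumed mechanism)
sleep 1200                                # keep it down for 20 minutes
ssh "${NODE3}" './restart-cockroach.sh'   # restart it (hypothetical helper script)
# Watch the restarted node's memory usage; per the report above it OOMed shortly after rejoining.
```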
Jira issue: CRDB-28990
Epic: CRDB-39898