
roachtest: clearrange/zfs/checks=true skipped [zfs support] #126777

Open · cockroach-teamcity opened this issue 2 weeks ago

cockroach-teamcity commented 2 weeks ago

roachtest.clearrange/zfs/checks=true failed with artifacts on release-23.2 @ d09da53496ac01269267a4d7d539ba6c840bbdb7:

(cluster.go:2344).Run: full command output in run_163600.466724730_n1_cockroach-workload-f.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/clearrange/zfs/checks=true/cpu_arch=arm64/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/storage


Jira issue: CRDB-40120

cockroach-teamcity commented 2 weeks ago

roachtest.clearrange/zfs/checks=true failed with artifacts on release-23.2 @ d09da53496ac01269267a4d7d539ba6c840bbdb7:

(cluster.go:2344).Run: full command output in run_163621.475297237_n1_cockroach-workload-f.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/clearrange/zfs/checks=true/cpu_arch=arm64/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana


jbowens commented 2 weeks ago

From the first failure:

Error: importing fixture: importing table bank: pq: pausing due to error; use RESUME JOB to try to proceed once the issue is resolved, or CANCEL JOB to rollback: store 10 has insufficient remaining capacity to ingest data (remaining: 939 MiB / 3.0%, min required: 5.0%)
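
The import job pauses because store 10's free-space fraction (3.0%, ~939 MiB) has fallen below the minimum fraction CockroachDB requires before accepting bulk ingestion (5.0%). The real check happens inside CockroachDB against the store's disk stats; the df-based sketch below only makes the arithmetic concrete, and the mount path and variable names are illustrative:

#!/usr/bin/env bash
# Hypothetical re-creation of the capacity guard behind the error above.
# MOUNT and MIN_PCT mirror the values visible in the log; the df-based
# check itself is an illustration, not CockroachDB source.
MOUNT=/mnt/data1
MIN_PCT=5.0

avail=$(df --output=avail -B1 "$MOUNT" | tail -n1)
total=$(df --output=size  -B1 "$MOUNT" | tail -n1)
pct=$(awk -v a="$avail" -v t="$total" 'BEGIN { printf "%.1f", 100 * a / t }')

# Refuse ingestion when the remaining fraction is below the minimum,
# i.e. the "remaining: 939 MiB / 3.0%, min required: 5.0%" condition.
if awk -v p="$pct" -v m="$MIN_PCT" 'BEGIN { exit !(p < m) }'; then
  echo "insufficient remaining capacity to ingest data (remaining: ${pct}%, min required: ${MIN_PCT}%)"
fi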
jbowens commented 2 weeks ago

Storage dashboard from two weeks ago:
https://grafana.testeng.crdb.io/d/StorageAvKxELVz/storage?from=1719506351465&to=1719513577403&var-cluster=teamcity-15840932-1719467426-115-n10cpu16&orgId=1

Versus this failure:
https://grafana.testeng.crdb.io/d/StorageAvKxELVz/storage?from=1720197323681&to=1720197862904&var-cluster=teamcity-15943253-1720158657-120-n10cpu16&orgId=1

jbowens commented 2 weeks ago
++ ls /dev/nvme0n1 /dev/nvme0n2 /dev/disk/by-id/google-persistent-disk-1
+ for d in $(ls /dev/nvme?n? /dev/disk/by-id/google-persistent-disk-[1-9])
+ zpool list -v -P
+ grep /dev/disk/by-id/google-persistent-disk-1
+ '[' 1 -ne 0 ']'
+ disks+=("${d}")
+ echo 'Disk /dev/disk/by-id/google-persistent-disk-1 not mounted, need to mount...'
Disk /dev/disk/by-id/google-persistent-disk-1 not mounted, need to mount...
+ for d in $(ls /dev/nvme?n? /dev/disk/by-id/google-persistent-disk-[1-9])
+ zpool list -v -P
+ grep /dev/nvme0n1
+ '[' 1 -ne 0 ']'
+ disks+=("${d}")
+ echo 'Disk /dev/nvme0n1 not mounted, need to mount...'
Disk /dev/nvme0n1 not mounted, need to mount...
+ for d in $(ls /dev/nvme?n? /dev/disk/by-id/google-persistent-disk-[1-9])
+ zpool list -v -P
+ grep /dev/nvme0n2
+ '[' 1 -ne 0 ']'
+ disks+=("${d}")
+ echo 'Disk /dev/nvme0n2 not mounted, need to mount...'
Disk /dev/nvme0n2 not mounted, need to mount...
+ '[' 3 -eq 0 ']'
+ '[' 3 -eq 1 ']'
+ '[' -n '' ']'
+ mountpoint=/mnt/data1
+ echo '3 disks mounted, creating /mnt/data1 using RAID 0'
3 disks mounted, creating /mnt/data1 using RAID 0
+ mkdir -p /mnt/data1
++ basename /mnt/data1
+ zpool create -f data1 -m /mnt/data1 /dev/disk/by-id/google-persistent-disk-1 /dev/nvme0n1 /dev/nvme0n2
/dev/nvme0n1 is in use and contains a unknown filesystem.
+ chmod 777 /mnt/data1
+ lsblk
NAME         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0          7:0    0  59.2M  1 loop /snap/core20/1977
loop1          7:1    0 245.8M  1 loop /snap/google-cloud-cli/158
loop2          7:2    0 109.6M  1 loop /snap/lxd/24326
loop3          7:3    0  46.4M  1 loop /snap/snapd/19459
nvme0n1      259:0    0    32G  0 disk
├─nvme0n1p1  259:1    0  31.9G  0 part /
└─nvme0n1p15 259:2    0    99M  0 part /boot/efi
nvme0n2      259:3    0   500G  0 disk
+ df -h
Filesystem       Size  Used Avail Use% Mounted on
/dev/root         31G  2.0G   29G   7% /
tmpfs             32G     0   32G   0% /dev/shm
tmpfs             13G  1.3M   13G   1% /run
tmpfs            5.0M     0  5.0M   0% /run/lock
/dev/nvme0n1p15   98M  6.3M   92M   7% /boot/efi

hrm, what happened with nvme0n1?
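
Per the lsblk output above, nvme0n1 is the boot disk on this arm64 image (it carries the root partition nvme0n1p1 and the EFI partition nvme0n1p15), so the /dev/nvme?n? glob in the mount script swept it up and zpool create rejected it as in use. df shows no data1 filesystem afterwards, which would also explain the capacity failure: 939 MiB is about 3% of the 31G root disk. A hypothetical hardening of the enumeration that skips any device with mounted partitions (illustrative only, not the actual startup script):

#!/usr/bin/env bash
# Hypothetical stricter disk filter: skip devices that already carry
# mounted partitions (e.g. the boot disk nvme0n1 above) before building
# the zpool. Illustrative only; not the actual startup script.
disks=()
for d in /dev/nvme?n? /dev/disk/by-id/google-persistent-disk-[1-9]; do
  [ -e "$d" ] || continue
  # Skip the device if it (or any partition on it) has a mountpoint.
  if lsblk -no MOUNTPOINTS "$d" | grep -q .; then
    echo "Skipping $d: device has mounted partitions"
    continue
  fi
  # Skip devices already belonging to a zpool, as the original script does.
  if zpool list -v -P 2>/dev/null | grep -q "$d"; then
    continue
  fi
  disks+=("$d")
done
echo "Would create data1 from: ${disks[*]}"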

cockroach-teamcity commented 1 week ago

roachtest.clearrange/zfs/checks=true failed with artifacts on release-23.2 @ 7ca8340d8b1316144a9f0f53e736f168d99a0bab:

(cluster.go:2344).Run: full command output in run_173514.823167609_n1_cockroach-workload-f.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/clearrange/zfs/checks=true/cpu_arch=arm64/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana
