cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: tpccbench/nodes=3/cpu=16/lease=expiration failed [VM disappeared from roachprod cache] #130918

Closed cockroach-teamcity closed 1 month ago

cockroach-teamcity commented 1 month ago

roachtest.tpccbench/nodes=3/cpu=16/lease=expiration failed with artifacts on master @ 197c6ee5537ffb211ebd8dbcbe49edc6d5c710e1:

(test_runner.go:1440).func1: failed during post test assertions (see test-post-assertions.log): invalid node selector '1-4', cluster contains 3 nodes
test artifacts and logs in: /artifacts/tpccbench/nodes=3/cpu=16/lease=expiration/run_1
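
For readers unfamiliar with the error above, the following is a minimal sketch of how a range selector such as '1-4' can be checked against a cluster size. The helper `validateSelector` is hypothetical and is not roachprod's actual parser; it handles only a single index or a single `lo-hi` range, and it borrows the error wording from the failure message for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// validateSelector is a hypothetical helper (not roachprod's real code).
// It accepts either a single node index ("3") or one range ("1-4") and
// reports an error when the selector reaches outside [1, clusterSize].
func validateSelector(sel string, clusterSize int) error {
	lo, hi, isRange := strings.Cut(sel, "-")
	if !isRange {
		hi = lo // a single index is treated as the range lo-lo
	}
	first, err := strconv.Atoi(lo)
	if err != nil {
		return fmt.Errorf("invalid node selector '%s': %w", sel, err)
	}
	last, err := strconv.Atoi(hi)
	if err != nil {
		return fmt.Errorf("invalid node selector '%s': %w", sel, err)
	}
	if first < 1 || last < first || last > clusterSize {
		return fmt.Errorf("invalid node selector '%s', cluster contains %d nodes", sel, clusterSize)
	}
	return nil
}

func main() {
	// With all 4 VMs known, the selector is accepted.
	fmt.Println(validateSelector("1-4", 4))
	// With only 3 VMs known, the same selector is rejected.
	fmt.Println(validateSelector("1-4", 3))
}
```

Under these assumptions, the same selector is valid while four VMs are known and becomes invalid once one drops out of the recorded cluster state, which matches the shape of the failure reported here.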

Parameters:

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for aws clusters

/cc @cockroachdb/test-eng

Jira issue: CRDB-42293

herkolategan commented 1 month ago

This seems to be another case of the sync issue in which the last node disappeared: https://github.com/cockroachdb/cockroach/issues/130082

The cluster appears to have had 4 nodes when it was created:

[w2] 2024/09/18 07:19:49 test_runner.go:764: Created new cluster for test tpccbench/nodes=3/cpu=16/lease=expiration: teamcity-16933157-1726638508-62-n4cpu16 (arch="amd64")
teamcity-16933157-1726638508-62-n4cpu16:[1 2 3 4]: updating Side-Eye agents with new environment name
   1:   <ok>
    Restarted.

   2:   <ok>
    Restarted.

   3:   <ok>
    Restarted.

   4:   <ok>
    Restarted.

But later on, the roachprod state lists only 3 VMs (node 4 is missing):

{
  "name": "teamcity-16933157-1726638508-62-n4cpu16",
  "user": "teamcity",
  "created_at": "2024-09-18T07:18:26Z",
  "lifetime": 43200000000000,
  "vms": [
    {
      "name": "teamcity-16933157-1726638508-62-n4cpu16-0001",
      "created_at": "2024-09-18T07:18:26Z",
      "errors": null,
      "lifetime": 43200000000000,
      "preemptible": true,
      "labels": {
        "Cluster": "teamcity-16933157-1726638508-62-n4cpu16",
        "Created": "2024-09-18T07:18:24Z",
        "Lifetime": "12h0m0s",
        "Name": "teamcity-16933157-1726638508-62-n4cpu16-0001",
        "Roachprod": "true",
        "Spot": "true",
        "arch": "amd64",
        "test_name": "tpccbench-nodes-3-cpu-16-lease-expiration",
        "test_owner": "test-eng",
        "test_run_id": "teamcity-16933157",
        "usage": "roachtest"
      },
      "dns": "ip-10-12-11-110.us-east-2.compute.internal",
      "public_dns": "",
      "dns_provider": "",
      "provider": "aws",
      "provider_id": "i-03b7f0921bf95cd10",
      "private_ip": "10.12.11.110",
      "public_ip": "18.222.104.185",
      "remote_user": "ubuntu",
      "vpc": "vpc-0855bbb7081ef37de",
      "machine_type": "m6id.4xlarge",
      "cpu_architecture": "amd64",
      "cpu_family": "",
      "zone": "us-east-2a",
      "project": "",
      "non_bootable_volumes": null,
      "bootable_volume": {
        "ProviderResourceID": "",
        "ProviderVolumeType": "",
        "Zone": "",
        "Encrypted": false,
        "Name": "",
        "Labels": null,
        "Size": 0
      },
      "local_disks": null,
      "CostPerHour": 0,
      "EmptyCluster": false
    },
    {
      "name": "teamcity-16933157-1726638508-62-n4cpu16-0002",
      "created_at": "2024-09-18T07:18:26Z",
      "errors": null,
      "lifetime": 43200000000000,
      "preemptible": true,
      "labels": {
        "Cluster": "teamcity-16933157-1726638508-62-n4cpu16",
        "Created": "2024-09-18T07:18:24Z",
        "Lifetime": "12h0m0s",
        "Name": "teamcity-16933157-1726638508-62-n4cpu16-0002",
        "Roachprod": "true",
        "Spot": "true",
        "arch": "amd64",
        "test_name": "tpccbench-nodes-3-cpu-16-lease-expiration",
        "test_owner": "test-eng",
        "test_run_id": "teamcity-16933157",
        "usage": "roachtest"
      },
      "dns": "ip-10-12-9-161.us-east-2.compute.internal",
      "public_dns": "",
      "dns_provider": "",
      "provider": "aws",
      "provider_id": "i-06922f0b8895ae9c2",
      "private_ip": "10.12.9.161",
      "public_ip": "3.17.78.157",
      "remote_user": "ubuntu",
      "vpc": "vpc-0855bbb7081ef37de",
      "machine_type": "m6id.4xlarge",
      "cpu_architecture": "amd64",
      "cpu_family": "",
      "zone": "us-east-2a",
      "project": "",
      "non_bootable_volumes": null,
      "bootable_volume": {
        "ProviderResourceID": "",
        "ProviderVolumeType": "",
        "Zone": "",
        "Encrypted": false,
        "Name": "",
        "Labels": null,
        "Size": 0
      },
      "local_disks": null,
      "CostPerHour": 0,
      "EmptyCluster": false
    },
    {
      "name": "teamcity-16933157-1726638508-62-n4cpu16-0003",
      "created_at": "2024-09-18T07:18:27Z",
      "errors": null,
      "lifetime": 43200000000000,
      "preemptible": true,
      "labels": {
        "Cluster": "teamcity-16933157-1726638508-62-n4cpu16",
        "Created": "2024-09-18T07:18:24Z",
        "Lifetime": "12h0m0s",
        "Name": "teamcity-16933157-1726638508-62-n4cpu16-0003",
        "Roachprod": "true",
        "Spot": "true",
        "arch": "amd64",
        "test_name": "tpccbench-nodes-3-cpu-16-lease-expiration",
        "test_owner": "test-eng",
        "test_run_id": "teamcity-16933157",
        "usage": "roachtest"
      },
      "dns": "ip-10-12-2-43.us-east-2.compute.internal",
      "public_dns": "",
      "dns_provider": "",
      "provider": "aws",
      "provider_id": "i-0497a406acf98f98e",
      "private_ip": "10.12.2.43",
      "public_ip": "18.222.162.158",
      "remote_user": "ubuntu",
      "vpc": "vpc-0855bbb7081ef37de",
      "machine_type": "m6id.4xlarge",
      "cpu_architecture": "amd64",
      "cpu_family": "",
      "zone": "us-east-2a",
      "project": "",
      "non_bootable_volumes": null,
      "bootable_volume": {
        "ProviderResourceID": "",
        "ProviderVolumeType": "",
        "Zone": "",
        "Encrypted": false,
        "Name": "",
        "Labels": null,
        "Size": 0
      },
      "local_disks": null,
      "CostPerHour": 0,
      "EmptyCluster": false
    }
  ],
  "CostPerHour": 0
}
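
A check for this kind of mismatch could be sketched as follows. This is only an illustration: it assumes the node count can be read from the `-n<N>cpu<M>` suffix of the cluster name (inferred from the name in the log above, not a documented contract), and the names `clusterState`, `expectedNodes`, and `missingVMs` are hypothetical. Only the `name` and `vms` fields of the state dump are decoded.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strconv"
)

// clusterState decodes just the fields needed from the state dump above.
type clusterState struct {
	Name string `json:"name"`
	VMs  []struct {
		Name string `json:"name"`
	} `json:"vms"`
}

// Assumption: the cluster name ends in "-n<nodes>cpu<cpus>".
var nodeCountRE = regexp.MustCompile(`-n(\d+)cpu\d+$`)

// expectedNodes extracts the node count encoded in the cluster name.
func expectedNodes(clusterName string) (int, error) {
	m := nodeCountRE.FindStringSubmatch(clusterName)
	if m == nil {
		return 0, fmt.Errorf("no node count found in cluster name %q", clusterName)
	}
	return strconv.Atoi(m[1])
}

// missingVMs returns how many VMs the name implies but the state lacks.
func missingVMs(raw []byte) (int, error) {
	var st clusterState
	if err := json.Unmarshal(raw, &st); err != nil {
		return 0, err
	}
	want, err := expectedNodes(st.Name)
	if err != nil {
		return 0, err
	}
	return want - len(st.VMs), nil
}

func main() {
	// Trimmed copy of the state above: same cluster name, three VM entries.
	raw := []byte(`{
	  "name": "teamcity-16933157-1726638508-62-n4cpu16",
	  "vms": [
	    {"name": "teamcity-16933157-1726638508-62-n4cpu16-0001"},
	    {"name": "teamcity-16933157-1726638508-62-n4cpu16-0002"},
	    {"name": "teamcity-16933157-1726638508-62-n4cpu16-0003"}
	  ]
	}`)
	missing, err := missingVMs(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("missing VMs: %d\n", missing)
}
```

For the trimmed input in `main`, the name implies four nodes while three VM entries are present, so one VM is reported missing.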

renatolabs commented 1 month ago

Closing as an instance of #130082.