elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

Integration tests framework creates more OGC VMs than needed #4930

Open belimawr opened 1 week ago

belimawr commented 1 week ago

Steps to reproduce

  1. Run a single integration test

    AGENT_KEEP_INSTALLED=true SNAPSHOT=true TEST_PLATFORMS="linux/amd64" mage integration:single TestContainerCMDWithAVeryLongStatePath
    Command output

    ```
    >>>> Using ogc instance provisioner
    >>>> Using stateful stack provisioner
    >>> Creating zip archive of repo to send to remote hosts
    >>> Create SSH keys to use for SSH
    >>> Pulling latest ogc image
    >>> Creating cloud stack 8.15.0-SNAPSHOT [stack_id: 8150-SNAPSHOT]
    >>> Import layouts into ogc
    >>> Bring up instances through ogc
    >>> Created cloud stack 8.15.0-SNAPSHOT [stack_id: 8150-SNAPSHOT, deployment_id: 5e3a860b644e47c78177731786273117]
    >>> Waiting for cloud stack 8.15.0-SNAPSHOT to be ready [stack_id: 8150-SNAPSHOT, deployment_id: 5e3a860b644e47c78177731786273117]
    >>> (linux-amd64-ubuntu-2204-container) Starting SSH; connect with `ssh -i /home/tiago/devel/elastic-agent/.integration-cache/id_rsa ubuntu@34.27.44.2`
    >>> (linux-amd64-ubuntu-2204-container) ssh connect error: "error dialing tcp address \"34.27.44.2:22\" :dial tcp 34.27.44.2:22: connect: connection refused", will try again in 1s
    >>> (linux-amd64-ubuntu-2204-container) ssh connect error: "error dialing tcp address \"34.27.44.2:22\" :dial tcp 34.27.44.2:22: connect: connection refused", will try again in 2s
    >>> (linux-amd64-ubuntu-2204-container) ssh connect error: "error dialing tcp address \"34.27.44.2:22\" :dial tcp 34.27.44.2:22: connect: connection refused", will try again in 4s
    >>> (linux-amd64-ubuntu-2204-container) ssh connect error: "error dialing tcp address \"34.27.44.2:22\" :dial tcp 34.27.44.2:22: connect: connection refused", will try again in 8s
    >>> (linux-amd64-ubuntu-2204-container) ssh connect error: "error dialing tcp address \"34.27.44.2:22\" :dial tcp 34.27.44.2:22: connect: connection refused", will try again in 16s
    >>> (linux-amd64-ubuntu-2204-container) Connected over SSH
    >>> (linux-amd64-ubuntu-2204-container) Preparing instance
    >>> (linux-amd64-ubuntu-2204-container) Running apt-get update
    >>> (linux-amd64-ubuntu-2204-container) ssh exec error: "could not run \"sudo apt-get update -o APT::Update::Error-Mode=any\" though SSH: Process exited with status 100", will try again in 15s
    >>> (linux-amd64-ubuntu-2204-container) Install build-essential and unzip
    >>> (linux-amd64-ubuntu-2204-container) Install golang 1.21.11 (amd64)
    >>> (linux-amd64-ubuntu-2204-container) Copying repo
    >>> (linux-amd64-ubuntu-2204-container) Running make mage and prepareOnRemote
    >>> (linux-amd64-ubuntu-2204-container) Copying agent build elastic-agent-8.15.0-SNAPSHOT-linux-x86_64.tar.gz
    >>> (linux-amd64-ubuntu-2204-container) Copying agent build elastic-agent-8.15.0-SNAPSHOT-x86_64.rpm
    >>> (linux-amd64-ubuntu-2204-container) Copying agent build elastic-agent-8.15.0-SNAPSHOT-amd64.deb
    >>> (linux-amd64-ubuntu-2204-container) Waiting for stack to be ready...
    >>> (linux-amd64-ubuntu-2204-container) Using Stack with Kibana host https://50d5e2e27e074dd896ef8eae3ec2b882.us-west2.gcp.elastic-cloud.com:9243, credentials available under .integration-cache
    >>> (linux-amd64-ubuntu-2204-container) Running sudo tests...
    >>> (linux-amd64-ubuntu-2204-container) Test output (sudo) (stderr): go: downloading github.com/rs/zerolog v1.27.0
    >>> (linux-amd64-ubuntu-2204-container) Test output (sudo) (stderr): go: downloading github.com/elastic/mock-es v0.0.0-20240605193845-b5546a703d6f
    >>> (linux-amd64-ubuntu-2204-container) Test output (sudo) (stderr): go: downloading github.com/rcrowley/go-metrics v0.0.0-20201227073835-cf1acfcdf475
    >>> (linux-amd64-ubuntu-2204-container) Test output (sudo) (stderr): go: downloading github.com/gorilla/mux v1.8.0
    >>> (linux-amd64-ubuntu-2204-container) Test output (sudo) (stderr): go: downloading github.com/mileusna/useragent v1.3.4
    >>> (linux-amd64-ubuntu-2204-container) Test output (sudo) (stdout): >> go test: remote-linux-amd64-ubuntu-2204-container-sudo.integration Testing
    >>> (linux-amd64-ubuntu-2204-container) Test output (sudo) (stdout): exec: gotestsum --no-color -f standard-quiet --junitfile build/TEST-go-remote-linux-amd64-ubuntu-2204-container-sudo.integration.xml --jsonfile build/TEST-go-remote-linux-amd64-ubuntu-2204-container-sudo.integration.out.json -- -tags integration -test.shuffle on -test.timeout 2h0m0s -test.run ^(TestContainerCMDWithAVeryLongStatePath)$ github.com/elastic/elastic-agent/testing/integration
    >>> (linux-amd64-ubuntu-2204-container) Test output (sudo) (stdout): -test.shuffle 1718282120223208516
    >>> (linux-amd64-ubuntu-2204-container) Test output (sudo) (stdout): ok github.com/elastic/elastic-agent/testing/integration 69.085s
    >>> (linux-amd64-ubuntu-2204-container) Test output (sudo) (stdout): DONE 5 tests in 140.793s
    >>> (linux-amd64-ubuntu-2204-container) Test output (sudo) (stdout): >> go test: remote-linux-amd64-ubuntu-2204-container-sudo.integration Test Passed
    >>> Testing completed (5 successful)
    >>> Console output written here: build/TEST-go-integration.out
    >>> Console JSON output written here: build/TEST-go-integration.out.json
    >>> JUnit XML written here: build/TEST-go-integration.xml
    >>> Diagnostic output (if present) here: build/diagnostics
    AGENT_KEEP_INSTALLED=true SNAPSHOT=true TEST_PLATFORMS="linux/amd64" mage 2.66s user 1.81s system 0% cpu 16:25.06 total
    ```

  2. List instances to confirm the integration tests framework knows about a single VM

    mage integration:listInstances
    Output

    ```
    +----------------+------------------------------------------------------------------------------------------------------------+
    | #              | 0                                                                                                          |
    | Provisioner    | ogc                                                                                                        |
    | Name           | ogc-linux-amd64-ubuntu-2204-container-63aa                                                                 |
    | ID             | linux-amd64-ubuntu-2204-container                                                                          |
    | Instance ID    | 1234567890123456789                                                                                        |
    | IP             | 42.42.42.42                                                                                                |
    | Private Key    | /home/auser/devel/elastic-agent/.integration-cache/id_rsa                                                  |
    | Public Key     | /home/auser/devel/elastic-agent/.integration-cache/id_rsa.pub                                              |
    | SSH connection | ssh -i /home/auser/devel/elastic-agent/.integration-cache/id_rsa ubuntu@42.42.42.42                        |
    | GCP Link       | https://console.cloud.google.com/compute/instancesDetail/zones/us-central1-a/instances/1234567890123456789 |
    +----------------+------------------------------------------------------------------------------------------------------------+
    ```

  3. Run ogc ls and verify that 6 instances were created

    Output

    ```
    ┏━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
    ┃ ID        ┃ Name     ┃ Created   ┃ Status  ┃ Labels   ┃ Tags      ┃ Connect… ┃
    ┡━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
    │ 12345678… │ ogc-lin… │ an hour   │ running │ divisio… │ agent-in… │ ssh -i   │
    │           │          │ ago       │         │          │           │ .integr… │
    │           │          │           │         │          │           │ ubuntu@… │
    ├───────────┼──────────┼───────────┼─────────┼──────────┼───────────┼──────────┤
    │ 12345678… │ ogc-lin… │ an hour   │ running │ divisio… │ agent-in… │ ssh -i   │
    │           │          │ ago       │         │          │           │ .integr… │
    │           │          │           │         │          │           │ ubuntu@… │
    ├───────────┼──────────┼───────────┼─────────┼──────────┼───────────┼──────────┤
    │ 12345678… │ ogc-lin… │ an hour   │ running │ divisio… │ agent-in… │ ssh -i   │
    │           │          │ ago       │         │          │           │ .integr… │
    │           │          │           │         │          │           │ ubuntu@… │
    ├───────────┼──────────┼───────────┼─────────┼──────────┼───────────┼──────────┤
    │ 12345678… │ ogc-lin… │ an hour   │ running │ divisio… │ agent-in… │ ssh -i   │
    │           │          │ ago       │         │          │           │ .integr… │
    │           │          │           │         │          │           │ ubuntu@… │
    ├───────────┼──────────┼───────────┼─────────┼──────────┼───────────┼──────────┤
    │ 12345678… │ ogc-lin… │ an hour   │ running │ divisio… │ agent-in… │ ssh -i   │
    │           │          │ ago       │         │          │           │ .integr… │
    │           │          │           │         │          │           │ ubuntu@… │
    ├───────────┼──────────┼───────────┼─────────┼──────────┼───────────┼──────────┤
    │ 12345678… │ ogc-lin… │ an hour   │ running │ divisio… │ agent-in… │ ssh -i   │
    │           │          │ ago       │         │          │           │ .integr… │
    │           │          │           │         │          │           │ ubuntu@… │
    └───────────┴──────────┴───────────┴─────────┴──────────┴───────────┴──────────┘
    Node Count: 6
    ```

I changed all IPs and IDs; the original output had a different one for each VM.

Some instances are from different groups, and they include ARM64 even though the original command selected only AMD64. Here are some details about the VMs:

VM details

```sh
ogc ls --as-json | jq '.[] | {"id": .id, "name": .instance_name, "state": .instance_state, "tags": .layout.tags}'
```

```json
{
  "id": 1,
  "name": "ogc-linux-amd64-ubuntu-2204-container-63aa",
  "state": "running",
  "tags": [
    "agent-integration",
    "linux",
    "amd64",
    "ubuntu-22-04"
  ]
}
{
  "id": 2,
  "name": "ogc-linux-amd64-ubuntu-2204-fleet-airgapped-cedb",
  "state": "running",
  "tags": [
    "agent-integration",
    "linux",
    "amd64",
    "ubuntu-22-04"
  ]
}
{
  "id": 3,
  "name": "ogc-linux-arm64-ubuntu-2204-default-8f9c",
  "state": "running",
  "tags": [
    "agent-integration",
    "linux",
    "arm64",
    "ubuntu-22-04"
  ]
}
{
  "id": 4,
  "name": "ogc-linux-amd64-ubuntu-2204-fleet-f8f3",
  "state": "running",
  "tags": [
    "agent-integration",
    "linux",
    "amd64",
    "ubuntu-22-04"
  ]
}
{
  "id": 5,
  "name": "ogc-linux-arm64-ubuntu-2204-container-cc75",
  "state": "running",
  "tags": [
    "agent-integration",
    "linux",
    "arm64",
    "ubuntu-22-04"
  ]
}
{
  "id": 6,
  "name": "ogc-linux-amd64-ubuntu-2204-default-26e7",
  "state": "running",
  "tags": [
    "agent-integration",
    "linux",
    "amd64",
    "ubuntu-22-04"
  ]
}
```

The test has the following define section:

    info := define.Require(t, define.Requirements{
        Stack: &define.Stack{},
        Local: false,
        Sudo:  true,
        OS: []define.OS{
            {Type: define.Linux},
        },
        Group: "container",
    })

The test I used as an example, TestContainerCMDWithAVeryLongStatePath, is from a PR (https://github.com/elastic/elastic-agent/pull/4909) that is open at the time of writing. However, the same behaviour occurs with tests that are already on main.

elasticmachine commented 1 week ago

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

elasticmachine commented 1 week ago

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)