elastic / elastic-agent

Elastic Agent - single, unified way to add monitoring for logs, metrics, and other types of data to a host.

[Integration Framework] Serverless instances fail to be removed #3808

Open belimawr opened 7 months ago

belimawr commented 7 months ago

mage integration:clean is failing to remove serverless projects. It might be handling them as stateful deployments when removing them.

Steps to reproduce

  1. Run stateful and serverless tests
  2. Ensure the serverless project and the stateful deployment were created and are in the state file
  3. Run mage integration:clean. You will get the following error:
    % mage integration:clean
    --- Clean mage artifacts
    >>>> Using ogc instance provisioner
    >>>> Using stateful stack provisioner
    >>> Destroying cloud stack 8.12.0-SNAPSHOT [stack_id: 8120-SNAPSHOT, deployment_id: 123456789]
    >>> Bring down instances through ogc
    >>> Destroying cloud stack 8.12.0-SNAPSHOT [stack_id: 8120-SNAPSHOT, deployment_id: abcdefg]
    Error: error running clean: got unexpected response code [403] from deployment shutdown API: {
        "errors": [
            {
                "message": "To access the resource [u:/deployments/123456789], the user must have the required authorization.",
                "code": "root.permission_denied"
            }
        ]
    }

The ID in the error message belongs to a serverless project; however, the error message is coming from the ShutdownDeployment method, which is responsible for deleting stateful deployments:

https://github.com/elastic/elastic-agent/blob/a14c51f043070c1fbdbd9c5c83307c5f34216b97/pkg/testing/ess/deployment.go#L186-L204

State file before running mage integration:clean:


```yaml
instances:
  - instance:
      id: linux-amd64-ubuntu-2204
      name: ogc-linux-amd64-ubuntu-2204-8ea1
      provisioner: ogc
      ip: 127.0.0.1
      username: ubuntu
      remote_path: /home/ubuntu/agent
      internal:
        instance_id: "424242424242424242"
    prepared: true
stacks:
  - id: 8120-SNAPSHOT
    provisioner: stateful
    version: 8.12.0-SNAPSHOT
    ready: true
    elasticsearch: https://foo.elastic-cloud.com
    kibana: https://foo.elastic-cloud.com:9243
    username: elastic
    password:
    internal:
      deployment_id: abcdefg
  - id: 8120-SNAPSHOT
    provisioner: serverless
    version: 8.12.0-SNAPSHOT
    ready: true
    elasticsearch: https://foo.elastic.cloud
    kibana: https://foo.elastic.cloud
    username: elastic
    password:
    internal:
      deployment_id: 123456789
      deployment_type: observability
```

elasticmachine commented 7 months ago

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

cmacknz commented 7 months ago

Assuming you are doing this locally, it seems like mage integration:clean is trying to do the right thing, but your user doesn't have permission to delete the deployment.

@pazone how do we make sure each engineer has the correct permissions by default?

belimawr commented 7 months ago

Based on the error message, the error is coming from this method, which is responsible for deleting stateful deployments: https://github.com/elastic/elastic-agent/blob/a14c51f043070c1fbdbd9c5c83307c5f34216b97/pkg/testing/ess/deployment.go#L186-L204 However, the ID in the message belongs to a serverless project.

It looks to me like it is just trying to delete all deployments in the state.yml file as if they were stateful, because I didn't specify STACK_PROVISIONER=serverless.

I should have been clearer in the description of the issue: even though I redacted the sensitive information, I kept the IDs consistent across the whole description. I'll edit the description to add this information.
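To illustrate the hypothesis above, here is a minimal sketch of a clean step that picks the deleter from each stack's own `provisioner` field in the state file, rather than from one globally selected stack provisioner. Everything in it is an assumption made for illustration: `Stack`, `StackDeleter`, `recordingDeleter`, and `cleanStacks` are hypothetical names, not the types or functions used in the elastic-agent repository, and no real cloud API is called.

```go
package main

import "fmt"

// Stack holds only the state-file fields this sketch needs.
// Hypothetical type; field names loosely follow the YAML keys in the issue.
type Stack struct {
	ID           string
	Provisioner  string
	DeploymentID string
}

// StackDeleter is a hypothetical interface. The real provisioners in the
// repository may expose a different API.
type StackDeleter interface {
	Delete(s Stack) error
}

// recordingDeleter stands in for a real provisioner: it only records the
// deployment IDs it was asked to delete.
type recordingDeleter struct {
	deleted []string
}

func (r *recordingDeleter) Delete(s Stack) error {
	r.deleted = append(r.deleted, s.DeploymentID)
	return nil
}

// cleanStacks deletes each stack with the deleter registered for that stack's
// recorded provisioner, and fails loudly if no matching deleter exists.
func cleanStacks(stacks []Stack, deleters map[string]StackDeleter) error {
	for _, s := range stacks {
		d, ok := deleters[s.Provisioner]
		if !ok {
			return fmt.Errorf("no deleter registered for provisioner %q (stack %s)", s.Provisioner, s.ID)
		}
		if err := d.Delete(s); err != nil {
			return fmt.Errorf("deleting stack %s via %s: %w", s.ID, s.Provisioner, err)
		}
	}
	return nil
}

func main() {
	stateful := &recordingDeleter{}
	serverless := &recordingDeleter{}

	// The two stacks from the redacted state file in the issue description.
	stacks := []Stack{
		{ID: "8120-SNAPSHOT", Provisioner: "stateful", DeploymentID: "abcdefg"},
		{ID: "8120-SNAPSHOT", Provisioner: "serverless", DeploymentID: "123456789"},
	}

	err := cleanStacks(stacks, map[string]StackDeleter{
		"stateful":   stateful,
		"serverless": serverless,
	})
	fmt.Println("error:", err)
	fmt.Println("stateful deleter received:", stateful.deleted)
	fmt.Println("serverless deleter received:", serverless.deleted)
}
```

With this kind of dispatch, the serverless project ID would never reach the stateful shutdown path, regardless of whether STACK_PROVISIONER was set when running the clean target.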