Azure / container-upstream

This project captures work in progress and completed work for the Azure Core Container Upstream team.
MIT License

Deployment failed in aks-engine-azure-disk-vmas #101

Closed (mboersma closed this issue 3 years ago)

mboersma commented 3 years ago

Unfortunately, the only details we have are the ones below. We do not have any logs from the actual VM, so we'll need to reproduce the failure and SSH into the machine to determine what failed during provisioning.

2021/05/15 05:59:40 main.go:327: Something went wrong: starting e2e cluster: error creating cluster: 
cannot deploy: cannot get the create deployment future response: Code="DeploymentFailed" 
Message="At least one resource deployment operation failed. Please list deployment operations 
for details. Please see https://aka.ms/DeployOperations for usage details." Details=
{
  "code":"Conflict",
  "message":{
    "status": "Failed",
    "error": {
      "code": "ResourceDeploymentFailure",
      "message": "The resource operation completed with terminal provisioning state 'Failed'.",
      "details": [
        {
          "code": "VMExtensionProvisioningError",
          "message": "VM has reported a failure when processing extension 'cse-master-0'. Error message: \"Enable failed: failed to execute command: command terminated with exit status=36\n[stdout]\nSat May 15 05:41:31 UTC 2021,k8s-master-87323863-0\n\n[stderr]\n\"\r\n\r\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot "
        }
      ]
    }
  }
}
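
The useful facts in this payload (which extension failed and with what exit status) are buried inside an escaped message string. As a rough sketch, they could be pulled out as shown below. The function name `summarize_extension_failures` is hypothetical, and the code assumes the `Details` value always nests as it does in this one log; the thread does not say what exit status 36 means, so the sketch only extracts it.

```python
import json
import re

# Details payload copied verbatim from the error above. The raw string keeps
# the JSON escapes (\" \n \r) intact so json.loads can decode them.
DETAILS = r'''
{
  "code":"Conflict",
  "message":{
    "status": "Failed",
    "error": {
      "code": "ResourceDeploymentFailure",
      "message": "The resource operation completed with terminal provisioning state 'Failed'.",
      "details": [
        {
          "code": "VMExtensionProvisioningError",
          "message": "VM has reported a failure when processing extension 'cse-master-0'. Error message: \"Enable failed: failed to execute command: command terminated with exit status=36\n[stdout]\nSat May 15 05:41:31 UTC 2021,k8s-master-87323863-0\n\n[stderr]\n\"\r\n\r\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot "
        }
      ]
    }
  }
}
'''


def summarize_extension_failures(details_json):
    """Return one summary dict per entry in message.error.details.

    Hypothetical helper: the nesting is assumed from this single log,
    not from any documented schema.
    """
    payload = json.loads(details_json)
    failures = []
    for item in payload["message"]["error"]["details"]:
        message = item["message"]
        # Extension name appears in single quotes after the word "extension".
        extension = re.search(r"extension '([^']+)'", message)
        # Exit status appears as "exit status=<digits>".
        status = re.search(r"exit status=(\d+)", message)
        failures.append({
            "code": item["code"],
            "extension": extension.group(1) if extension else None,
            "exit_status": int(status.group(1)) if status else None,
        })
    return failures


if __name__ == "__main__":
    for failure in summarize_extension_failures(DETAILS):
        print(failure)
```

Returning `None` for fields that do not match keeps the helper usable on other failure types, where the message may not mention an extension or an exit status at all.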

https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-aks-engine-azure-disk-vmas/1393436578520502272
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-aks-engine-azure-disk-vmas/1394218517963739136

devigned commented 3 years ago

@chewong, @CecileRobertMichon, @jackfrancis what is the purpose of running the aks-engine vmas disk test if we are also running the CAPZ test? Do we need this test?

jackfrancis commented 3 years ago

does capz use availability sets for vms? (is the availability set aspect of this test important?)

CecileRobertMichon commented 3 years ago

we actually already disabled the periodic job for azure file + aks-engine, in favor of the azure file + capz one (https://testgrid.k8s.io/provider-azure-1.21-signal#capz-azure-file).

This is the presubmit job, for which there is no capz equivalent yet.

chewong commented 3 years ago

Let's convert all aks-engine-based PR jobs to CAPZ-based now that we have decent test signal from the periodic jobs. What do you guys think?

devigned commented 3 years ago

I'm on it.

CecileRobertMichon commented 3 years ago

removed aks-engine job

should we do this for other jobs as well? do we need a separate issue to track it?