OpenNebula / one

The open source Cloud & Edge Computing Platform bringing real freedom to your Enterprise Cloud 🚀
http://opennebula.io
Apache License 2.0

Improve error handling in OneFlow scale-up operations #6545

Closed OpenNebulaSupport closed 2 months ago

OpenNebulaSupport commented 3 months ago

Description

The OneFlow component does not correctly handle the case where a single VM deployment fails during a role scaling operation. Instead of reporting a failure, it reports SUCCESS and leaves the VM information empty (vm_info is null) in the JSON Service body:

# Extract from the JSON Service Body
  ...
  "nodes": [ 
     {
        "deploy_id": 4,
        "vm_info": null
     }
  ]
  ...

This may result in unexpected behavior, since the VM isn't controlled by the service or any other component and its information remains empty in the JSON body of the service.
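As a way to spot affected services, here is a minimal Python sketch that scans a service body for nodes whose vm_info is empty. It assumes the "nodes" array sits under each entry of a top-level "roles" array, which the extract above does not show; the function name and the sample body are hypothetical and not part of OneFlow.

```python
import json


def find_untracked_nodes(service_body: dict) -> list[tuple[str, int]]:
    """Return (role name, deploy_id) for every node whose vm_info is missing."""
    untracked = []
    for role in service_body.get("roles", []):
        for node in role.get("nodes", []):
            # A null or absent vm_info means the service holds no data for the VM
            if node.get("vm_info") is None:
                untracked.append((role.get("name"), node.get("deploy_id")))
    return untracked


# Hypothetical service body; the layout is an assumption, see above
body = json.loads("""
{
  "roles": [
    {
      "name": "worker",
      "nodes": [
        {"deploy_id": 3, "vm_info": {"VM": {"ID": "3"}}},
        {"deploy_id": 4, "vm_info": null}
      ]
    }
  ]
}
""")

print(find_untracked_nodes(body))  # [('worker', 4)]
```

Any entry in the returned list points at a VM that was added to the service without its information being recorded.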

To Reproduce

  1. Create a OneFlow Service with role scaling policies enabled. The following template was used to reproduce the case:

    {
        "name": "test-service",
        "deployment": "straight",
        "description": "test-service template for debug purposes",
        "roles": [
            {
                "name": "master",
                "cardinality": 1,
                "vm_template": 1,
                "vm_template_contents": "",
                "min_vms": 1,
                "max_vms": 1,
                "cooldown": 5,
                "elasticity_policies": [],
                "scheduled_policies": []
            },
            {
                "name": "worker",
                "cardinality": 2,
                "vm_template": 2,
                "parents": ["master"],
                "vm_template_contents": "",
                "min_vms": 2,
                "max_vms": 10,
                "cooldown": 60,
                "elasticity_policies": [
                    {
                        "type": "CHANGE",
                        "adjust": 1,
                        "expression": "TEST_ATTR > 100",
                        "period_number": 1,
                        "period": 60,
                        "cooldown": 120
                    }
                ],
                "scheduled_policies": []
            }
        ]
    }
  2. Once the Service is deployed, wait for it to scale automatically (you can force this by creating an attribute on the VMs and changing its value).
  3. To force the scaling operation to fail, DISABLE all hosts once the Service is in RUNNING state, so that the scale-up attempt fails because no free hosts are left.
  4. At this point, OneFlow adds the VM to the Service body with empty VM information.
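To make the trigger in step 2 concrete, the following sketch works through the arithmetic of the worker role's policy from the template: when TEST_ATTR exceeds 100, a CHANGE policy with adjust 1 adds one VM, bounded by min_vms and max_vms. This is a simplified illustration with hypothetical function names. It does not model how OneFlow samples the attribute over period and period_number, or how it applies cooldowns.

```python
def policy_triggers(test_attr: float) -> bool:
    """Mirror the template expression "TEST_ATTR > 100"."""
    return test_attr > 100


def scaled_cardinality(current: int, adjust: int, min_vms: int, max_vms: int) -> int:
    """Apply a CHANGE adjustment and clamp the result to the role bounds."""
    return max(min_vms, min(max_vms, current + adjust))


# Worker role values taken from the template in step 1
cardinality = 2
if policy_triggers(150):
    cardinality = scaled_cardinality(cardinality, adjust=1, min_vms=2, max_vms=10)

print(cardinality)  # 3
```

With all hosts disabled, the third worker VM requested by this adjustment is the one that ends up in the Service body without information.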

Expected behavior

The Service scaling operation is cancelled and the error is reported correctly.
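The expected handling can be sketched as follows. This is not OneFlow's actual code (OneFlow is written in Ruby). It is a hypothetical Python illustration in which scale_up, DeployError, the deploy callback and the "FAILURE" status string are all assumptions made for the example. The idea is that a node is added to the service only once its VM information is known, and any failure leaves the node list untouched and returns an error.

```python
class DeployError(Exception):
    """Raised when a VM could not be deployed or returned no information."""


def scale_up(nodes, new_ids, deploy):
    """Try to add one node per id; on any failure keep the original node list."""
    added = []
    try:
        for deploy_id in new_ids:
            vm_info = deploy(deploy_id)
            if vm_info is None:
                # Never record a node without VM information
                raise DeployError(f"VM {deploy_id} returned no information")
            added.append({"deploy_id": deploy_id, "vm_info": vm_info})
    except DeployError as exc:
        # Cancel the operation: the caller sees the unchanged nodes and the error
        return "FAILURE", nodes, str(exc)
    return "SUCCESS", nodes + added, None


def fake_deploy(deploy_id):
    """Hypothetical deploy callback that fails for VM 4."""
    return None if deploy_id == 4 else {"VM": {"ID": str(deploy_id)}}


status, nodes, error = scale_up([], [3, 4], fake_deploy)
print(status, nodes, error)  # FAILURE [] VM 4 returned no information
```

A real implementation would also need to clean up any VMs that were created earlier in the same scaling operation, which this sketch leaves out.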


Additional context

In some cases the VM deployment itself succeeds during the scaling operation, but other errors or unexpected messages during deployment can cause the same behavior.
