OpenNebula / one

The open source Cloud & Edge Computing Platform bringing real freedom to your Enterprise Cloud 🚀
http://opennebula.io
Apache License 2.0

oneflow recover can falsely recover services in FAILED_DEPLOYING #6396

Closed by dann1 4 months ago

dann1 commented 10 months ago

Description

If a flow template is instantiated and the resulting service reaches FAILED_DEPLOYING, a subsequent recover operation can set the service to RUNNING even though no VMs back it.

To Reproduce

root@provisionengine-test-env:~# oneflow-template instantiate FAILED_DEPLOY
ID: 1059
root@provisionengine-test-env:~# oneflow list
  ID USER     GROUP    NAME                                                                                                                             STARTTIME STAT
1059 oneadmin oneadmin FAILED_DEPLOY                                                                                                               11/13 17:36:53 FAILED_DEPLOYING
root@provisionengine-test-env:~# oneflow show 1059
SERVICE 1059 INFORMATION
ID                  : 1059
NAME                : FAILED_DEPLOY
USER                : oneadmin
GROUP               : oneadmin
STRATEGY            : straight
SERVICE STATE       : FAILED_DEPLOYING
START TIME          : 11/13 17:36:53

PERMISSIONS
OWNER               : um-
GROUP               : ---
OTHER               : ---

ROLE FAAS
ROLE STATE          : FAILED_DEPLOYING
VM TEMPLATE         : 7
CARDINALITY         : 1
SHUTDOWN            : terminate-hard

NODES INFORMATION
 VM_ID NAME                     USER            GROUP

LOG MESSAGES
11/13/23 17:36 [I] New state: DEPLOYING_NETS
11/13/23 17:36 [E] Role FAAS : Instantiate failed for template 7; [one.template.instantiate] Error allocating a new virtual machine template. Cannot get IP/MAC lease from virtual network 1.
11/13/23 17:36 [I] New state: FAILED_DEPLOYING
root@provisionengine-test-env:~# onevm list
  ID USER     GROUP    NAME                                                                        STAT  CPU     MEM HOST                                                     TIME
root@provisionengine-test-env:~# oneflow recover 1059
root@provisionengine-test-env:~# oneflow list
  ID USER     GROUP    NAME                                                                                                                                     STARTTIME STAT
1059 oneadmin oneadmin FAILED_DEPLOY                                                                                                                       11/13 17:36:53 RUNNING
root@provisionengine-test-env:~# onevm list
  ID USER     GROUP    NAME                                                                        STAT  CPU     MEM HOST                                                     TIME
root@provisionengine-test-env:~# oneflow show 1059
SERVICE 1059 INFORMATION
ID                  : 1059
NAME                : FAILED_DEPLOY
USER                : oneadmin
GROUP               : oneadmin
STRATEGY            : straight
SERVICE STATE       : RUNNING
START TIME          : 11/13 17:36:53

PERMISSIONS
OWNER               : um-
GROUP               : ---
OTHER               : ---

ROLE FAAS
ROLE STATE          : RUNNING
VM TEMPLATE         : 7
CARDINALITY         : 1
SHUTDOWN            : terminate-hard

NODES INFORMATION
 VM_ID NAME                     USER            GROUP

LOG MESSAGES
11/13/23 17:36 [I] New state: DEPLOYING_NETS
11/13/23 17:36 [E] Role FAAS : Instantiate failed for template 7; [one.template.instantiate] Error allocating a new virtual machine template. Cannot get IP/MAC lease from virtual network 1.
11/13/23 17:36 [I] New state: FAILED_DEPLOYING
11/13/23 17:37 [E] Role FAAS : Instantiate failed for template 7; [one.template.instantiate] Error allocating a new virtual machine template. Cannot get IP/MAC lease from virtual network 1.
11/13/23 17:37 [I] New state: RUNNING

Expected behavior

After a recover, the service should remain in a failure state, because the conditions that caused the failure have not changed. The role also still reports a cardinality of 1 even though no VMs back it.
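
The expected behavior can be sketched as a consistency check between a role's cardinality and the VMs that actually exist. This is only an illustration in Python, not the OneFlow implementation: the `Role` class and both function names are hypothetical, and the rule "a role is RUNNING only if its node count matches its cardinality" is an assumption drawn from this report.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Role:
    """Hypothetical, simplified view of a service role."""
    name: str
    cardinality: int
    vm_ids: List[int] = field(default_factory=list)


def role_state_after_recover(role: Role) -> str:
    # Assumption: a role only counts as RUNNING when the number of
    # VMs backing it matches the requested cardinality.
    if len(role.vm_ids) == role.cardinality:
        return "RUNNING"
    return "FAILED_DEPLOYING"


def service_state_after_recover(roles: List[Role]) -> str:
    # The service is RUNNING only if every role is RUNNING.
    if all(role_state_after_recover(r) == "RUNNING" for r in roles):
        return "RUNNING"
    return "FAILED_DEPLOYING"


# Role FAAS from the report: cardinality 1, no VMs.
faas = Role(name="FAAS", cardinality=1)
print(service_state_after_recover([faas]))  # FAILED_DEPLOYING
```

With a check of this kind, the service above would stay in FAILED_DEPLOYING after the recover, since the second instantiate attempt also failed and the role still has no nodes.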

Additional context

There might also be a problem in the core, since it is possible to create a virtual network whose address range has size 0:

root@provisionengine-test-env:~# onevnet show 1
VIRTUAL NETWORK 1 INFORMATION
ID                       : 1
NAME                     : no_leases
USER                     : oneadmin
GROUP                    : oneadmin
LOCK                     : None
CLUSTERS                 : 0
BRIDGE                   : onebr1
STATE                    : READY
VN_MAD                   : bridge
AUTOMATIC VLAN ID        : NO
AUTOMATIC OUTER VLAN ID  : NO
USED LEASES              : 0

PERMISSIONS
OWNER                    : um-
GROUP                    : ---
OTHER                    : ---

VIRTUAL NETWORK TEMPLATE
BRIDGE="onebr1"
BRIDGE_TYPE="linux"
OUTER_VLAN_ID=""
PHYDEV=""
SECURITY_GROUPS="0"
VLAN_ID=""
VN_MAD="bridge"

ADDRESS RANGE POOL
AR 0
SIZE           : 0
LEASES         : 0

RANGE                                   FIRST                               LAST
MAC                         02:00:b9:18:c9:66                  02:00:b9:18:c9:65

LEASES
AR  OWNER        MAC    IP PORT_FORWARD   IP6

VIRTUAL ROUTERS

VIRTUAL MACHINES
UPDATED        :
OUTDATED       :
ERROR          :
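
The MAC range in the output above ends one address before it starts (`...:66` to `...:65`). That is consistent with the last address being computed as first + size - 1 with size 0. The following Python sketch reproduces the arithmetic and shows the kind of size check that would reject such a range. All function names are hypothetical, and the formula is an assumption inferred from the output, not taken from the core's source.

```python
def mac_to_int(mac: str) -> int:
    # "02:00:b9:18:c9:66" -> integer value of the 48-bit address
    return int(mac.replace(":", ""), 16)


def int_to_mac(value: int) -> str:
    # Format as 12 hex digits, then group into colon-separated octets.
    raw = f"{value:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def last_mac(first: str, size: int) -> str:
    # Assumed formula: LAST = FIRST + SIZE - 1
    return int_to_mac(mac_to_int(first) + size - 1)


def validate_ar_size(size: int) -> None:
    # Hypothetical guard: an address range needs at least one lease.
    if size < 1:
        raise ValueError(f"address range size must be >= 1, got {size}")


print(last_mac("02:00:b9:18:c9:66", 0))  # 02:00:b9:18:c9:65
```

With size 0 the computed last address falls before the first one, matching the range shown for AR 0, so no lease can ever be handed out from it.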

dann1 commented 10 months ago

Another example

root@opennebula-frontend:~# oneflow show 576
SERVICE 576 INFORMATION
ID                  : 576
NAME                : Function
USER                : oneadmin
GROUP               : oneadmin
STRATEGY            : straight
SERVICE STATE       : FAILED_DEPLOYING
START TIME          : 11/14 02:18:40

PERMISSIONS
OWNER               : um-
GROUP               : ---
OTHER               : ---

ROLE FAAS
ROLE STATE          : FAILED_DEPLOYING
VM TEMPLATE         : 15
CARDINALITY         : 1

NODES INFORMATION
 VM_ID NAME                     USER            GROUP

LOG MESSAGES
11/14/23 02:18 [I] New state: DEPLOYING_NETS
11/14/23 02:18 [E] Role FAAS : Instantiate failed for template 15; [one.template.instantiate] Error allocating a new virtual machine template. User 0 does not own a network with name: github_actions_no_lease . Set NETWORK_UNAME or NETWORK_UID of owner in NIC.
11/14/23 02:18 [I] New state: FAILED_DEPLOYING
root@opennebula-frontend:~# onevm list
  ID USER     GROUP    NAME                                                                        STAT  CPU     MEM HOST                                                     TIME
root@opennebula-frontend:~# oneflow recover 576
root@opennebula-frontend:~# oneflow list
  ID USER     GROUP    NAME                                                                                                                                     STARTTIME STAT
 576 oneadmin oneadmin Function                                                                                                                            11/14 02:18:40 RUNNING
root@opennebula-frontend:~# onevm list
  ID USER     GROUP    NAME                                                                        STAT  CPU     MEM HOST                                                     TIME