dell / dellemc-openmanage-ansible-modules

Dell OpenManage Ansible Modules
GNU General Public License v3.0

[BUG]: Setting power_state to "shutdown" with module dellemc.openmanage.ome_powerstate does not perform graceful shutdown #517

Closed sbeyermann closed 1 year ago

sbeyermann commented 1 year ago

Bug Description

I use a very simple task to shut down a server using the dellemc.openmanage.ome_powerstate module:

tasks:
    - name: Gracefully shutdown the system
      dellemc.openmanage.ome_powerstate:
        hostname: myome.company.com
        username: admin
        password: somepassword
        validate_certs: false
        device_id: 13663
        power_state: "shutdown"
      delegate_to: localhost

From the documentation, I assumed that setting power_state to shutdown would initiate a graceful shutdown of the operating system. However, in our environment that does not seem to be the case.

Running the above task creates the following job in Dell OpenManage Enterprise:

Running
Verifying if the device Service Tag is valid.
The device Service Tag is valid.
Retrieving system power state.
System power state is ON. Proceeding with power state change to POWER_OFF_SOFT.
Executing power state change action.
Completed 

As you can see, OME performs a POWER_OFF_SOFT on the device, which then shows up as follows in the iDRAC Lifecycle log (newest entries first):

2023-08-03 10:44:17 | SYS1001 | System is turning off.
2023-08-03 10:44:16 | SYS1003 | System CPU Resetting.
2023-08-03 10:44:16 | SYS1005 | The server power action is initiated because the management controller initiated a power-down 
2023-08-03 10:44:12 | RAC0704 | Requested system powerdown.

To me, this looks like a plain power-off of the system rather than a clean, graceful shutdown.

On the operating system side, this behavior leads to "System unexpectedly shutdown (event id 600)" events on Microsoft Windows and to ext4 file system checks on Linux.

Component or Module Name

dellemc.openmanage.ome_powerstate

Ansible Version

Ansible 2.15.1

Python Version

3.11.2

iDRAC/OME/OME-M version

Operating System

Debian 11 (Bullseye), Windows Server 2022

Playbook Used

tasks:
    - name: Gracefully shutdown the system
      dellemc.openmanage.ome_powerstate:
        hostname: myome.company.com
        username: admin
        password: somepassword
        validate_certs: false
        device_id: 13663
        power_state: "shutdown"
      delegate_to: localhost

Logs

TASK [Gracefully shutdown the system] ****************************************************************************************************************************************************************************************************
task path: /scripts/ansible/pod/playbook-tmp.yml:59
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: root
<localhost> EXEC /bin/sh -c 'echo ~root && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir "` echo /root/.ansible/tmp/ansible-tmp-1691055375.2592177-1415-231192109792375 `" && echo ansible-tmp-1691055375.2592177-1415-231192109792375="` echo /root/.ansible/tmp/ansible-tmp-1691055375.2592177-1415-231192109792375 `" ) && sleep 0'
Using module file /root/.ansible/collections/ansible_collections/dellemc/openmanage/plugins/modules/ome_powerstate.py
<localhost> PUT /root/.ansible/tmp/ansible-local-1379kw7hbrg7/tmp1uxut8yw TO /root/.ansible/tmp/ansible-tmp-1691055375.2592177-1415-231192109792375/AnsiballZ_ome_powerstate.py
<localhost> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1691055375.2592177-1415-231192109792375/ /root/.ansible/tmp/ansible-tmp-1691055375.2592177-1415-231192109792375/AnsiballZ_ome_powerstate.py && sleep 0'
<localhost> EXEC /bin/sh -c '/opt/ansible_venv/bin/python3 /root/.ansible/tmp/ansible-tmp-1691055375.2592177-1415-231192109792375/AnsiballZ_ome_powerstate.py && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1691055375.2592177-1415-231192109792375/ > /dev/null 2>&1 && sleep 0'
changed: [somesytem -> localhost] => changed=true
  invocation:
    module_args:
      ca_path: /usr/local/share/ca-certificates/mycompany.crt
      device_id: 13663
      device_service_tag: null
      hostname: myome.company.com
      password: VALUE_SPECIFIED_IN_NO_LOG_PARAMETER
      port: 443
      power_state: shutdown
      timeout: 30
      username: admin@company.com
      validate_certs: false
  job_status:
    Builtin: false
    CreatedBy: admin@company.com
    Editable: true
    EndTime: null
    Id: 17340
    IdOwner: null
    IdUserOwner: 10687
    JobDescription: DeviceAction_Task
    JobName: DeviceAction_Task_PowerState
    JobStatus:
      Id: 2080
      Name: New
    JobType:
      Id: 3
      Internal: false
      IsShareUsageActive: false
      Name: DeviceAction_Task
    LastRun: null
    LastRunStatus:
      Id: 2200
      Name: NotRun
    NextRun: null
    Params:
    - JobId: 17340
      Key: operationName
      Value: POWER_CONTROL
    - JobId: 17340
      Key: powerState
      Value: '8'
    Schedule: startnow
    StartTime: null
    State: Enabled
    Targets:
    - Data: null
      Id: 13663
      JobId: 17340
      TargetType:
        Id: 1000
        Name: DEVICE
    UpdatedBy: null
    UserGenerated: true
    Visible: true
  msg: Power State operation job submitted successfully.

Steps to Reproduce

This is not an intermittent issue. It happens every time I execute the above playbook/task, and it happens on both operating systems I tested (Windows Server 2022 and Debian 11).

Expected Behavior

I would expect the operating system to shut down gracefully (stopping all services, unmounting filesystems, etc.) and then the server to turn off.

Actual Behavior

The server just gets powered off, without a graceful shutdown of the running operating system (like someone pressing the power button for three seconds).

Screenshots

No response

Additional Information

No response

anupamaloke commented 1 year ago

@sbeyermann, sorry for the rather late response. To perform a "Power Off (Graceful)" operation on a server, you will have to set power_state to off instead of shutdown.

In the ome_powerstate.py file, I see the following mapping of power_state values to the internal codes that OME uses for device power actions. The "Power Off (Graceful)" device action maps to 12 internally, so you will have to set power_state: "off" in your playbook.

VALID_OPERATION = {"on": 2, "off": 12, "coldboot": 5, "warmboot": 10, "shutdown": 8}

sbeyermann commented 1 year ago

@anupamaloke, thank you for your response. It took a while, but I have now managed to test your suggestion in my playbook. Indeed, setting power_state to off gracefully shuts down the server.
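
For anyone reading along, here is a minimal sketch of how the mapping quoted above lines up with the job parameters shown in the original log (powerState '8' for shutdown). The VALID_OPERATION dictionary is copied from this thread. The helper names build_power_params and describe are hypothetical and are not part of the module. The action labels cover only the two codes discussed here. The real job Params also carry a JobId assigned by OME, which is omitted.

```python
# Mapping as quoted from ome_powerstate.py earlier in this thread.
VALID_OPERATION = {"on": 2, "off": 12, "coldboot": 5, "warmboot": 10, "shutdown": 8}

# Labels taken only from what this thread reports; other codes are left unlabeled.
OBSERVED_ACTION = {
    8: "POWER_OFF_SOFT (no graceful OS shutdown in the reporter's environment)",
    12: "Power Off (Graceful)",
}


def build_power_params(power_state):
    """Hypothetical helper: Key/Value pairs like those in the job's Params list."""
    code = VALID_OPERATION[power_state]
    return [
        {"Key": "operationName", "Value": "POWER_CONTROL"},
        {"Key": "powerState", "Value": str(code)},
    ]


def describe(power_state):
    """Hypothetical helper: what a power_state value was observed to do here."""
    code = VALID_OPERATION[power_state]
    return OBSERVED_ACTION.get(code, "not covered in this thread")


if __name__ == "__main__":
    for state in ("shutdown", "off"):
        print(state, build_power_params(state)[1]["Value"], describe(state))
```

A quick way to check which action a task actually requested is to look at the powerState value under job_status.Params in the task output and compare it against this mapping.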

It would be really great if you could add the expected action to the dellemc.openmanage.ome_powerstate module documentation.

Thank you for your help!