dell / dellemc-openmanage-ansible-modules

Dell OpenManage Ansible Modules

dellemc_idrac_storage_volume Fails to create RAID 1 #144

Closed. janr7 closed this issue 4 years ago

janr7 commented 4 years ago

Hello, please help me figure out why this RAID 1 create fails.

- name: Create Raid boot volume
  dellemc_idrac_storage_volume:
    idrac_ip:               "{{ rmb_address }}"
    idrac_user:             "{{ rmb_username }}"
    idrac_password:         "{{ rmb_password }}"
    raid_reset_config:      "True"
    state:                  "create"
    controller_id:          "AHCI.Slot.2-1"
    raid_init_operation:    "Fast"
    volumes:
      - name:               "VD_R1_1_a"
        volume_type:        "RAID 1"
        span_depth:         1
        span_length:        1
        stripe_size:        65536
        read_cache_policy:  "ReadAhead"
        drives:
          id: ["Disk.Direct.0-0:AHCI.Slot.2-1", "Disk.Direct.1-1:AHCI.Slot.2-1"]

FAILED! => { "changed": false, "invocation": { "module_args": { "capacity": null, "controller_id": "AHCI.Slot.2-1", "disk_cache_policy": "Default", "idrac_ip": "10.145.103.135", "idrac_password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER", "idrac_port": 443, "idrac_user": "Laboradmin", "media_type": null, "number_dedicated_hot_spare": 0, "protocol": null, "raid_init_operation": "Fast", "raid_reset_config": "True", "read_cache_policy": "NoReadAhead", "span_depth": 1, "span_length": 1, "state": "create", "stripe_size": 65536, "volume_id": null, "volume_type": "RAID 0", "volumes": [ { "drives": { "id": [ "Disk.Direct.0-0:AHCI.Slot.2-1", "Disk.Direct.1-1:AHCI.Slot.2-1" ] }, "name": "VD_R1_1_a", "read_cache_policy": "ReadAhead", "span_depth": 1, "span_length": 1, "stripe_size": 65536, "volume_type": "RAID 1" } ], "write_cache_policy": "WriteThrough" } }, "msg": "Failed to perform storage operation" }

Thank you.

anupamaloke commented 4 years ago

@janr7, could you please share the drive inventory of the server? It would also be good to check whether an import SCP job was created on the server. If one was, could you please export the LC log and search it for the import SCP job ID to see what the issue was?
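
If it helps to pull the LC log off the box for searching, something along these lines should work. This is just a rough sketch, assuming the collection's idrac_lifecycle_controller_logs module (dellemc_export_lc_logs in older releases) and an NFS/CIFS share reachable from the iDRAC; the share path is a placeholder:

- name: Export the Lifecycle Controller log to a share for inspection
  idrac_lifecycle_controller_logs:
    idrac_ip: "{{ rmb_address }}"
    idrac_user: "{{ rmb_username }}"
    idrac_password: "{{ rmb_password }}"
    share_name: "192.168.0.1:/nfsshare"   # placeholder share reachable from the iDRAC
    job_wait: true
  register: lclog_result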

janr7 commented 4 years ago

Hi anupamaloke

Thank you for looking at this. There was only an export job in the log, no import job. R740xd, BIOS Version 2.8.1, iDRAC Firmware Version 4.22.00.00.

Here is the output of a view request:

{ "result": { "changed": false, "failed": false, "msg": "Successfully completed the view storage volume operation", "storage_status": { "Message": { "Controller": { "AHCI.Embedded.1-1": { "ControllerSensor": { "AHCI.Embedded.1-1": {} } }, "AHCI.Embedded.2-1": { "ControllerSensor": { "AHCI.Embedded.2-1": {} } }, "AHCI.Slot.2-1": { "ControllerSensor": { "AHCI.Slot.2-1": {} }, "PhysicalDisk": [ "Disk.Direct.0-0:AHCI.Slot.2-1", "Disk.Direct.1-1:AHCI.Slot.2-1" ], "VirtualDisk": { "Disk.Virtual.0:AHCI.Slot.2-1": { "PhysicalDisk": [ "Disk.Direct.0-0:AHCI.Slot.2-1", "Disk.Direct.1-1:AHCI.Slot.2-1" ] } } }, "NonRAID.Slot.6-1": { "ControllerSensor": { "NonRAID.Slot.6-1": {} }, "Enclosure": { "Enclosure.Internal.0-1:NonRAID.Slot.6-1": { "EnclosureSensor": { "Enclosure.Internal.0-1:NonRAID.Slot.6-1": {} }, "PhysicalDisk": [ "Disk.Bay.0:Enclosure.Internal.0-1:NonRAID.Slot.6-1" ] } } } }, "PCIeSSDExtender": { "PCIeExtender.Slot.3": { "PCIeSSDDisk": [ "Disk.Bay.20:Enclosure.Internal.0-1:PCIeExtender.Slot.3", "Disk.Bay.22:Enclosure.Internal.0-1:PCIeExtender.Slot.3", "Disk.Bay.23:Enclosure.Internal.0-1:PCIeExtender.Slot.3", "Disk.Bay.21:Enclosure.Internal.0-1:PCIeExtender.Slot.3" ] }, "PCIeExtender.Slot.4": { "PCIeSSDDisk": [ "Disk.Bay.19:Enclosure.Internal.0-1:PCIeExtender.Slot.4", "Disk.Bay.18:Enclosure.Internal.0-1:PCIeExtender.Slot.4" ] }, "PCIeExtender.Slot.8": {} } }, "Status": "Success" } } }

I have also tried to create a RAID 10 with the task below. This likewise comes back with only an LC export job; the play succeeds but nothing is created.

- name: Create Raid boot volume
  dellemc_idrac_storage_volume:
    idrac_ip:                  "{{ rmb_address }}"
    idrac_user:                "{{ rmb_username }}"
    idrac_password:            "{{ rmb_password }}"
    state:                     "create"
    raid_reset_config:         "True"
    controller_id:             "PCIeExtender.Slot.3"
    volumes:
      - name:                  "Virtual-Disk-r10"
        volume_type:           "RAID 10"
        disk_cache_policy:     "Default"
        write_cache_policy:    "WriteBack"
        read_cache_policy:     "ReadAhead"
        raid_init_operation:   "Fast"
        span_length: 2
        span_depth:  2
        number_dedicated_hot_spare: 0
        stripe_size: 262144
        drives:
          id: ["Disk.Bay.20:Enclosure.Internal.0-1:PCIeExtender.Slot.3",
               "Disk.Bay.21:Enclosure.Internal.0-1:PCIeExtender.Slot.3",
               "Disk.Bay.22:Enclosure.Internal.0-1:PCIeExtender.Slot.3",
               "Disk.Bay.23:Enclosure.Internal.0-1:PCIeExtender.Slot.3"]
janr7 commented 4 years ago

Closed by mistake.

anupamaloke commented 4 years ago

@janr7, for the first issue, where you were trying to configure RAID 1 on the BOSS drives (Embedded AHCI controller), a virtual drive already existed. Regarding your next issue, it seems you are trying to create RAID 10 on NVMe drives. The current module will not work for software RAID. You need to use the software RAID S140 controller for creating RAID volumes on NVMe drives; see here for more details on PERC S140. However, you can always configure RAID volumes on NVMe drives on one server using the S140 RAID controller manually, export the SCP, and then import the SCP on the rest of your servers using idrac_server_config_profile.
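
A rough sketch of that export/import flow is below, assuming the idrac_server_config_profile module with an NFS share reachable from the iDRACs; the share path, SCP file name, and reference_idrac_ip variable are placeholders:

- name: Export the SCP from the server where the S140 RAID was configured manually
  idrac_server_config_profile:
    idrac_ip: "{{ reference_idrac_ip }}"   # placeholder: the server already configured
    idrac_user: "{{ rmb_username }}"
    idrac_password: "{{ rmb_password }}"
    command: "export"
    scp_components: "RAID"
    share_name: "192.168.0.1:/nfsshare"    # placeholder share
    job_wait: true

- name: Import the SCP on the remaining servers
  idrac_server_config_profile:
    idrac_ip: "{{ rmb_address }}"
    idrac_user: "{{ rmb_username }}"
    idrac_password: "{{ rmb_password }}"
    command: "import"
    scp_components: "RAID"
    share_name: "192.168.0.1:/nfsshare"    # placeholder share
    scp_file: "nvme_raid_config.xml"       # placeholder SCP file name
    shutdown_type: "Graceful"
    job_wait: true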

janr7 commented 4 years ago

Hi anupamaloke

Thank you for the info on the NVMe drives.

The BOSS drive issue: the 'view' I attached above was from this morning, after a re-install of the OS - sorry, I should have mentioned that. A new view is pasted below, taken after the 'delete' task ran and another 'create' play ran. Again there was no import LC job, only an export.

After the delete task, the new view output:

{ "result": { "changed": false, "failed": false, "msg": "Successfully completed the view storage volume operation", "storage_status": { "Message": { "Controller": { "AHCI.Embedded.1-1": { "ControllerSensor": { "AHCI.Embedded.1-1": {} } }, "AHCI.Embedded.2-1": { "ControllerSensor": { "AHCI.Embedded.2-1": {} } }, "AHCI.Slot.2-1": { "ControllerSensor": { "AHCI.Slot.2-1": {} }, "PhysicalDisk": [ "Disk.Direct.0-0:AHCI.Slot.2-1", "Disk.Direct.1-1:AHCI.Slot.2-1" ] }, "NonRAID.Slot.6-1": { "ControllerSensor": { "NonRAID.Slot.6-1": {} }, "Enclosure": { "Enclosure.Internal.0-1:NonRAID.Slot.6-1": { "EnclosureSensor": { "Enclosure.Internal.0-1:NonRAID.Slot.6-1": {} }, "PhysicalDisk": [ "Disk.Bay.0:Enclosure.Internal.0-1:NonRAID.Slot.6-1" ] } } } }, "PCIeSSDExtender": { "PCIeExtender.Slot.3": { "PCIeSSDDisk": [ "Disk.Bay.20:Enclosure.Internal.0-1:PCIeExtender.Slot.3", "Disk.Bay.22:Enclosure.Internal.0-1:PCIeExtender.Slot.3", "Disk.Bay.23:Enclosure.Internal.0-1:PCIeExtender.Slot.3", "Disk.Bay.21:Enclosure.Internal.0-1:PCIeExtender.Slot.3" ] }, "PCIeExtender.Slot.4": { "PCIeSSDDisk": [ "Disk.Bay.19:Enclosure.Internal.0-1:PCIeExtender.Slot.4", "Disk.Bay.18:Enclosure.Internal.0-1:PCIeExtender.Slot.4" ] }, "PCIeExtender.Slot.8": {} } }, "Status": "Success" } } }

After the delete, a new 'create':

{ "changed": false, "invocation": { "module_args": { "capacity": null, "controller_id": "AHCI.Slot.2-1", "disk_cache_policy": "Default", "idrac_ip": "10.145.103.135", "idrac_password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER", "idrac_port": 443, "idrac_user": "Laboradmin", "media_type": null, "number_dedicated_hot_spare": 0, "protocol": null, "raid_init_operation": "Fast", "raid_reset_config": "True", "read_cache_policy": "NoReadAhead", "span_depth": 1, "span_length": 1, "state": "create", "stripe_size": 65536, "volume_id": null, "volume_type": "RAID 0", "volumes": [ { "drives": { "id": [ "Disk.Direct.0-0:AHCI.Slot.2-1", "Disk.Direct.1-1:AHCI.Slot.2-1" ] }, "name": "VD_R1_1_a", "read_cache_policy": "ReadAhead", "span_depth": 1, "span_length": 1, "stripe_size": 65536, "volume_type": "RAID 1" } ], "write_cache_policy": "WriteThrough" } }, "msg": "Failed to perform storage operation" } PLAY RECAP *** localhost : ok=1 changed=0 unreachable=0 failed=1 skipped=0 rescued=0 ignored=0

Thanks so much.

janr7 commented 4 years ago

Hi anupamaloke

Running with debug on: ANSIBLE_DEBUG=1

The paste is from after the export LC Job.

20740 1598441981.06662: done communicating 20740 1598441981.06703: done with local.exec_command() 20740 1598441981.06774: _low_level_execute_command() done: rc=1, stdout=WARN: Changing isFolder to false, as it is not directory msg_id=SYS043 Severity=Informational could not convert string to float: 'Not Available' could not convert string to float: 'Not Available' could not convert string to float: 'Not Available' could not convert string to float: 'Not Available' could not convert string to float: 'Not Available' could not convert string to float: 'Not Available' could not convert string to float: 'Not Available' could not convert string to float: 'Not Available' could not convert string to float: 'Not Available' could not convert string to float: 'Not Available' could not convert string to float: 'Not Available' could not convert string to float: 'Not Available'

{"msg": "Failed to perform storage operation", "failed": true, "invocation": {"module_args": {"idrac_ip": "10.145.103.135", "idrac_user": "Laboradmin", "idrac_password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER", "raid_reset_config": "True", "state": "create", "controller_id": "AHCI.Slot.2-1", "raid_init_operation": "Fast", "volumes": [{"name": "VD_R1_1_a", "volume_type": "RAID 1", "span_depth": 1, "span_length": 1, "stripe_size": 65536, "read_cache_policy": "ReadAhead", "drives": {"id": ["Disk.Direct.0-0:AHCI.Slot.2-1", "Disk.Direct.1-1:AHCI.Slot.2-1"]}}], "idrac_port": 443, "span_depth": 1, "span_length": 1, "number_dedicated_hot_spare": 0, "volume_type": "RAID 0", "disk_cache_policy": "Default", "write_cache_policy": "WriteThrough", "read_cache_policy": "NoReadAhead", "stripe_size": 65536, "volume_id": null, "capacity": null, "media_type": null, "protocol": null}}} , stderr=ERROR:omsdk.typemgr.ArrayType:Invalid attribute FreeSize ERROR:omsdk.typemgr.ArrayType:Invalid attribute FreeSize ERROR:omsdk.typemgr.ArrayType:Invalid attribute FreeSize ERROR:omsdk.typemgr.ArrayType:Invalid attribute FreeSize ERROR:omsdk.typemgr.ArrayType:Invalid attribute FreeSize ERROR:omsdk.typemgr.ArrayType:Invalid attribute FreeSize ERROR:omsdk.typemgr.ArrayType:cannot compare with IntField ERROR:omsdk.typemgr.ArrayType:Invalid attribute FreeSize ERROR:omsdk.typemgr.ArrayType:Invalid attribute FreeSize ERROR:omsdk.typemgr.ArrayType:Invalid attribute FreeSize ERROR:omsdk.typemgr.ArrayType:Invalid attribute FreeSize ERROR:omsdk.typemgr.ArrayType:Invalid attribute FreeSize ERROR:omsdk.typemgr.ArrayType:Invalid attribute FreeSize ERROR:omsdk.typemgr.ArrayType:cannot compare with IntField 20740 1598441981.07405: done with _execute_module (dellemc_idrac_storage_volume, {'idrac_ip': '10.145.103.135', 'idrac_user': 'Laboradmin', 'idrac_password': 'E267D465E8BE', 'raid_reset_config': 'True', 'state': 'create', 'controller_id': 'AHCI.Slot.2-1', 'raid_init_operation': 'Fast', 'volumes': [{'name': 'VD_R1_1_a', 'volume_type': 'RAID 1', 'span_depth': 1, 'span_length': 1, 'stripe_size': 65536, 'read_cache_policy': 'ReadAhead', 'drives': {'id': ['Disk.Direct.0-0:AHCI.Slot.2-1', 'Disk.Direct.1-1:AHCI.Slot.2-1']}}], '_ansible_check_mode': False, '_ansible_no_log': False, '_ansible_debug': True, '_ansible_diff': False, '_ansible_verbosity': 3, '_ansible_version': '2.9.11', '_ansible_module_name': 'dellemc_idrac_storage_volume', '_ansible_syslog_facility': 'LOG_USER', '_ansible_selinux_special_fs': ['fuse', 'nfs', 'vboxsf', 'ramfs', '9p', 'vfat'], '_ansible_string_conversion_action': 'warn', '_ansible_socket': None, '_ansible_shell_executable': '/bin/sh', '_ansible_keep_remote_files': False, '_ansible_tmpdir': '/root/.ansible/tmp/ansible-tmp-1598441905.3445141-20740-90634008004432/', '_ansible_remote_tmp': '~/.ansible/tmp'}) 20740 1598441981.07504: _low_level_execute_command(): starting 20740 1598441981.07623: _low_level_execute_command(): executing: /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1598441905.3445141-20740-90634008004432/ > /dev/null 2>&1 && sleep 0' 20740 1598441981.07672: in local.exec_command() <127.0.0.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1598441905.3445141-20740-90634008004432/ > /dev/null 2>&1 && sleep 0' 20740 1598441981.07749: opening command with Popen() 20740 1598441981.14879: done running command with Popen() 20740 1598441981.15031: getting output with communicate() 20740 1598441981.18113: done communicating 20740 1598441981.18145: done with local.exec_command() 20740 
1598441981.18217: _low_level_execute_command() done: rc=0, stdout=, stderr= 20740 1598441981.18290: handler run complete 20740 1598441981.18842: attempt loop complete, returning result 20740 1598441981.18903: _execute() done 20740 1598441981.18941: dumping result to json 20740 1598441981.19005: done dumping result, returning 20740 1598441981.19097: done running TaskExecutor() for localhost/TASK: Create Raid boot volume [00163e91-274a-a6fb-0ecf-000000000008] 20740 1598441981.19230: sending task result for task 00163e91-274a-a6fb-0ecf-000000000008 20740 1598441981.19479: done sending task result for task 00163e91-274a-a6fb-0ecf-000000000008 20740 1598441981.19821: WORKER PROCESS EXITING 20733 1598441981.20260: marking localhost as failed 20733 1598441981.20351: marking host localhost failed, current state: HOST STATE: block=2, task=2, rescue=0, always=0, run_state=ITERATING_TASKS, fail_state=FAILED_NONE, pending_setup=False, tasks child state? (None), rescue child state? (None), always child state? (None), did rescue? False, did start at task? False 20733 1598441981.20403: ^ failed state is now: HOST STATE: block=2, task=2, rescue=0, always=0, run_state=ITERATING_COMPLETE, fail_state=FAILED_TASKS, pending_setup=False, tasks child state? (None), rescue child state? (None), always child state? (None), did rescue? False, did start at task? False 20733 1598441981.20446: getting the next task for host localhost 20733 1598441981.20481: host localhost is done iterating, returning fatal: [localhost]: FAILED! => { "changed": false, "invocation": { "module_args": { "capacity": null,

Thanks again.

anupamaloke commented 4 years ago

@janr7, could you please raise a support ticket with us for the dellemc_idrac_storage_volume so that the support team can get engaged and collect as much information as needed for troubleshooting this issue?

anupamaloke commented 4 years ago

@janr7, could you also please try using the redfish_storage_volume module to create the VD? I am just trying to narrow down the issue. If the following playbook works, then we will know for sure that the issue is with dellemc_idrac_storage_volume.

- name: Create a volume with supported options
  redfish_storage_volume:
    baseuri: "{{ rmb_address }}"
    username: "{{ rmb_username }}"
    password: "{{ rmb_password }}"
    state: "present"
    volume_type: "Mirrored"
    name: "VD_R1_1_a"
    controller_id: "AHCI.Slot.2-1"
    drives:
      - Disk.Direct.0-0:AHCI.Slot.2-1
      - Disk.Direct.1-1:AHCI.Slot.2-1
    optimum_io_size_bytes: 65536
  register: result

- name: Track the job to completion
  idrac_lifecycle_controller_job_status_info:
    idrac_ip: "{{ rmb_address }}"
    idrac_user: "{{ rmb_username }}"
    idrac_password: "{{ rmb_password }}"
    job_id: "{{ result.task.uri | basename }}"
  register: job_result
  until: job_result.msg.JobStatus == 'Completed' or job_result.msg.JobStatus == 'Pending'
  retries: 50
  delay: 10

- name: Restart the server to kick-start the storage volume creation
  dellemc_change_power_state:
    idrac_ip: "{{ rmb_address }}"
    idrac_user: "{{ rmb_username }}"
    idrac_password: "{{ rmb_password }}"
    reset_type: "ForceRestart"
  when: job_result.msg.JobStatus == 'Pending'
  register: restart_result

- name: Track the job to completion
  idrac_lifecycle_controller_job_status_info:
    idrac_ip: "{{ rmb_address }}"
    idrac_user: "{{ rmb_username }}"
    idrac_password: "{{ rmb_password }}"
    job_id: "{{ result.task.uri | basename }}"
  register: job_result
  until: job_result.msg.JobStatus == 'Completed' or job_result.msg.JobStatus == 'Failed'
  failed_when: job_result.msg.JobStatus == 'Failed'
  changed_when: job_result.msg.JobStatus == 'Completed'
  retries: 50
  delay: 10
  when: restart_result.changed
janr7 commented 4 years ago

Hi Anupam

Thank you, we will try this, thanks so much.

Some environment information.

ansible --version
ansible 2.9.11
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /global/instserv/ansible/ansible_venv/lib/python3.7/site-packages/ansible
  executable location = /global/instserv/ansible/ansible_venv/bin/ansible
  python version = 3.7.8 (default, Jul 27 2020, 10:06:38) [GCC 4.8.5]

-- python --version Python 3.7.8

-- pip freeze ansible==2.9.11 certifi==2020.6.20 cffi==1.14.1 chardet==3.0.4 cryptography==3.0 enum34==1.1.10 future==0.18.2 idna==2.10 ipaddress==1.0.23 Jinja2==2.11.2 MarkupSafe==1.1.1 omsdk==1.2.445 ply==3.11 pyasn1==0.4.8 pycparser==2.20 pycryptodomex==3.9.8 pysmi==0.3.4 pysnmp==4.4.12 pysnmp-mibs==0.1.6 python-version==0.0.2 pyvmomi==7.0 PyYAML==5.3.1 requests==2.24.0 six==1.15.0 urllib3==1.25.10

-- Ansible Node OS: NAME="SLES" VERSION="12-SP4" PRETTY_NAME="SUSE Linux Enterprise Server 12 SP4"

--

Client server where the RAID is to be configured: R740xd, BIOS Version 2.8.1, iDRAC Firmware Version 4.22.00.00.

Regards.

janr7 commented 4 years ago

Hi Anupam

The job to create the virtual disk with the correct name (VD_R1_1_a) gets scheduled but never runs. I did an iDRAC reset, cancelled the job, and re-scheduled it.

Storage Configuration -> Virtual Disk Config is all greyed out, and the job status stays at 0%. It still shows: RAC0659: Unable to perform the storage configuration operation(s) on BOSS-S1 because a job is currently pending or is running on the device. Wait for the job to completely run, or delete the job, and then retry the operation. To view the status of the scheduled jobs, go to the Job Queue page of iDRAC.

anupamaloke commented 4 years ago

@janr7, the redfish_storage_volume module will create a Virtual Drive creation job. Once that job is created, you will have to restart your server (not iDRAC) to kick start the VD creation. If you take a look at the sample playbook that I posted, it has four tasks:

  1. Submit a VD creation job. This returns a JOB ID on success.
  2. Check if job is running or pending. If running then track the job to completion or failure.
  3. If the job is pending, then restart the server.
  4. On server restart, the Job should be running. If yes, then track it to completion or failure.
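
If a stale job is still blocking the controller (the RAC0659 message you saw), clearing the job queue before retrying may help. A minimal sketch, assuming the idrac_lifecycle_controller_jobs module is available in your installed version (omitting job_id deletes all Lifecycle Controller jobs, so use it with care):

- name: Clear pending Lifecycle Controller jobs before retrying the VD creation
  idrac_lifecycle_controller_jobs:
    idrac_ip: "{{ rmb_address }}"
    idrac_user: "{{ rmb_username }}"
    idrac_password: "{{ rmb_password }}"
    # job_id: "JID_XXXXXXXXXXXX"   # hypothetical: supply a specific job ID to delete only that job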
janr7 commented 4 years ago

Hi Anupam

I did notice. :) However, idrac_lifecycle_controller_job_status_info seems to be part of 2.1.1, which will be loaded on 28 Aug. The RAID was successfully created after the reboot. This is such good news and appreciated. Thank you so much.

janr7 commented 4 years ago

Hi anupamaloke

After 2.1.1 was installed, dellemc_idrac_storage_volume no longer gives the failed message. It gives a new message, but I will open a new issue for that. Thank you.