HewlettPackard / terraform-provider-oneview

Automates the provisioning of physical infrastructure from a private cloud using templates from HPE OneView with Terraform
Apache License 2.0

server profile changes that aren't correct #487

Closed rismoney closed 1 year ago

rismoney commented 2 years ago

This is still happening randomly on TF plans and applies, using TF 0.13.6 with hewlettpackard/oneview v6.3.1-13. If I rerun the plan or apply, it doesn't happen. I am at a loss as to how it is not reliably getting values. Somehow it is looking up the wrong hardware name; I am not sure if there is a GUID lookup that is failing or something else that is not right. This directory in particular has no Enclosure Group 5 references at all.

This does not happen every time. It is maybe 25% of TF runs.

[2022-06-22T17:41:48.465Z]   # module.EG06F01B11.oneview_server_profile.default will be updated in-place
[2022-06-22T17:41:48.465Z]   ~ resource "oneview_server_profile" "default" {
[2022-06-22T17:41:48.465Z]         affinity                      = "Bay"
[2022-06-22T17:41:48.465Z]         associated_server             = "MXQ84101YD"
[2022-06-22T17:41:48.465Z]         category                      = "server-profiles"
[2022-06-22T17:41:48.465Z]         created                       = "2021-09-10T13:28:38.206Z"
[2022-06-22T17:41:48.465Z]       ~ enclosure_group               = "EG_05" -> "EG_06"
[2022-06-22T17:41:48.465Z]         enclosure_uri                 = "/rest/enclosures/797740MXQ809065W"
[2022-06-22T17:41:48.465Z]         etag                          = "1632776077668/88"
[2022-06-22T17:41:48.465Z]       ~ hardware_name                 = "EG05_FRAME02, bay 10" -> "EG06_FRAME01, bay 11"
[2022-06-22T17:41:48.465Z]         hardware_uri                  = "/rest/server-hardware/39313738-3034-584D-5138-343130315944"
[2022-06-22T17:41:48.465Z]         hide_unused_flex_nics         = true
[2022-06-22T17:41:48.465Z]         id                            = "SP_EG06_FRAME01_B11"
[2022-06-22T17:41:48.465Z]         ilo_ip                        = "172.20.170.81"
[2022-06-22T17:41:48.465Z]         iscsi_initiator_name          = "iqn.2015-02.com.hpe:oneview-03d63f85-2306-499a-8229-4cd941476802"
[2022-06-22T17:41:48.465Z]         iscsi_initiator_name_type     = "AutoGenerated"
[2022-06-22T17:41:48.465Z]         mac_type                      = "Physical"
[2022-06-22T17:41:48.465Z]         modified                      = "2022-06-18T06:33:17.991Z"
[2022-06-22T17:41:48.465Z]       ~ name                          = "SP_EG05_FRAME02_B10" -> "SP_EG06_FRAME01_B11"
[2022-06-22T17:41:48.465Z]         profile_uuid                  = "03d63f85-2306-499a-8229-4cd941476802"
[2022-06-22T17:41:48.466Z]         refresh_state                 = "NotRefreshing"
[2022-06-22T17:41:48.466Z]         scopes_uri                    = "/rest/scopes/resources/rest/server-profiles/03d63f85-2306-499a-8229-4cd941476802"
[2022-06-22T17:41:48.466Z]         serial_number                 = "***CENSORED***"
[2022-06-22T17:41:48.466Z]         serial_number_type            = "Physical"
[2022-06-22T17:41:48.466Z]         server_hardware_reapply_state = "NotApplying"
[2022-06-22T17:41:48.466Z]         server_hardware_type          = "SY 480 Gen10 1"
[2022-06-22T17:41:48.466Z]         server_hardware_type_uri      = "/rest/server-hardware-types/2A4ADEE7-4556-4B0A-A7D7-D9B8BA065B45"
[2022-06-22T17:41:48.466Z]         state                         = "Normal"
[2022-06-22T17:41:48.466Z]         status                        = "OK"
[2022-06-22T17:41:48.466Z]         task_uri                      = "/rest/tasks/58cb2f5c-c7cd-4319-9667-5502f50624de"
[2022-06-22T17:41:48.466Z]         template                      = "SPT_EG06_TradingDesktops_Gen10_1"
[2022-06-22T17:41:48.466Z]         template_compliance           = "Compliant"
[2022-06-22T17:41:48.466Z]         type                          = "ServerProfileV12"
[2022-06-22T17:41:48.466Z]         update_type                   = "put"
[2022-06-22T17:41:48.466Z]         uri                           = "/rest/server-profiles/03d63f85-2306-499a-8229-4cd941476802"
[2022-06-22T17:41:48.466Z]         uuid                          = "39313738-3034-584D-5138-343130315944"
[2022-06-22T17:41:48.466Z]         wwn_type                      = "Physical"
[2022-06-22T17:41:48.466Z] 
[2022-06-22T17:41:48.466Z]         bios_option {
[2022-06-22T17:41:48.466Z]             consistency_state = "Unknown"
[2022-06-22T17:41:48.466Z]             manage_bios       = true
[2022-06-22T17:41:48.466Z]             reapply_state     = "NotApplying"
[2022-06-22T17:41:48.466Z] 
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "CollabPowerControl"
[2022-06-22T17:41:48.466Z]                 value = "Disabled"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "EnergyEfficientTurbo"
[2022-06-22T17:41:48.466Z]                 value = "Disabled"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "EnergyPerfBias"
[2022-06-22T17:41:48.466Z]                 value = "MaxPerf"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "IntelProcVtd"
[2022-06-22T17:41:48.466Z]                 value = "Disabled"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "IntelUpiPowerManagement"
[2022-06-22T17:41:48.466Z]                 value = "Disabled"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "InternalSDCardSlot"
[2022-06-22T17:41:48.466Z]                 value = "Disabled"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "MemPatrolScrubbing"
[2022-06-22T17:41:48.466Z]                 value = "Disabled"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "MemRefreshRate"
[2022-06-22T17:41:48.466Z]                 value = "Refreshx1"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "MinProcIdlePkgState"
[2022-06-22T17:41:48.466Z]                 value = "NoState"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "MinProcIdlePower"
[2022-06-22T17:41:48.466Z]                 value = "NoCStates"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "NumaGroupSizeOpt"
[2022-06-22T17:41:48.466Z]                 value = "Clustered"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "PowerRegulator"
[2022-06-22T17:41:48.466Z]                 value = "StaticHighPerf"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "ProcHyperthreading"
[2022-06-22T17:41:48.466Z]                 value = "Disabled"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "ProcTurbo"
[2022-06-22T17:41:48.466Z]                 value = "Enabled"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "ProcVirtualization"
[2022-06-22T17:41:48.466Z]                 value = "Disabled"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "ProcX2Apic"
[2022-06-22T17:41:48.466Z]                 value = "Disabled"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "Sriov"
[2022-06-22T17:41:48.466Z]                 value = "Disabled"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "TimeFormat"
[2022-06-22T17:41:48.466Z]                 value = "Utc"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "TimeZone"
[2022-06-22T17:41:48.466Z]                 value = "UtcM5"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "UncoreFreqScaling"
[2022-06-22T17:41:48.466Z]                 value = "Maximum"
[2022-06-22T17:41:48.466Z]             }
[2022-06-22T17:41:48.466Z]             overridden_settings {
[2022-06-22T17:41:48.466Z]                 id    = "UsbControl"
[2022-06-22T17:41:48.466Z]                 value = "InternalUsbDisabled"
[2022-06-22T17:41:48.467Z]             }
[2022-06-22T17:41:48.467Z]             overridden_settings {
[2022-06-22T17:41:48.467Z]                 id    = "WorkloadProfile"
[2022-06-22T17:41:48.467Z]                 value = "Custom"
[2022-06-22T17:41:48.467Z]             }
[2022-06-22T17:41:48.467Z]         }
[2022-06-22T17:41:48.467Z] 
[2022-06-22T17:41:48.467Z]         boot {
[2022-06-22T17:41:48.467Z]             boot_order  = [
[2022-06-22T17:41:48.467Z]                 "HardDisk",
[2022-06-22T17:41:48.467Z]             ]
[2022-06-22T17:41:48.467Z]             manage_boot = true
[2022-06-22T17:41:48.467Z]         }
[2022-06-22T17:41:48.467Z] 
[2022-06-22T17:41:48.467Z]         boot_mode {
[2022-06-22T17:41:48.467Z]             manage_mode     = true
[2022-06-22T17:41:48.467Z]             mode            = "UEFI"
[2022-06-22T17:41:48.467Z]             pxe_boot_policy = "Auto"
[2022-06-22T17:41:48.467Z]             secure_boot     = "Unmanaged"
[2022-06-22T17:41:48.467Z]         }
[2022-06-22T17:41:48.467Z] 
[2022-06-22T17:41:48.467Z]         connection_settings {
[2022-06-22T17:41:48.467Z]             reapply_state = "NotApplying"
[2022-06-22T17:41:48.467Z] 
[2022-06-22T17:41:48.467Z]             connections {
[2022-06-22T17:41:48.467Z]                 allocated_mbps         = 0
[2022-06-22T17:41:48.467Z]                 allocated_vfs          = 64
[2022-06-22T17:41:48.467Z]                 function_type          = "Ethernet"
[2022-06-22T17:41:48.467Z]                 id                     = 1
[2022-06-22T17:41:48.467Z]                 interconnect_port      = 0
[2022-06-22T17:41:48.467Z]                 interconnect_uri       = "/rest/interconnects/ea007447-19d6-4d4e-b07a-082066033ee7"
[2022-06-22T17:41:48.467Z]                 isolated_trunk         = false
[2022-06-22T17:41:48.467Z]                 mac_type               = "Physical"
[2022-06-22T17:41:48.467Z]                 managed                = true
[2022-06-22T17:41:48.467Z]                 maximum_mbps           = 0
[2022-06-22T17:41:48.467Z]                 name                   = "G1_Network_1"
[2022-06-22T17:41:48.467Z]                 port_id                = "Mezz 3:1-a"
[2022-06-22T17:41:48.467Z]                 private_vlan_port_type = "Unknown"
[2022-06-22T17:41:48.467Z]                 requested_mbps         = "0"
[2022-06-22T17:41:48.467Z]                 state                  = "Reserved"
[2022-06-22T17:41:48.467Z]                 status                 = "Disabled"
[2022-06-22T17:41:48.467Z] 
[2022-06-22T17:41:48.467Z]                 boot {
[2022-06-22T17:41:48.467Z]                     boot_vlan_id       = 0
[2022-06-22T17:41:48.467Z]                     ethernet_boot_type = "PXE"
[2022-06-22T17:41:48.468Z]                     priority           = "Secondary"
[2022-06-22T17:41:48.468Z]                 }
[2022-06-22T17:41:48.468Z]             }
[2022-06-22T17:41:48.468Z]             connections {
[2022-06-22T17:41:48.468Z]                 allocated_mbps         = 1000
[2022-06-22T17:41:48.468Z]                 allocated_vfs          = 64
[2022-06-22T17:41:48.468Z]                 function_type          = "Ethernet"
[2022-06-22T17:41:48.468Z]                 id                     = 2
[2022-06-22T17:41:48.468Z]                 interconnect_port      = 0
[2022-06-22T17:41:48.468Z]                 interconnect_uri       = "/rest/interconnects/5ced55c6-19fe-4b0e-ace4-5e57e1710e1d"
[2022-06-22T17:41:48.468Z]                 isolated_trunk         = false
[2022-06-22T17:41:48.468Z]                 mac_type               = "Physical"
[2022-06-22T17:41:48.468Z]                 managed                = true
[2022-06-22T17:41:48.468Z]                 maximum_mbps           = 10000
[2022-06-22T17:41:48.468Z]                 name                   = "G1_Network_2"
[2022-06-22T17:41:48.468Z]                 network_uri            = "/rest/ethernet-networks/bbd09473-9abb-4af3-8540-e48d718a077e"
[2022-06-22T17:41:48.468Z]                 port_id                = "Mezz 3:2-a"
[2022-06-22T17:41:48.468Z]                 private_vlan_port_type = "None"
[2022-06-22T17:41:48.468Z]                 requested_mbps         = "1000"
[2022-06-22T17:41:48.468Z]                 state                  = "Deployed"
[2022-06-22T17:41:48.468Z]                 status                 = "OK"
[2022-06-22T17:41:48.468Z] 
[2022-06-22T17:41:48.468Z]                 boot {
[2022-06-22T17:41:48.468Z]                     boot_vlan_id       = 0
[2022-06-22T17:41:48.468Z]                     ethernet_boot_type = "PXE"
[2022-06-22T17:41:48.468Z]                     priority           = "Primary"
[2022-06-22T17:41:48.468Z]                 }
[2022-06-22T17:41:48.468Z]             }
[2022-06-22T17:41:48.468Z]         }
[2022-06-22T17:41:48.468Z] 
[2022-06-22T17:41:48.468Z]         firmware {
[2022-06-22T17:41:48.468Z]             consistency_state      = "Unknown"
[2022-06-22T17:41:48.468Z]             force_install_firmware = false
[2022-06-22T17:41:48.468Z]             manage_firmware        = false
[2022-06-22T17:41:48.468Z]             reapply_state          = "NotApplying"
[2022-06-22T17:41:48.468Z]         }
[2022-06-22T17:41:48.468Z] 
[2022-06-22T17:41:48.468Z]         local_storage {
[2022-06-22T17:41:48.468Z]             reapply_state = "NotApplying"
[2022-06-22T17:41:48.468Z] 
[2022-06-22T17:41:48.468Z]             controller {
[2022-06-22T17:41:48.468Z]                 device_slot              = "Embedded"
[2022-06-22T17:41:48.468Z]                 drive_write_cache        = "Unmanaged"
[2022-06-22T17:41:48.468Z]                 import_configuration     = false
[2022-06-22T17:41:48.468Z]                 initialize               = false
[2022-06-22T17:41:48.468Z]                 mode                     = "Mixed"
[2022-06-22T17:41:48.468Z]                 predictive_spare_rebuild = "Unmanaged"
[2022-06-22T17:41:48.468Z] 
[2022-06-22T17:41:48.468Z]                 logical_drives {
[2022-06-22T17:41:48.468Z]                     accelerator         = "Unmanaged"
[2022-06-22T17:41:48.468Z]                     bootable            = false
[2022-06-22T17:41:48.468Z]                     drive_number        = 1
[2022-06-22T17:41:48.468Z]                     name                = "Logical Drive 1"
[2022-06-22T17:41:48.468Z]                     num_physical_drives = 2
[2022-06-22T17:41:48.468Z]                     raid_level          = "RAID1"
[2022-06-22T17:41:48.468Z]                 }
[2022-06-22T17:41:48.468Z]             }
[2022-06-22T17:41:48.468Z]         }
[2022-06-22T17:41:48.468Z] 
[2022-06-22T17:41:48.468Z]         management_processor {
[2022-06-22T17:41:48.468Z]             manage_mp = false
[2022-06-22T17:41:48.468Z]         }
[2022-06-22T17:41:48.468Z]     }
[2022-06-22T17:41:48.468Z] 
[2022-06-22T17:41:48.468Z] Plan: 0 to add, 1 to change, 0 to destroy.

module "EG06F01B11" {
  source                = "git::https://git@stash.example.com/scm/sys/terraform-modules.git//sp_netmapping?ref=1.0.0"
  profile_name          = "SP_EG${var.enclosure_group}_FRAME01_B11"
  profile_template      = "SPT_EG${var.enclosure_group}_TradingDesktops_Gen10_1"
  environment           = "prod"
  server_function       = "tradingdesktop"
  create_raid           = "no"
}

variable "enclosure_group" {
  default = "06"
}

relevant sections in sp_netmapping:

locals {
  enclosuregroup_no = substr(var.profile_name,5,2)
  frame = substr(var.profile_name,8,7)
  bay = tonumber(substr(var.profile_name,17,2))
}

resource "oneview_server_profile" "default" {
  name = var.profile_name
  template = var.profile_template
  hardware_name = "EG${local.enclosuregroup_no}_${local.frame}, bay ${local.bay}"
  wwn_type = "Physical"
  mac_type = "Physical"
  enclosure_group = "EG_${local.enclosuregroup_no}"
  server_hardware_type = var.server_hardware_type
  serial_number_type = "Physical"
  type = var.profile_type

rismoney commented 2 years ago

I ran it on 6.6.0-13 and I am seeing the same issue. Upgraded to TF 1.x too, and same thing. It's about 1 in 4 executions.

nabhajit-ray commented 2 years ago

Hi @rismoney ,

Will look into this and get back to you.

nabhajit-ray commented 2 years ago

When the server profile was created, what were the hardware name and enclosure group name? Can you share the tfstate file?
It seems you're dynamically providing the values for hardware and enclosure group name. Can you check that it gives the same values every time you create or plan the resource?

rismoney commented 2 years ago

The module always provides a single static name, despite the variable use. The variable values never change between runs, and it always uses the default, and isn't specified. It's there more as a convention of easy copy/paste.

My folder directories are divided by enclosure groups, so folder EG06 never has any references to any servers in EG05. The state file for this directory never mentions EG05. Yet somehow, when the provider tries to compare the resource to OneView, it returns bad data. As you see above, it returned an EG05 node and is trying to make it EG06. "EG_05" -> "EG_06" is an impossibility in my tf codebase.
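To make the point about static inputs concrete, here is a minimal Go sketch that mirrors the `substr` and `tonumber` calls from the `sp_netmapping` locals shown earlier. The helper names `substr` and `derive` are made up for this illustration and are not part of the provider or of Terraform; the sketch assumes ASCII profile names that follow the `SP_EGnn_FRAMEnn_Bnn` pattern.

```go
package main

import (
	"fmt"
	"strconv"
)

// substr mimics Terraform's substr(s, offset, length) for ASCII input.
// It panics if the string is shorter than offset+length.
func substr(s string, offset, length int) string {
	return s[offset : offset+length]
}

// derive mirrors the locals and interpolations of the sp_netmapping module:
// it computes hardware_name and enclosure_group from the profile name alone.
func derive(profileName string) (hardwareName, enclosureGroup string, err error) {
	eg := substr(profileName, 5, 2)
	frame := substr(profileName, 8, 7)
	bay, err := strconv.Atoi(substr(profileName, 17, 2))
	if err != nil {
		return "", "", err
	}
	return fmt.Sprintf("EG%s_%s, bay %d", eg, frame, bay), "EG_" + eg, nil
}

func main() {
	hw, eg, err := derive("SP_EG06_FRAME01_B11")
	if err != nil {
		panic(err)
	}
	fmt.Println(hw) // EG06_FRAME01, bay 11
	fmt.Println(eg) // EG_06
}
```

The derived values depend only on the profile name string, so the configured side of the diff cannot change between runs; only the values read back from OneView can.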

But it only happens randomly. So if I run terraform plan, it works. Run it again without changing anything and it works. Again, it doesn't. Again, it doesn't. Then it does. Then it does. Nothing changes between these tf runs. It can also return the bad data on the first run, so it's not related to a cached module directory or similar leftover artifact. It can happen from a clean clone.

I can see about sharing state somehow with you. Perhaps I can email it or allow you access to a repo. Let me inspect it.

Is it possible there is a bug in the api call that matches a server profile to a hardware name where it is not getting the right answer? I ran tf in trace mode and didn't see any mentions of the wrong EG. So something is happening that I can't pinpoint based on the run.

nabhajit-ray commented 2 years ago

Since it is not reproducible every time, we are trying to get an environment ready to reproduce it first. Meanwhile, we will wait for the state file. Regarding your question: yes, we search for the resource by name, and we are checking whether there is any issue with that method.

rismoney commented 2 years ago

I have given you both access to my repo.

nabhajit-ray commented 2 years ago

Sorry missed the invitation and now it is expired. Can you please send it again?

rismoney commented 2 years ago

were you able to get in?

nabhajit-ray commented 2 years ago

Yes, we are able to access it now. Will check and get back to you

nabhajit-ray commented 2 years ago

When we do a read operation, we first do a GET call using the server profile name. If that call returns the wrong server profile, this scenario can happen. We have tried name searches multiple times, but they always returned the correct server profile. Looking at the state file, we can see that the ID field is correct, so we are not able to debug it at present. We will keep looking and may ask for more details.

rismoney commented 2 years ago

Sure. If you need me to compile or build some test code that outputs the underlying GET calls, that might help isolate it to the underlying Go OneView library.

I am not a Golang programmer, but perhaps that would rule out this provider's direct usage of the library and point to a problem in the getter? Perhaps there is an error condition or parsing problem being swallowed, and then it's returning improper data.

nabhajit-ray commented 2 years ago

Will get back to you on this soon.

rismoney commented 2 years ago

I am planning a Synergy Composer1 to Composer2 upgrade, and a subsequent OneView upgrade to 7.x, so I will continue to track whether this is fixed somehow in any of that effort.

nabhajit-ray commented 1 year ago

Closing this, since we did not hear from you on whether it is still causing issues. Please open another defect if there is any issue.

rismoney commented 1 year ago

Would you be able to provide some Go code I can compile to perform the GET call using the server profile name? I'd like to try to repro this outside of Terraform by running something like 100 queries against OneView. Something is still randomly retrieving the wrong server profile. In PowerShell I have run hundreds of Get-OVServerProfile -name "profilename" calls and can't repro it. I am not sure how else to instrument this to catch the wrong server profile being returned.