Cray-HPE / gru

A utility for reading and modifying BMCs (e.g. iLO, RMMC) using RedFish (gofish).
MIT License

boot commands fail on HPE/iLO #40

Open rustydb opened 10 months ago

rustydb commented 10 months ago
SUMMARY

When invoking either of the following against an iLO machine, 404 errors are returned.

Found against the following node

{
  "ncn-m003-mgmt": {
    "biosVersion": "A43 v2.68 (02/06/2023)",
    "firmwareVersion": "iLO 5 v2.81",
    "processorModel": "AMD EPYC 7302P 16-Core Processor               ",
    "manufacturer": "HPE",
    "model": "ProLiant DL325 Gen10 Plus"
  }
}
ISSUE TYPE
STEPS TO REPRODUCE
  1. Find an iLO machine
  2. Run gru show boot
EXPECTED RESULTS

A list of boot options should print out

intel-mgmt-node:
    Order:
        CRAY UEFI OS 1
        CRAY UEFI OS 0
        UEFI Samsung Flash Drive 1100
        UEFI HTTPv6: Network 00 at Riser 03 Slot 01
        UEFI IPv4: Network 00 at Riser 03 Slot 01
        UEFI HTTPv4: Network 00 at Riser 03 Slot 01
        UEFI IPv6: Network 00 at Riser 03 Slot 01
        UEFI HTTPv6: Network 01 at Riser 03 Slot 01
        UEFI IPv4: Network 01 at Riser 03 Slot 01
        UEFI HTTPv4: Network 01 at Riser 03 Slot 01
        UEFI IPv6: Network 01 at Riser 03 Slot 01
        UEFI HTTPv6: Network 00 at Riser 03 Slot 02
        UEFI IPv4: Network 00 at Riser 03 Slot 02
        UEFI HTTPv4: Network 00 at Riser 03 Slot 02
        UEFI IPv6: Network 00 at Riser 03 Slot 02
        UEFI HTTPv6: Network 01 at Riser 03 Slot 02
        UEFI IPv4: Network 01 at Riser 03 Slot 02
        UEFI HTTPv4: Network 01 at Riser 03 Slot 02
        UEFI IPv6: Network 01 at Riser 03 Slot 02
        UEFI HTTPv6: Intel Network 00 at Baseboard
        UEFI IPv4: Intel Network 00 at Baseboard
        UEFI HTTPv4: Intel Network 00 at Baseboard
        UEFI IPv6: Intel Network 00 at Baseboard
        UEFI HTTPv6: Intel Network 01 at Baseboard
        UEFI IPv4: Intel Network 01 at Baseboard
        UEFI HTTPv4: Intel Network 01 at Baseboard
        UEFI IPv6: Intel Network 01 at Baseboard
        UEFI SAMSUNG MZ7LH480HAHQ-00005 S45PNA0M839302
        Launch EFI Shell
        Enter Setup
        Boot Device List
        Network Boot
        UEFI SAMSUNG MZ7LH480HAHQ-00005 S45PNA0M839290
ACTUAL RESULTS

A 404 error is returned.

jacobsalmela commented 10 months ago

I think it is here: https://github.com/Cray-HPE/gru/blob/main/pkg/cmd/cli/chassis/boot/show.go#L90

Getting this endpoint needs to be more robust. It tries at times to get

/redfish/v1/Systems/1/BootOptions/0002

but it is

/redfish/v1/Systems/1/BootOptions/2

It can also be inaccurate if you just trim the zeroes. Maybe go-redfish has some improvements in this area. At the time, I had to get the endpoint manually.
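One way to avoid guessing between `BootOptions/0002` and `BootOptions/2` is to match each boot reference against the member URIs the BMC itself lists in the `BootOptions` collection, rather than constructing the URL. The sketch below is only an illustration of that idea: `resolveBootOption` is a hypothetical helper, not code from gru or gofish, and it assumes boot order entries look like `Boot0002` and that the collection's `@odata.id` values have already been fetched.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// resolveBootOption (hypothetical helper) returns the collection member URI
// that corresponds to a boot reference such as "Boot0002".
// members holds the @odata.id values read from the BootOptions collection.
func resolveBootOption(ref string, members []string) (string, error) {
	id := strings.TrimPrefix(ref, "Boot")
	trimmed := strings.TrimLeft(id, "0")
	if trimmed == "" {
		trimmed = "0" // "0000" must map to "0", not to an empty ID
	}
	// Prefer the most literal form first, then fall back to the zero-trimmed ID.
	for _, candidate := range []string{ref, id, trimmed} {
		for _, m := range members {
			if path.Base(m) == candidate {
				return m, nil
			}
		}
	}
	// Nothing matched: report it instead of requesting a URL that may 404.
	return "", fmt.Errorf("no BootOptions member matches %q", ref)
}

func main() {
	// Example member list, assumed to come from GET .../BootOptions.
	members := []string{
		"/redfish/v1/Systems/1/BootOptions/1",
		"/redfish/v1/Systems/1/BootOptions/2",
	}
	uri, err := resolveBootOption("Boot0002", members)
	fmt.Println(uri, err)
}
```

Because the lookup only ever returns a URI the BMC advertised, a reference with no matching member surfaces as a local error rather than a 404 from the BMC.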

rustydb commented 8 months ago

> I think it is here: https://github.com/Cray-HPE/gru/blob/main/pkg/cmd/cli/chassis/boot/show.go#L90
>
> Getting this endpoint needs to be more robust. It tries at times to get
>
> /redfish/v1/Systems/1/BootOptions/0002
>
> but it is
>
> /redfish/v1/Systems/1/BootOptions/2
>
> it can also be inaccurate if you just trim the zeroes. Maybe go-redfish has some improvements in this area. At the time, I had to get the endpoint manually

@jacobsalmela are you saying that iLO/HPE can behave either way?

I ran your branch against one of our nodes and received a 404 error even with the trimmed zeros:

[662]rusty@HPE-XHD22YD7DW:~/gitstuffs/cray-shasta/gru> ./gru --insecure show boot surtur-ncn-m001-mgmt.hpc.amslabs.hpecorp.net
Asynchronously querying [    1] hosts ...
surtur-ncn-m001-mgmt.hpc.amslabs.hpecorp.net:
    Order:
    Error                                                       : 404: {"error":{"code":"iLO.0.10.ExtendedInfo","message":"See @Message.ExtendedInfo for more information.","@Message.ExtendedInfo":[{"MessageArgs":["/redfish/v1/Systems/1/BootOptions/13"],"MessageId":"Base.1.4.ResourceMissingAtURI"}]}}
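The 404 body above is a Redfish error object whose `@Message.ExtendedInfo` entry names the missing resource. As a rough sketch of how that URI could be pulled out for clearer reporting, the following decodes the body with the standard `encoding/json` package. The `redfishError` type and `missingURIs` function are hypothetical names for illustration, and only the fields visible in the output above are modeled.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// redfishError models only the fields seen in the 404 body above.
type redfishError struct {
	Error struct {
		Code         string `json:"code"`
		Message      string `json:"message"`
		ExtendedInfo []struct {
			MessageID   string   `json:"MessageId"`
			MessageArgs []string `json:"MessageArgs"`
		} `json:"@Message.ExtendedInfo"`
	} `json:"error"`
}

// missingURIs (hypothetical helper) returns the arguments of every
// ResourceMissingAtURI message found in a Redfish error body.
func missingURIs(body []byte) ([]string, error) {
	var e redfishError
	if err := json.Unmarshal(body, &e); err != nil {
		return nil, err
	}
	var uris []string
	for _, info := range e.Error.ExtendedInfo {
		// The MessageId carries a registry prefix and version, e.g. "Base.1.4.".
		if strings.HasSuffix(info.MessageID, ".ResourceMissingAtURI") {
			uris = append(uris, info.MessageArgs...)
		}
	}
	return uris, nil
}

func main() {
	body := []byte(`{"error":{"code":"iLO.0.10.ExtendedInfo","message":"See @Message.ExtendedInfo for more information.","@Message.ExtendedInfo":[{"MessageArgs":["/redfish/v1/Systems/1/BootOptions/13"],"MessageId":"Base.1.4.ResourceMissingAtURI"}]}}`)
	uris, err := missingURIs(body)
	fmt.Println(uris, err)
}
```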
jacobsalmela commented 8 months ago

Yes, this branch/fix is incomplete but it started with it picking the wrong endpoint (BootOptions/0002 instead of BootOptions/2).

I think I had some code in place to do our own URL manipulation, since gofish wasn't doing it either. Maybe that is different now that the branch has been sitting for some time.

We may just need a larger set of varying hardware to run this against and make appropriate simulator/RIE changes to account for them going forward.