arcress0 / ipmiutil

ipmiutil is an easy to use set of IPMI server management utilities. It can get/set sensor readings & thresholds, automate SEL management, do SOL console, etc. Supports Linux, Windows, BSD, Solaris, MacOSX. The only IPMI project tool that runs natively on Windows. See http://ipmiutil.sf.net for rpms, etc. (formerly called panicsel). It can run driverless in Linux for use on boot media or embedded environments.
BSD 3-Clause "New" or "Revised" License

add SDR conflict 0xC5 handling retries with delay #2

Closed albertlav closed 4 years ago

albertlav commented 4 years ago

The current SDR conflict handling is not optimal: it performs only one retry, with no delay.

This change improves SDR conflict handling by performing several retries, each after a pseudorandom delay.
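The retry idea described above can be sketched roughly as follows. This is an illustration only, not the code from this change: `sdr_attempt_fn`, `sdr_read_with_retry`, `flaky_attempt`, and `sleep_ms` are hypothetical names, the retry count and delay bound are arbitrary parameters, and `nanosleep` is assumed to be available (POSIX; a Windows build would need something like `Sleep` instead).

```c
#define _POSIX_C_SOURCE 199309L  /* for nanosleep; assumption: POSIX target */
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* IPMI completion code: reservation cancelled or invalid reservation ID. */
#define CC_RESERVATION_CANCELLED 0xC5

/* Hypothetical callback: performs one "reserve + read SDRs" attempt and
 * returns the IPMI completion code (0 means success). */
typedef int (*sdr_attempt_fn)(void *ctx);

/* Sleep for the given number of milliseconds. */
static void sleep_ms(unsigned ms)
{
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (long)(ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Try once, then retry up to max_retries times while the result is 0xC5,
 * waiting a pseudorandom 1..max_delay_ms milliseconds before each retry.
 * Any other completion code (success or a different error) is returned
 * immediately. */
int sdr_read_with_retry(sdr_attempt_fn attempt, void *ctx,
                        int max_retries, unsigned max_delay_ms)
{
    assert(attempt != NULL && max_retries >= 0);

    int cc = attempt(ctx);
    for (int i = 0; i < max_retries && cc == CC_RESERVATION_CANCELLED; i++) {
        unsigned delay = max_delay_ms ? (unsigned)rand() % max_delay_ms + 1 : 0;
        sleep_ms(delay);
        cc = attempt(ctx);
    }
    return cc;
}

/* Demo stub (hypothetical): returns 0xC5 until the counter pointed to by
 * ctx reaches zero, then succeeds. */
int flaky_attempt(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return CC_RESERVATION_CANCELLED;
    }
    return 0;
}
```

The random delay is meant to keep two competing clients from retrying in lockstep and cancelling each other's reservation again. A caller would seed `rand()` once (for example from the process ID or the clock) so that separate processes do not draw the same delays.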

According to the Intel IPMI specification, an SDR reservation conflict can be caused by any other parallel Reserve SDR Repository activity:

https://www.intel.com/content/dam/www/public/us/en/documents/product-briefs/second-gen-interface-spec-v2-rev1-3.pdf


33.11.2 Reservation Cancellation
The SDR Repository Device shall automatically cancel the present SDR Repository reservation after any of the following events occur:
• An SDR record is added using the Add SDR command such that other Record IDs change. As a simplification, an implementation is allowed to cancel the reservation on any SDR record add.
• An SDR record is deleted such that other Record IDs change. As a simplification, an implementation is allowed to cancel the reservation on any SDR record deletion.
• The SDR Repository is cleared.
• The SDR Repository Device is reset (via hardware or Cold Reset command)
• A new ‘Reserve SDR Repository’ command is received. <----
An error completion code will be returned if an attempt is made to execute a command that requires a reservation ID, but the reservation ID used is not valid or current.

Because of this, two or more parallel sensor queries that each issue Reserve SDR Repository can cause reservation conflicts. For example, third-party monitoring software querying the host sensors can cancel ipmiutil's reservation, and ipmiutil.exe then exits with 0xC5 (197).

How to reproduce: on Windows, run the same tight loop in two separate PowerShell consoles:

PS C:\Users\Administrator\Desktop\ipmiutil-3.1.6-win64> while (1) {.\ipmiutil.exe sensor; if ($LASTEXITCODE -ne 0){$LASTEXITCODE;break;}}

One of the loops will exit with 197, and the actual sensor data will be missing.

There is a workaround using the Jumpstart -j option (dump the SDRs to a file and reuse it), but it is not convenient for all use cases.

=============

ipmitool handles conflicts this way, retrying with a delay:
https://github.com/ipmitool/ipmitool/blob/dfe17311d6bbf76d43bdc82dee2d579ac20f0645/lib/ipmi_sdr.c#L964