amd / amd_hsmp

AMD HSMP module to provide user interface to system management features.

probe of amd_hsmp failed with error -16 #8

Open tbarbette opened 6 days ago

tbarbette commented 6 days ago

Hi all,

This module does not seem to work, even though HSMP is enabled in the BIOS.

This is the dmesg output I get:

[  408.427141] amd_hsmp amd_hsmp: HSMP test message failed on Fam:19 model:11
[  408.427178] amd_hsmp amd_hsmp: Is HSMP disabled in BIOS ?
[  408.427198] amd_hsmp amd_hsmp: Failed to init HSMP mailbox
[  408.427216] amd_hsmp: probe of amd_hsmp failed with error -16

Any idea?

Thanks
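For anyone decoding the log above: the trailing -16 is a negated Linux errno value, and 16 is EBUSY. Below is a minimal Python sketch that pulls the error code and the CPU family/model out of these dmesg lines. It assumes the Fam/model fields are printed in hexadecimal, which is consistent with the 0x19 / 0x11 that e_smi_tool reports on the same machine. The function names are hypothetical helpers for illustration, not part of amd_hsmp or E-SMI.

```python
import errno
import os
import re

# Two of the dmesg lines from the report, copied verbatim.
DMESG = """\
[  408.427141] amd_hsmp amd_hsmp: HSMP test message failed on Fam:19 model:11
[  408.427216] amd_hsmp: probe of amd_hsmp failed with error -16
"""


def decode_probe_error(log):
    """Return (driver, errno name, errno text) for a 'probe of X failed' line."""
    m = re.search(r"probe of (\S+) failed with error (-?\d+)", log)
    if m is None:
        return None
    code = abs(int(m.group(2)))  # the kernel logs the negated errno
    return m.group(1), errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def decode_family_model(log):
    """Return (family, model) as integers, assuming hex fields in the log."""
    m = re.search(r"Fam:([0-9a-fA-F]+) model:([0-9a-fA-F]+)", log)
    if m is None:
        return None
    return int(m.group(1), 16), int(m.group(2), 16)


print(decode_probe_error(DMESG))
print(decode_family_model(DMESG))
```

The errno name only says that something reported "busy"; it does not by itself say which component was busy or why.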

tbarbette commented 6 days ago

Actually, it seems to work once. I ran e_smi_tool right after loading the module and got the full output:

============================= E-SMI ===================================

--------------------------------------
| CPU Family        | 0x19 (25 )     |
| CPU Model         | 0x11 (17 )     |
| NR_CPUS           | 32             |
| NR_SOCKETS        | 2              |
| THREADS PER CORE  | 1 (SMT OFF)    |
--------------------------------------

------------------------------------------------------------------------
| Sensor Name                    | Socket 0         | Socket 1         |
------------------------------------------------------------------------
| Energy (K Joules)              | 51.111           | 55.334           |
| Power (Watts)                  | 29.075           | 31.189           |
| PowerLimit (Watts)             | 200.000          | 200.000          |
| PowerLimitMax (Watts)          | 240.000          | 240.000          |
| C0 Residency (%)               | 2                | 0                |
| DDR Bandwidth                  |                  |                  |
|   DDR Max BW (GB/s)            | 307              | 307              |
|   DDR Utilized BW (GB/s)       | 0                | 0                |
|   DDR Utilized Percent(%)      | 0                | 0                |
| Current Active Freq limit      |                  |                  |
|    Freq limit (MHz)            | 1800             | 2600             |
|    Freq limit source           | Refer below[*0]  | Refer below[*1]  |
| Socket frequency range         |                  |                  |
|    Fmax (MHz)                  | 3700             | 3700             |
|    Fmin (MHz)                  | 400              | 400              |
------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------
| CPU energies in Joules:                                           |
| cpu [  0] :     41.496      9.325      8.889      8.266     10.964      8.328      8.108      9.661       |
| cpu [  8] :      8.371      8.094      7.695      7.457      9.916      8.008      7.966     10.628       |
| cpu [ 16] :      5.560      5.597      5.598      5.292      5.308      5.309      5.308      5.297       |
| cpu [ 24] :      5.124      5.110      5.113      5.111      5.281      5.251      5.271      5.255       |
-----------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------
| CPU boostlimit in MHz:                                            |
| cpu [  0] : 3700  3700  3700  3700  NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 16] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
-----------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------
| CPU core clock current frequency limit in MHz:                                            |
| cpu [  0] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 16] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
-----------------------------------------------------------------------------------------------------------------
*0 Frequency limit source names:
 OPN Max

*1 Frequency limit source names:
 OPN Max

Try `./e_smi_tool --help' for more information.

============================= End of E-SMI ============================

Subsequent calls then show this:

============================= E-SMI ===================================

Error in initialising HSMP version sepcific info, Only energy data can be obtained...
Err[3]: HSMP driver not present

--------------------------------------
| CPU Family        | 0x19 (25 )     |
| CPU Model         | 0x11 (17 )     |
| NR_CPUS           | 32             |
| NR_SOCKETS        | 2              |
| THREADS PER CORE  | 1 (SMT OFF)    |
--------------------------------------

------------------------------------------------------------------------
| Sensor Name                    | Socket 0         | Socket 1         |
------------------------------------------------------------------------
| Energy (K Joules)              | 51.112           | 56.061           |
| Power (Watts)                  | NA (Err: 20)     | NA (Err: 20)     |
| PowerLimit (Watts)             | NA (Err: 20)     | NA (Err: 20)     |
| PowerLimitMax (Watts)          | NA (Err: 20)     | NA (Err: 20)     |
| C0 Residency (%)               | NA (Err: 20)     | NA (Err: 20)     |
------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------
| CPU energies in Joules:                                           |
| cpu [  0] :     41.496      9.325      8.889      8.267     10.966      8.328      8.108      9.662       |
| cpu [  8] :      8.371      8.094      7.695      7.457      9.916      8.008      7.966     10.628       |
| cpu [ 16] :      5.564      5.599      5.599      5.294      5.309      5.310      5.309      5.298       |
| cpu [ 24] :      5.125      5.111      5.113      5.111      5.282      5.252      5.272      5.256       |
-----------------------------------------------------------------------------------------------------------------

-----------------------------------------------------------------------------------------------------------------
| CPU boostlimit in MHz:                                            |
| cpu [  0] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
| cpu [ 16] : NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA      |
-----------------------------------------------------------------------------------------------------------------

Err[20]: HSMP message/command not supported

Try `./e_smi_tool --help' for more information.

============================= End of E-SMI ============================

After that, even unloading and reloading the hsmp module does not work.
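One way to see what state the driver is in after a failed reload is to check both the module list and the device node. This is a minimal sketch, assuming the driver exposes its character device at /dev/hsmp and shows up in /proc/modules as amd_hsmp when built as a loadable module (a built-in driver would not be listed there). hsmp_status is a hypothetical helper name, not something shipped with the driver or E-SMI.

```python
import os


def hsmp_status(modules_path="/proc/modules", dev_path="/dev/hsmp"):
    """Report whether amd_hsmp is listed as loaded and whether its device node exists.

    Both default paths are assumptions about a typical Linux setup.
    """
    loaded = False
    try:
        with open(modules_path) as f:
            # Each /proc/modules line starts with the module name.
            loaded = any(
                line.split()[0] == "amd_hsmp" for line in f if line.strip()
            )
    except OSError:
        # File missing or unreadable: treat the module as not loaded.
        pass
    return {"module_loaded": loaded, "device_node": os.path.exists(dev_path)}


print(hsmp_status())
```

A loaded module with no device node would match the symptom here, where the probe failed and the tool then reports that the HSMP driver is not present.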

sumachidanand commented 7 hours ago

Hi,

Error -16 indicates that the SMU is busy. Can you let us know the following details?

  1. The platform you are using
  2. The BIOS version
  3. Whether any other heavy task is running while you run e_smi_tool
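The details requested above can usually be gathered in one go. This is a minimal sketch, assuming a Linux system that exposes DMI data under /sys/class/dmi/id; read_dmi and report are hypothetical helper names, and the load average is only a rough hint about other running tasks.

```python
import os

# DMI fields that typically identify the platform and BIOS.
DMI_FIELDS = ("sys_vendor", "product_name", "board_name",
              "bios_vendor", "bios_version", "bios_date")


def read_dmi(base="/sys/class/dmi/id"):
    """Return a dict of DMI fields; a field is None if it cannot be read."""
    info = {}
    for name in DMI_FIELDS:
        try:
            with open(os.path.join(base, name)) as f:
                info[name] = f.read().strip()
        except OSError:
            info[name] = None
    return info


def report(base="/sys/class/dmi/id"):
    """Combine DMI info with the 1/5/15 minute load averages, if available."""
    data = read_dmi(base)
    try:
        data["loadavg"] = os.getloadavg()
    except OSError:
        data["loadavg"] = None
    return data


for key, value in report().items():
    print(f"{key}: {value}")
```

Pasting this output together with the dmesg lines into the issue would cover the platform, BIOS version, and a first indication of system load.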