amd / amd_hsmp

AMD HSMP module to provide user interface to system management features.

Versions using ioctl cannot be inserted into the kernel #7

Open vysocky opened 1 year ago

vysocky commented 1 year ago

We have two systems running the same CentOS 7 with an old kernel (3.10.0-1160.80.1.el7.x86_64). One of them works perfectly with the master version of amd_hsmp, but the other one (2x 7763 + 8x A100 per node) fails when inserting the kernel module, with the following error:

insmod: ERROR: could not insert module amd_hsmp/4276242/amd_hsmp.ko: No such device

I tested older versions of the code and found that only the very old version without the ioctl interface (5d590b22fa29f5e65c311f7711218038fcd784eb) can be inserted, and it works fine.

Do you have a clue why the ioctl versions do not work? Thanks @nchatrad

nchatrad commented 1 year ago

Can you tell us the Family/Model number of the system where this driver does not work?
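For anyone gathering this information, here is a minimal Python sketch that reads the family and model from `/proc/cpuinfo` on Linux. It is not part of amd_hsmp; the function name `parse_family_model` is made up for illustration, and the sketch assumes the usual `cpu family` and `model` fields are present.

```python
from pathlib import Path


def parse_family_model(cpuinfo_text):
    """Return (family, model) as ints from the first processor entry.

    A field that is missing from the text is returned as None.
    """
    fields = {}
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or line without a "key : value" pair
        key = key.strip()
        # "model name" is a different key, so it is not matched here.
        if key in ("cpu family", "model") and key not in fields:
            fields[key] = int(value.strip())
    return fields.get("cpu family"), fields.get("model")


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        family, model = parse_family_model(cpuinfo.read_text())
        if family is not None and model is not None:
            # Print decimal and hex, since AMD documents families in hex.
            print(f"Family {family} ({family:#x}), Model {model} ({model:#x})")
        else:
            print("cpu family/model fields not found")
    else:
        print("/proc/cpuinfo not available on this system")
```

The same values also appear in the `CPU family` and `Model` lines of `lscpu` output.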

vysocky commented 1 year ago

As mentioned in the initial text, it is a system powered by two 7763 CPUs with Nvidia GPUs. Let me know what additional information you need.

nchatrad commented 1 year ago

Hi Vysocky, the 7763 should be Milan. Did you check whether the HSMP interface is enabled in the BIOS? Note: BIOS config options vary among ODMs. Can you also share the dmesg logs around the error when inserting the module?
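To collect the requested logs, one option is to filter the kernel log for HSMP-related lines right after the failed `insmod`. The following is a rough Python sketch, not something provided by the driver: `filter_log_lines` is a hypothetical helper, the keyword `hsmp` is an assumption about what the relevant messages contain, and reading `dmesg` may require root on some systems.

```python
import subprocess


def filter_log_lines(log_text, keywords=("hsmp",)):
    """Return the log lines that contain any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        line
        for line in log_text.splitlines()
        if any(k in line.lower() for k in lowered)
    ]


if __name__ == "__main__":
    try:
        # Plain `dmesg` dumps the kernel ring buffer to stdout.
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=False
        )
    except OSError as exc:
        print(f"could not run dmesg: {exc}")
    else:
        if result.returncode != 0:
            print(f"dmesg failed: {result.stderr.strip()}")
        else:
            for line in filter_log_lines(result.stdout):
                print(line)
```

If the filter returns nothing, sharing the last few dozen unfiltered lines of `dmesg` taken immediately after the failed insert is a reasonable fallback.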