Mellanox / mstflint

Mstflint - an open source version of MFT (Mellanox Firmware Tools)

mstlink cannot query the temperature of a specific network card on the server (it works with the network card name, but not with the LID) #972

Open xiaolongzhou123 opened 1 month ago

xiaolongzhou123 commented 1 month ago

mlxlink -d lid-0x1cb -p 1 works. The mstlink binary (the open-source counterpart of mlxlink) compiled from https://github.com/Mellanox/mstflint does not work.

[screenshot]

HarelKarni commented 1 month ago

Please provide additional information: Which device are you using to reproduce this? Which mstflint version are you using?

xiaolongzhou123 commented 1 month ago

I downloaded and compiled from https://github.com/Mellanox/mstflint. No matter which tagged version I use, or the latest version, it doesn't work.

  1. ./autogen.sh
  2. ./configure --prefix=/opt/jl --enable-adb-generic-tools

root@p-jn-sz-cw-h1-su2-gpu31-402-22c-12u-208-65:/opt/jl/bin# ./mstlink -d lid-0x01e5 -p 10/2 -m
Segmentation fault (core dumped)
root@p-jn-sz-cw-h1-su2-gpu31-402-22c-12u-208-65:/opt/jl/bin# ./mstlink -v
mstlink, mstflint 4.28.0, Git SHA Hash: cc30ec

Executing the command using mlxlink works without errors:

mlxlink -d lid-0x01e5 -p 10/2 -m

root@p-jn-sz-cw-h1-su2-gpu31-402-22c-12u-208-65:~/c/mstflint# ofed_info -s
MLNX_OFED_LINUX-23.10-2.1.3.1:

root@p-jn-sz-cw-h1-su2-gpu31-402-22c-12u-208-65:~/c/mstflint# mlxlink -v
mlxlink, mft 4.26.1-3, built on Nov 27 2023, 15:26:06. Git SHA Hash: N/A

[two screenshots]
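For anyone trying to reproduce the difference between the two tools, here is a minimal Python sketch that runs the same arguments through both binaries and reports how each one ended. The helper name `describe_exit` is hypothetical, and the install path `/opt/jl/bin/mstlink` is taken from the `--prefix` used in the build steps above; adjust both tool paths for your system. It relies on the POSIX behavior of `subprocess`, where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys


def describe_exit(argv):
    """Run argv and describe how it ended, without raising on failure."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True)
    except FileNotFoundError:
        return f"{argv[0]}: not found"
    if proc.returncode < 0:
        # On POSIX, a negative return code is the number of the signal
        # that killed the child (for example -11 for SIGSEGV).
        name = signal.Signals(-proc.returncode).name
        return f"{argv[0]}: killed by {name}"
    return f"{argv[0]}: exit code {proc.returncode}"


if __name__ == "__main__":
    # Default arguments are copied from the report above; pass others on
    # the command line to override them.
    args = sys.argv[1:] or ["-d", "lid-0x01e5", "-p", "10/2", "-m"]
    # Tool paths are assumptions based on the build prefix in this thread.
    for tool in ("/opt/jl/bin/mstlink", "mlxlink"):
        print(describe_exit([tool, *args]))
```

If the behavior described in this thread holds, the first line would report a SIGSEGV for mstlink while the second would report a normal exit for mlxlink.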

xiaolongzhou123 commented 1 month ago

[screenshot]

xiaolongzhou123 commented 1 month ago

@HarelKarni

In MlxRegLib::isAccessRegisterSupported, the call status = get_icmd_query_cap(mf, &icmd_cap); directly causes the segmentation fault (core dumped).

The GDB debugging information is as follows: [screenshot]

Could an expert please take a look? Is mstlink expected to run after compiling from source? Does it require any other software or libraries? Is this use case supported?
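Since the GDB output above was shared as a screenshot, a plain-text backtrace may be easier for maintainers to work with. The sketch below is one possible way to capture it, assuming `gdb` is installed and that the binary lives at `/opt/jl/bin/mstlink` (the prefix from the build steps in this thread). The function names are hypothetical; the gdb options used (`-batch`, `-ex`, `--args`) are standard gdb command-line options.

```python
import shutil
import subprocess


def gdb_backtrace_command(argv):
    """Build a gdb invocation that runs argv, then prints a backtrace."""
    # -batch exits gdb after the -ex commands finish; --args passes the
    # remaining items to the debugged program rather than to gdb.
    return ["gdb", "-batch", "-ex", "run", "-ex", "bt", "--args", *argv]


def capture_backtrace(argv):
    """Return gdb's text output, or None when gdb is not on PATH."""
    if shutil.which("gdb") is None:
        return None
    proc = subprocess.run(
        gdb_backtrace_command(argv), capture_output=True, text=True
    )
    return proc.stdout + proc.stderr


if __name__ == "__main__":
    # Path and arguments are assumptions copied from the report above.
    out = capture_backtrace(
        ["/opt/jl/bin/mstlink", "-d", "lid-0x01e5", "-p", "10/2", "-m"]
    )
    print(out if out is not None else "gdb not found on PATH")
```

The backtrace is most useful when the binary was built with debug symbols; without them, gdb can usually still show function names but not source lines.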

HarelKarni commented 1 month ago

@xiaolongzhou123 As I understand from our infrastructure team, this issue was resolved in the newest version (master_devel). Can you please check and let me know?

xiaolongzhou123 commented 1 month ago

@HarelKarni I tried the master_devel branch. It's still giving an error. It doesn't work.

Could it be that I need to install some dependencies first? I have tested compiling multiple versions, and all of them give the same error. But mlxlink doesn't give an error, so accessing the device through its LID does seem to work there.

However, when I run my compiled mstlink with -d lid-xxx, it always gives an error, and I have no idea why.

So, I suspect it might be a configuration issue... I'm about to cry.

xiaolongzhou123 commented 1 month ago

[screenshot]