BoringBoredom / Linpack-Extended

Linpack Extended is a stress test for 64-bit Intel processors. It is based on the Intel Math Kernel Library.

How did you do the AMD compatibility? #3

Closed · sp00n closed this issue 2 months ago

sp00n commented 2 months ago

I wonder how you arrived at the hex values you needed to change to disable the Genuine Intel check in the exe file? I'm currently looking at the intel_xeon64.exe file from the latest MKL package, but the hex values that are being changed in your exe do not appear in that new file.

So I wonder how you located them in the first place.

BoringBoredom commented 2 months ago

https://github.com/valleyofdoom/StresKit/blob/494d11837375a3d5901502795e2321bbe4125495/build.py#L57

In IDA (screenshots of the check omitted), replace the check with mov eax, 1.
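A minimal sketch of that kind of byte patch in Python, assuming the check has already been located in IDA. The offset and byte values below are placeholders, not the ones build.py actually uses:

```python
# Hypothetical sketch: overwrite a located vendor check with "mov eax, 1"
# (B8 01 00 00 00). Offset and expected bytes below are placeholders.
from pathlib import Path

def patch(path: str, offset: int, old: bytes, new: bytes) -> None:
    data = bytearray(Path(path).read_bytes())
    # Refuse to patch if the bytes at the offset are not what we expect.
    if data[offset:offset + len(old)] != old:
        raise ValueError("unexpected bytes at offset; wrong binary or version?")
    data[offset:offset + len(new)] = new
    Path(path).write_bytes(data)

# Example call (placeholder offset/bytes):
# patch("linpack_xeon64.exe", 0x1234, b"\xE8\xAA\xBB\xCC\xDD", b"\xB8\x01\x00\x00\x00")
```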

sp00n commented 2 months ago

Phew, not so easy to get familiar with disassembling and IDA if you've never really done it before. I was eventually able to retrace your steps though.

However, the new Intel Linpack doesn't seem to use the GenuineIntel string anymore. It's still physically present in the binary, but it no longer shows up as a string in the .rdata section, and it doesn't seem to be used at all: while playing around, I found that I could simply replace the GenuineIntel string with AuthenticAMD, which would then make the binary run on an AMD processor (but of course not on Intel anymore). That didn't work for the newest binary though, which indicates to me that they've now changed the check to something else.
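For illustration, a sketch of what that string swap could look like in Python; both vendor strings are 12 bytes, so the file layout stays intact. The file names are placeholders:

```python
# Sketch of the GenuineIntel -> AuthenticAMD swap that worked on the older binaries.
# Both strings are 12 bytes long, so replacing one with the other keeps offsets intact.
from pathlib import Path

exe = Path("linpack_xeon64.exe")  # placeholder path
data = exe.read_bytes().replace(b"GenuineIntel", b"AuthenticAMD")
exe.with_name("linpack_xeon64_amd.exe").write_bytes(data)
```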

Out of curiosity I also compared the GFlops when running 1 thread on the binary you're using vs. the one from Linpack Xtreme. The old binary from Linpack Xtreme runs with slightly more GFlops when set to MKL_DEBUG_CPU_TYPE=5 (AVX2) (and slower with MKL_DEBUG_CPU_TYPE=4 (AVX)), so I'm not sure right now if using the new binary is beneficial as a stress test for AMD at all. Intel removed that environment variable some time ago (in 2020 I believe).

The results were around 58 GFlops (yours) vs. 65 (MKL_DEBUG_CPU_TYPE=5) and 45 (MKL_DEBUG_CPU_TYPE=4).
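For reference, a hedged sketch of how one of the pre-2020 binaries could be launched with that override from Python; the binary and input file names are assumptions:

```python
# Sketch: run an older (pre-2020) Linpack build with MKL forced onto the AVX2
# code path via the (since removed) MKL_DEBUG_CPU_TYPE variable.
# Binary and input file names are assumptions.
import os
import subprocess

env = dict(os.environ, MKL_DEBUG_CPU_TYPE="5")  # 5 ~ AVX2 path, 4 ~ AVX path
subprocess.run(["linpack_xeon64.exe", "lininput_xeon64"], env=env, check=True)
```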

Maybe an HPL binary with AMD's AOCL library would run even better, but that's another endeavor that I would need to wrap my head around first.

I'll see if I can find a way to trick the new binary as well, now that I at least know a little bit what I'm doing. 😬

sp00n commented 2 months ago

Okay, that was easy now (I hope?). 😏 At offset 0x533 I replaced the hex values E8 38 4F with B8 01 00.

I.e., I replaced the call sub_140006070 instruction with mov eax, 1.

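A sketch of that 3-byte patch in Python; it works out to mov eax, 1 because the trailing 00 00 of the call's rel32 displacement become the high bytes of the immediate. The file name is an assumption:

```python
# Sketch of the 3-byte patch at 0x533: E8 38 4F (start of "call sub_140006070",
# full encoding E8 38 4F 00 00) becomes B8 01 00, leaving B8 01 00 00 00,
# i.e. "mov eax, 1". File name is an assumption.
from pathlib import Path

exe = Path("linpack_xeon64.exe")
data = bytearray(exe.read_bytes())
if data[0x533:0x536] == bytes.fromhex("e8384f"):
    data[0x533:0x536] = bytes.fromhex("b80100")
    exe.with_name("linpack_xeon64_patched.exe").write_bytes(data)
else:
    print("bytes at 0x533 do not match; different binary version?")
```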

No idea if this is the correct way to do it, but it seems to work. Performance is very slightly higher on AMD than in your version, 59 GFlops vs. 58, and the same on Intel. It's still lower than in the old binary though.

BoringBoredom commented 2 months ago

I don't have an AMD CPU, so I can't test, but it seems like Linpack isn't a good choice for AMD in general anyway. BTW, I'm 2 versions behind because the latest packages don't include the 64-bit libiomp5md.dll. I assume it's an oversight, but I honestly cba to report it, as big companies generally have an absolutely horrible pipeline for (bug) reports.

sp00n commented 2 months ago

I did find the libiomp5md.dll "hidden" inside the \packages\intel.oneapi.win.openmp,v=2024.2.0+978\cupPayload.cup archive, in the _installdir\compiler\2024.2\bin\ directory within it. (I never installed oneAPI, only extracted it, so I'm not sure what the regular directory structure would look like.)

Regarding AMD, I was able to collect a few versions of Linpack now, and those from before 2020, when you could still set the MKL_DEBUG_CPU_TYPE environment variable, do run a bit faster.

BoringBoredom commented 2 months ago

https://www.intel.com/content/www/us/en/developer/articles/technical/onemkl-benchmarks-suite.html This is where I'm getting the binaries from. The last 2 packages only have the 32-bit libiomp5md.dll.

Have you compared those old binaries to something like P95, OCCT or y-cruncher?

sp00n commented 2 months ago

Huh, didn't know of this page. I had downloaded the whole package at https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-download.html?operatingsystem=windows&windows-install=offline

The Linpack benchmark binary is then inside packages\intel.oneapi.win.mkl.devel,v=2024.2.0+661\cupPayload.cup, under _installdir\mkl\2024.2\share\mkl\benchmarks\linpack\ (yeah, took me a while to find it...), and the .dll is where I mentioned above.

Linpack Xtreme's binary for Intel CPUs is bit-identical to the 2018.3.011 Linpack version I was able to find here: http://web.archive.org/web/20220412021802/registrationcenter-download.intel.com/akdlm/irc_nas/9752/w_mklb_p_2018.3.011.zip As for its AMD binary, I don't know; it doesn't display any version number and its size is different. No idea what it actually is.

OCCT includes various Linpack versions (extracted into the /temp/OCCT/CPULINPACK directory on program start), and they all seem to be patched original versions as well (at least their "Original filename" property says linpack_xeon64.exe). The exception is the AMD64 variant, which is possibly the original HPL code (from 2018) compiled with the AMD libraries(?). It runs relatively slowly though when executed outside of OCCT (which hides the Linpack output).

I'm using the stress tests to test single-core stability only, so I haven't compared all-core load scenarios for all of them. Temperature-wise, the 2024 and the 2018 versions don't seem to differ too much when only testing on one or two threads, even if the GFlops are lower for 2024.

BoringBoredom commented 2 months ago

I meant comparing Linpack to other popular stress tests on AMD. If, e.g., P95 or y-cruncher are more stressful, I don't see a reason to use Linpack.

sp00n commented 2 months ago

All of them have different use cases. Linpack Xtreme is still regarded as one of the hardest stress tests, at least up to the Ryzen 5000 series. But that's for multi-core stress testing, and since my focus is on single-core, I don't actually have any experience there. I don't think anybody has tried to use Linpack for single-core stability testing so far, but it's a stress test that reports back a failed test, and I can capture its output (with some help from PowerShell's Tee-Object cmdlet), so I thought I'd give it a try.
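A rough Python analogue of that tee-and-capture idea, just for illustration; the binary/input names and the "fail" marker are assumptions about Linpack's output format:

```python
# Rough sketch: stream Linpack's output to the console while also scanning it
# for a failed residual check. Binary/input names and the "fail" marker are
# assumptions about the output format.
import subprocess

proc = subprocess.Popen(
    ["linpack_xeon64.exe", "lininput_xeon64"],
    stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True,
)
failed = False
for line in proc.stdout:
    print(line, end="")            # "tee" to the console
    if "fail" in line.lower():     # assumed marker of a failed check
        failed = True
proc.wait()
print("FAILED" if failed else "passed")
```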

As far as community experience regarding single-core testing goes, y-cruncher with 19-ZN2 ~ Kagari seems to have proven quite effective at quickly reporting errors for Ryzen CPUs with really unstable Curve Optimizer (CO) values. And for testing stability in low-load conditions, Prime95 with SSE and huge FFTs seems to be pretty good. It takes a long time to identify that last kind of error though.