asmjit / cult

CPU Ultimate Latency Test.
Other
105 stars 15 forks source link

Problem with CPU feature identification #3

Closed jsmall-zzz closed 7 years ago

jsmall-zzz commented 7 years ago

First - I love this project :) Having a way to find the perf characteristics for a system by running a tool, as opposed to looking up in a doc, and having to trust what's there is great.

I think there may be a problem within the CPU detection code. I'm testing on i7 3820. If I run cult it crashes on vfmadd132pd instruction.

If I look up that instruction on intel intrinsics website - it says it needs the FMA flag. If I lookup that CPU on cpuboss it claims that processor does not have FMA3.

The detection code detects FMA (presumably FMA3) is available.

If I look at this CPU id code https://github.com/klauspost/cpuid/blob/master/cpuid.go

The identification code is...

if c&(1<<26) != 0 && c&(1<<27) != 0 && c&(1<<28) != 0 { // Check for OS support eax, _ := xgetbv(0) if (eax & 0x6) == 0x6 { rval |= AVX if (c & 0x00001000) != 0 { rval |= FMA3 } } }

The code in cult is...

// Detect AVX+.
if (regs.ecx & 0x10000000U) {
  // - XCR0[2:1] == 11b
  //   XMM & YMM states need to be enabled by OS.
  if ((xcr0.eax & 0x00000006U) == 0x00000006U) {
    cpuInfo->addFeature(CpuInfo::kX86FeatureAVX);

    if (regs.ecx & 0x00004000U) cpuInfo->addFeature(CpuInfo::kX86FeatureFMA);
    if (regs.ecx & 0x20000000U) cpuInfo->addFeature(CpuInfo::kX86FeatureF16C);
  }
}

so perhaps the line

if (regs.ecx & 0x00004000U)

should be

if (regs.ecx & 0x00001000U)

jsmall-zzz commented 7 years ago

If I comment out the FMA detection line it crashes out on

vpavgb

Which require AVX512VL + AVX512BW which also don't seem available on this chip. If I disable the AVX feature being able cult completes without crash (but of course without AVX instructions).

kobalicek commented 7 years ago

Both should be fixed, thanks for the report!

AsmDB commit: https://github.com/asmjit/asmdb/commit/57809f6cba5b03e4a8e057a08cae5ea8d250bc4d

AsmJit commit: https://github.com/asmjit/asmjit/commit/3864b255e97d04e4ee54a3e315440266ef77f837

Let me know if it fixes your issues.

kobalicek commented 7 years ago

And thanks for testing it out. You can view instruction database and some preliminary results here:

https://kobalicek.com/asmgrid/

jsmall-zzz commented 7 years ago

Just tested latest and can confirm the problem is fixed.

https://kobalicek.com/asmgrid/ looks very handy.

From a usability point of view I find myself constantly looking to the A/B listing on the left. Perhaps having the a shortened processor name as a column heading spanning two sub column headings for Lat/Rcp underneath that heading might help with that.

I've run cult now on a few machines - might be nice to have a way to upload, and/or have an option within cult itself to upload when profile is completed automatically.

dumblob commented 7 years ago

I've run cult now on a few machines - might be nice to have a way to upload, and/or have an option within cult itself to upload when profile is completed automatically.

There should be a copyright waiver as @kobalicek is planning to make the profiles public domain (as explained in the TODO section of the cult README).

kobalicek commented 7 years ago

@jsmall-nvidia I would love to have group header in the table header, but unfortunately the component I'm using (SlickGrid) doesn't provide that. Until I find a good replacement I would just keep it this way.

If you have interesting outputs I can add it to the view, but I wanted to improve cult a bit to handle more instructions before I start collecting the data.

Anyway thanks for letting me know about these, I'm definitely considering all of the requests except automatic uploads of data, this I will keep as is.

kobalicek commented 7 years ago

@jsmall-nvidia I found a plugin for SlickGrid that provides the grouping, the app now groups by CPU.