TACC / perfexpert

An easy-to-use automatic performance diagnosis and optimization tool for HPC applications
http://www.tacc.utexas.edu/perfexpert/

PerfExpert v4.1.1 ignores AVX floating-point instructions in reported total FP instrs #16

Open boegel opened 10 years ago

boegel commented 10 years ago

We noticed that PerfExpert was reporting 0% floating-point instructions for a test program that makes heavy use of AVX FP instructions.

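After looking into this with @leonardofialho, it turns out that the ratio.floating_point formula defined in lcpi.conf is missing the SIMD_FP_256 events, which count 256-bit AVX floating-point operations. The same events are missing from floating-point_instr.overall.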
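To illustrate the effect, here is a minimal sketch of the arithmetic. The counter values are made up, and it assumes the formula is meant as (sum of FP events) / PAPI_TOT_INS; how PerfExpert actually parses the expression is not shown here.

```python
# Hypothetical counter values for an AVX-heavy run (made up for illustration).
counters = {
    "SIMD_FP_256:PACKED_SINGLE": 0,
    "SIMD_FP_256:PACKED_DOUBLE": 400_000,
    "FP_COMP_OPS_EXE:SSE_PACKED_SINGLE": 0,
    "FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE": 0,
    "FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE": 0,
    "FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE": 0,
    "PAPI_TOT_INS": 1_000_000,
}

# Events summed by the original ratio.floating_point formula.
SSE_EVENTS = [
    "FP_COMP_OPS_EXE:SSE_PACKED_SINGLE",
    "FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE",
    "FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE",
    "FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE",
]

# Events added by the patch.
AVX_EVENTS = [
    "SIMD_FP_256:PACKED_SINGLE",
    "SIMD_FP_256:PACKED_DOUBLE",
]


def fp_ratio(counters, events):
    """Assumed reading of the formula: sum of FP events over total instructions."""
    return sum(counters[e] for e in events) / counters["PAPI_TOT_INS"]


before = fp_ratio(counters, SSE_EVENTS)
after = fp_ratio(counters, AVX_EVENTS + SSE_EVENTS)
print(f"before patch: {before:.0%}, after patch: {after:.0%}")
# prints "before patch: 0%, after patch: 40%"
```

With only the FP_COMP_OPS_EXE events in the sum, a program whose FP work is all 256-bit AVX shows up as 0%.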

The following patch, applied to the installed lcpi.conf, seems to fix the issue:

--- PerfExpert/4.1.1/etc/lcpi.conf.orig    2014-05-07 15:42:20.010888000 +0200
+++ PerfExpert/4.1.1/etc/lcpi.conf    2014-05-07 15:45:25.940577000 +0200
@@ -1,6 +1,6 @@
 # LCPI config generated using sniffer
 # version = 1.0
-ratio.floating_point = FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE / PAPI_TOT_INS
+ratio.floating_point = SIMD_FP_256:PACKED_SINGLE + SIMD_FP_256:PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE / PAPI_TOT_INS
 ratio.data_accesses = PAPI_LD_INS / PAPI_TOT_INS
 GFLOPS_(%_max).overall = ((SIMD_FP_256:PACKED_SINGLE*8 + (SIMD_FP_256:PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE)*4 + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE*2 + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE) / PAPI_TOT_CYC) / 8
 GFLOPS_(%_max).packed = ((SIMD_FP_256:PACKED_SINGLE*8 + (SIMD_FP_256:PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE)*4 + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE*2) / PAPI_TOT_CYC) / 8
@@ -20,6 +20,6 @@
 branch_instructions.overall = (PAPI_BR_INS * BR_lat + PAPI_BR_MSP * BR_miss_lat) / PAPI_TOT_INS
 branch_instructions.correctly_predicted = PAPI_BR_INS * BR_lat / PAPI_TOT_INS
 branch_instructions.mispredicted = PAPI_BR_MSP * BR_miss_lat / PAPI_TOT_INS
-floating-point_instr.overall = (((FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE) * FP_lat) + (PAPI_FDV_INS * FP_slow_lat)) / PAPI_TOT_INS
+floating-point_instr.overall = (((SIMD_FP_256:PACKED_SINGLE + SIMD_FP_256:PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE) * FP_lat) + (PAPI_FDV_INS * FP_slow_lat)) / PAPI_TOT_INS
 floating-point_instr.slow_FP_instr = (PAPI_FDV_INS * FP_slow_lat) / PAPI_TOT_INS
 floating-point_instr.fast_FP_instr = ((FP_COMP_OPS_EXE:SSE_FP_PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_FP_SCALAR_SINGLE + FP_COMP_OPS_EXE:SSE_PACKED_SINGLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE) * FP_lat) / PAPI_TOT_INS
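
Note that in the patched file above, floating-point_instr.fast_FP_instr still sums only the FP_COMP_OPS_EXE events. Whether that metric should also include the AVX events is not settled here, but a quick way to list such formulas is sketched below. The helper name formulas_missing_avx is hypothetical, the sample text uses shortened formulas, and the `name = formula` layout with `#` comments is assumed from the excerpt above.

```python
def formulas_missing_avx(conf_text):
    """Return names of metrics that use SSE FP events but no SIMD_FP_256 event."""
    missing = []
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not "name = formula".
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, formula = (part.strip() for part in line.split("=", 1))
        if "FP_COMP_OPS_EXE:" in formula and "SIMD_FP_256:" not in formula:
            missing.append(name)
    return missing


# Shortened stand-in for a patched lcpi.conf (not the real file contents).
sample = """\
# LCPI config generated using sniffer
ratio.floating_point = SIMD_FP_256:PACKED_DOUBLE + FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE / PAPI_TOT_INS
ratio.data_accesses = PAPI_LD_INS / PAPI_TOT_INS
floating-point_instr.fast_FP_instr = (FP_COMP_OPS_EXE:SSE_SCALAR_DOUBLE * FP_lat) / PAPI_TOT_INS
"""

result = formulas_missing_avx(sample)
print(result)
```

To check a real installation, read the installed lcpi.conf into a string and pass it to the function instead of `sample`.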