RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

[BUG] Likwid Stethoscope mode for NVIDIA GPUs #622

Open EMBagheri opened 3 months ago

EMBagheri commented 3 months ago

Describe the bug

Stethoscope mode for NVIDIA GPUs fails with the error shown below. I am using the master branch (version 5.3.0), built with the NVIDIA interface enabled.

To Reproduce

Here is the output of likwid-perfctr -G 0 -W FLOPS_DP -S 1s -V 3:

DEBUG - [hwloc_init_cpuInfo:359] HWLOC CpuInfo Family 25 Model 1 Stepping 1 Vendor 0x0 Part 0x0 isIntel 0 numHWThreads 128 activeHWThreads 128
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 0 Thread 0 Core 0 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 1 Thread 0 Core 1 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 2 Thread 0 Core 2 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 3 Thread 0 Core 3 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 4 Thread 0 Core 4 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 5 Thread 0 Core 5 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 6 Thread 0 Core 6 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 7 Thread 0 Core 7 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 8 Thread 0 Core 8 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 9 Thread 0 Core 9 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 10 Thread 0 Core 10 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 11 Thread 0 Core 11 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 12 Thread 0 Core 12 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 13 Thread 0 Core 13 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 14 Thread 0 Core 14 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 15 Thread 0 Core 15 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 16 Thread 0 Core 16 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 17 Thread 0 Core 17 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 18 Thread 0 Core 18 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 19 Thread 0 Core 19 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 20 Thread 0 Core 20 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 21 Thread 0 Core 21 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 22 Thread 0 Core 22 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 23 Thread 0 Core 23 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 24 Thread 0 Core 24 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 25 Thread 0 Core 25 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 26 Thread 0 Core 26 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 27 Thread 0 Core 27 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 28 Thread 0 Core 28 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 29 Thread 0 Core 29 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 30 Thread 0 Core 30 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 31 Thread 0 Core 31 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 32 Thread 0 Core 32 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 33 Thread 0 Core 33 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 34 Thread 0 Core 34 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 35 Thread 0 Core 35 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 36 Thread 0 Core 36 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 37 Thread 0 Core 37 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 38 Thread 0 Core 38 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 39 Thread 0 Core 39 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 40 Thread 0 Core 40 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 41 Thread 0 Core 41 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 42 Thread 0 Core 42 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 43 Thread 0 Core 43 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 44 Thread 0 Core 44 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 45 Thread 0 Core 45 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 46 Thread 0 Core 46 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 47 Thread 0 Core 47 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 48 Thread 0 Core 48 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 49 Thread 0 Core 49 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 50 Thread 0 Core 50 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 51 Thread 0 Core 51 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 52 Thread 0 Core 52 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 53 Thread 0 Core 53 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 54 Thread 0 Core 54 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 55 Thread 0 Core 55 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 56 Thread 0 Core 56 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 57 Thread 0 Core 57 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 58 Thread 0 Core 58 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 59 Thread 0 Core 59 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 60 Thread 0 Core 60 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 61 Thread 0 Core 61 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 62 Thread 0 Core 62 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 63 Thread 0 Core 63 Die 0 Socket 0 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 64 Thread 0 Core 64 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 65 Thread 0 Core 65 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 66 Thread 0 Core 66 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 67 Thread 0 Core 67 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 68 Thread 0 Core 68 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 69 Thread 0 Core 69 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 70 Thread 0 Core 70 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 71 Thread 0 Core 71 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 72 Thread 0 Core 72 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 73 Thread 0 Core 73 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 74 Thread 0 Core 74 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 75 Thread 0 Core 75 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 76 Thread 0 Core 76 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 77 Thread 0 Core 77 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 78 Thread 0 Core 78 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 79 Thread 0 Core 79 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 80 Thread 0 Core 80 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 81 Thread 0 Core 81 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 82 Thread 0 Core 82 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 83 Thread 0 Core 83 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 84 Thread 0 Core 84 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 85 Thread 0 Core 85 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 86 Thread 0 Core 86 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 87 Thread 0 Core 87 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 88 Thread 0 Core 88 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 89 Thread 0 Core 89 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 90 Thread 0 Core 90 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 91 Thread 0 Core 91 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 92 Thread 0 Core 92 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 93 Thread 0 Core 93 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 94 Thread 0 Core 94 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 95 Thread 0 Core 95 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 96 Thread 0 Core 96 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 97 Thread 0 Core 97 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 98 Thread 0 Core 98 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 99 Thread 0 Core 99 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 100 Thread 0 Core 100 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 101 Thread 0 Core 101 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 102 Thread 0 Core 102 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 103 Thread 0 Core 103 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 104 Thread 0 Core 104 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 105 Thread 0 Core 105 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 106 Thread 0 Core 106 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 107 Thread 0 Core 107 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 108 Thread 0 Core 108 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 109 Thread 0 Core 109 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 110 Thread 0 Core 110 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 111 Thread 0 Core 111 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 112 Thread 0 Core 112 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 113 Thread 0 Core 113 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 114 Thread 0 Core 114 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 115 Thread 0 Core 115 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 116 Thread 0 Core 116 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 117 Thread 0 Core 117 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 118 Thread 0 Core 118 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 119 Thread 0 Core 119 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 120 Thread 0 Core 120 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 121 Thread 0 Core 121 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 122 Thread 0 Core 122 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 123 Thread 0 Core 123 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 124 Thread 0 Core 124 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 125 Thread 0 Core 125 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 126 Thread 0 Core 126 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_nodeTopology:568] HWLOC Thread Pool PU 127 Thread 0 Core 127 Die 0 Socket 1 inCpuSet 1
DEBUG - [hwloc_init_cacheTopology:798] HWLOC Cache Pool ID 0 Level 1 Size 32768 Threads 1
DEBUG - [hwloc_init_cacheTopology:798] HWLOC Cache Pool ID 1 Level 2 Size 524288 Threads 1
DEBUG - [hwloc_init_cacheTopology:798] HWLOC Cache Pool ID 2 Level 3 Size 33554432 Threads 8

CPU name: AMD EPYC 7713 64-Core Processor
CPU type: AMD K19 (Zen3) architecture
CPU clock: 2.00 GHz
CPU family: 25
CPU model: 1
CPU short: zen3
CPU stepping: 1
CPU features: FP MMX SSE SSE2 HTT MMX RDTSCP MONITOR SSSE FMA SSE4.1 SSE4.2 AES AVX RDRAND AVX2 RDSEED SSE3
CPU arch: x86_64

NVMON GPU 0 compute capability: 8.0
NVMON GPU 0 short: nvidia_gpu_cc_ge_7
NVMON GPU 1 compute capability: 8.0
NVMON GPU 1 short: nvidia_gpu_cc_ge_7
NVMON GPU 2 compute capability: 8.0
NVMON GPU 2 short: nvidia_gpu_cc_ge_7
NVMON GPU 3 compute capability: 8.0
NVMON GPU 3 short: nvidia_gpu_cc_ge_7
NVMON GPU 4 compute capability: 8.0
NVMON GPU 4 short: nvidia_gpu_cc_ge_7
NVMON GPU 5 compute capability: 8.0
NVMON GPU 5 short: nvidia_gpu_cc_ge_7
NVMON GPU 6 compute capability: 8.0
NVMON GPU 6 short: nvidia_gpu_cc_ge_7
NVMON GPU 7 compute capability: 8.0
NVMON GPU 7 short: nvidia_gpu_cc_ge_7

DEBUG - [nvmon_init:184] Device 0 runs with CUPTI Profiling API backend
DEBUG - [nvmon_perfworks_createDevice:939] link_perfworks_libraries in createDevice
DEBUG - [link_perfworks_libraries:443] LD_LIBRARY_PATH = ~/likwid:/apps/SPACK/0.19.1/opt/linux-almalinux8-zen/gcc-8.5.0/cuda-11.8.0-qrwohrlyazyilia56tybqfm5k3k6kiom/extras/CUPTI/lib64:/apps/SPACK/0.19.1/opt/linux-almalinux8-zen/gcc-8.5.0/cuda-11.8.0-qrwohrlyazyilia56tybqfm5k3k6kiom/lib64:/apps/SPACK/0.19.1/opt/linux-almalinux8-zen/gcc-8.5.0/cuda-11.8.0-qrwohrlyazyilia56tybqfm5k3k6kiom/extras/CUPTI/lib64:/apps/SPACK/0.19.1/opt/linux-almalinux8-zen/gcc-8.5.0/cuda-11.8.0-qrwohrlyazyilia56tybqfm5k3k6kiom/extras/CUPTI/lib64/:/apps/SPACK/0.19.1/opt/linux-almalinux8-zen/gcc-8.5.0/cuda-11.8.0-qrwohrlyazyilia56tybqfm5k3k6kiom/extras/CUPTI/lib64/:
DEBUG - [link_perfworks_libraries:445] CUDA_HOME = /apps/SPACK/0.19.1/opt/linux-almalinux8-zen/gcc-8.5.0/cuda-11.8.0-qrwohrlyazyilia56tybqfm5k3k6kiom
DEBUG - [link_perfworks_libraries:613] Run cuInit
DEBUG - [link_perfworks_libraries:615] Run cuDeviceGetCount
DEBUG - [link_perfworks_libraries:620] Run cuDeviceGet
DEBUG - [link_perfworks_libraries:622] Run cuDeviceGetAttribute for major CC
DEBUG - [link_perfworks_libraries:627] Run cuDeviceGetAttribute for minor CC
DEBUG - [nvmon_perfworks_createDevice:955] Found 8 GPUs
DEBUG - [nvmon_perfworks_createDevice:962] Current GPU 0
DEBUG - [nvmon_perfworks_createDevice:987] Current GPU chip GA100
DEBUG - [nvmon_perfworks_createDevice:1001] Create metric context for chip 'GA100'
DEBUG - [nvmon_perfworks_createDevice:1005] Create metric context done
DEBUG - [nvmon_perfworks_createDevice:1020] Create metric context getMetricNames
DEBUG - [nvmon_perfworks_createDevice:1076] Destroy metric context getMetricNames
DEBUG - [nvmon_perfworks_createDevice:1080] Destroy metric context
DEBUG - [_nvml_linkLibraries:398] Init NVML Libaries
DEBUG - [_nvml_linkLibraries:425] Init NVML Libaries
Executing:
DEBUG - [nvmon_addEventSet:556] Allocating new group structure for group.
DEBUG - [nvmon_addEventSet:558] NVMON: Currently 1 groups of 2 active
DEBUG - [nvmon_addEventSet:602] Performance group for PerfWorks backend
DEBUG - [perfgroup_readGroup:873] Reading group FLOPS_DP from /apps/likwid/5.3.0/share/likwid/perfgroups/nvidia_gpu_cc_ge_7/FLOPS_DP.txt
DEBUG - [nvmon_addEventSet:653] EventStr SMSP_SASS_THREAD_INST_EXECUTED_OP_DADD_PRED_ON_SUM:GPU0,SMSP_SASS_THREAD_INST_EXECUTED_OP_DMUL_PRED_ON_SUM:GPU1,SMSP_SASS_THREAD_INST_EXECUTED_OP_DFMA_PRED_ON_SUM:GPU2
DEBUG - [nvmon_addEventSet:671] Calling addevents
DEBUG - [nvmon_perfworks_addEventSet:1739] Add events to GPU device 0 with context 29748656
DEBUG - [perfworks_check_nv_context:677] Current context 29748656 DevContext 0
DEBUG - [perfworks_check_nv_context:691] Reuse context 29748656 for device 0
DEBUG - [nvmon_perfworks_addEventSet:1769] SMSP_SASS_THREAD_INST_EXECUTED_OP_DADD_PRED_ON_SUM
DEBUG - [nvmon_perfworks_addEventSet:1775] Adding real event smsp__sass_thread_inst_executed_op_dadd_pred_on.sum
DEBUG - [nvmon_perfworks_addEventSet:1769] SMSP_SASS_THREAD_INST_EXECUTED_OP_DMUL_PRED_ON_SUM
DEBUG - [nvmon_perfworks_addEventSet:1775] Adding real event smsp__sass_thread_inst_executed_op_dmul_pred_on.sum
DEBUG - [nvmon_perfworks_addEventSet:1769] SMSP_SASS_THREAD_INST_EXECUTED_OP_DFMA_PRED_ON_SUM
DEBUG - [nvmon_perfworks_addEventSet:1775] Adding real event smsp__sass_thread_inst_executed_op_dfma_pred_on.sum
DEBUG - [nvmon_perfworks_addEventSet:1799] Increase size of eventSet space on device 0
DEBUG - [nvmon_perfworks_addEventSet:1812] Filling eventset 0 on device 0
DEBUG - [nvmon_perfworks_createConfigImage:1474] Create config image for chip GA100
DEBUG - [nvmon_perfworks_getMetricRequests114:1147] Create scratch buffer for GA100 and 0x5571b10
DEBUG - [nvmon_perfworks_getMetricRequests114:1161] Init Metric evaluator
DEBUG - [nvmon_perfworks_getMetricRequests114:1275] Destroy Metric evaluator
DEBUG - [nvmon_perfworks_createConfigImage:1476] Create config image for chip GA100 with 3 metric requests
DEBUG - [nvmon_perfworks_createConfigImage:1570] Allocated 296 byte for configImage
DEBUG - [nvmon_perfworks_createConfigImage:1580] nvmon_perfworks_createConfigImage_out enter 0
DEBUG - [nvmon_perfworks_createConfigImage:1582] NVPW_RawMetricsConfig_Destroy
DEBUG - [nvmon_perfworks_createConfigImage:1586] NVPW_MetricsContext_Destroy
DEBUG - [nvmon_perfworks_createConfigImage:1602] nvmon_perfworks_createConfigImage returns 296
DEBUG - [nvmon_perfworks_getMetricRequests114:1147] Create scratch buffer for GA100 and (nil)
DEBUG - [nvmon_perfworks_getMetricRequests114:1161] Init Metric evaluator
DEBUG - [nvmon_perfworks_getMetricRequests114:1275] Destroy Metric evaluator
DEBUG - [nvmon_perfworks_createCounterDataPrefixImage:1679] Allocated 172 byte for configPrefixImage
DEBUG - [nvmon_perfworks_createCounterDataPrefixImage:1691] nvmon_perfworks_createCounterDataPrefixImage_out enter 0
DEBUG - [nvmon_perfworks_createCounterDataPrefixImage:1716] nvmon_perfworks_createCounterDataPrefixImage returns 172
DEBUG - [nvmon_perfworks_addEventSet:1844] Filling eventset 0 on device 0
DEBUG - [nvmon_perfworks_addEventSet:1885] Adding eventset 0

~/likwid/ext/lua/lua: ~/likwid/likwid-perfctr:1496: attempt to compare nil with number
stack traceback:
        ~/likwid/likwid-perfctr:1496: in main chunk
        [C]: in ?

if ret < 0 then
    print_stderr(string.format("Error starting counters for cpu %d.", cpulist[ret * (-1)]))
    perfctr_exit(1)
end
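
For anyone triaging this, the error message itself can be reproduced outside LIKWID. The sketch below is not the project's code and not a proposed patch: `start_counters_stub` and `check_start` are hypothetical names, and the code only assumes that whatever call produces `ret` before the comparison returned nil on the GPU-only path. It shows that a bare `ret < 0` raises exactly this Lua error when `ret` is nil, and that checking for nil first turns the crash into a reportable failure.

```lua
-- Hypothetical stand-in for whatever call produces `ret` before the check.
-- Assumption: on the GPU-only path it yields nil rather than a status code.
local function start_counters_stub(value)
    return value
end

-- Guarded variant of the check: nil is handled before the numeric comparison.
local function check_start(ret)
    if ret == nil then
        return false, "no status code returned (ret is nil)"
    elseif ret < 0 then
        return false, string.format("Error starting counters (code %d).", ret)
    end
    return true, "ok"
end

-- The bare comparison raises "attempt to compare nil with number".
local ok, err = pcall(function()
    return start_counters_stub(nil) < 0
end)
print(ok, err)

-- The guarded check reports a failure instead of aborting the script.
print(check_start(start_counters_stub(nil)))
print(check_start(start_counters_stub(-2)))
print(check_start(start_counters_stub(0)))
```

Whether a nil guard is the right fix, or whether the `-G`-only stethoscope path should not reach this CPU counter check at all, is something the maintainers would need to confirm.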