benczaja / EnerBe

Energy Benchmarks

Memory access fault on AMD Instinct MI210 #5

Open benczaja opened 5 months ago

benczaja commented 5 months ago

Memory access fault by GPU node-2 (Agent handle: 0x6982a0) on address 0xf0000. Reason: Unknown.

This happens only on the MI210 in LIZA, with dgemm_gpu.hip.

benczaja commented 5 months ago

My feeling is that something is going wrong with OpenMP (omp): the library I linked against at compile time may not be the same one that gets loaded at runtime. EESSI does not make my life EESSI.
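One way to check the compile-time vs. runtime suspicion is to look at which OpenMP runtime the dynamic loader resolves for the binary in the job environment. The sketch below is only an illustration, not something from this repository: it parses `ldd` output and lists any OpenMP runtime libraries (`libgomp`, `libomp`, `libiomp5`) it finds. The function names, the sample text, and the `/hypothetical/...` paths are all made up for the example; real output on LIZA will look different.

```python
import re
import subprocess
import sys

# Sonames of common OpenMP runtimes: GNU (libgomp), LLVM (libomp), Intel (libiomp5).
# The trailing "\.so" keeps libomptarget and similar libraries from matching.
OMP_SONAME = re.compile(r"^(libgomp|libomp|libiomp5)\.so")


def find_openmp_runtimes(ldd_output):
    """Return (soname, resolved_path) pairs for OpenMP runtimes in ldd output.

    resolved_path is None when ldd reports the library as "not found".
    """
    found = []
    for line in ldd_output.splitlines():
        parts = line.split()
        # Lines of interest look like: "libgomp.so.1 => /some/path (0x...)"
        if len(parts) < 3 or parts[1] != "=>":
            continue
        if not OMP_SONAME.match(parts[0]):
            continue
        path = None if parts[2:4] == ["not", "found"] else parts[2]
        found.append((parts[0], path))
    return found


def ldd(binary):
    """Run ldd on a binary and return its text output."""
    result = subprocess.run(
        ["ldd", binary], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical sample, invented for illustration only (not real output).
SAMPLE_LDD_OUTPUT = """
    linux-vdso.so.1 (0x00007ffc00000000)
    libgomp.so.1 => /hypothetical/stack-a/lib64/libgomp.so.1 (0x00007f0000000000)
    libomp.so => /hypothetical/stack-b/lib/libomp.so (0x00007f0000100000)
    libc.so.6 => /hypothetical/stack-a/lib64/libc.so.6 (0x00007f0000200000)
"""

if __name__ == "__main__":
    # With a path argument, inspect that binary; otherwise use the sample text.
    text = ldd(sys.argv[1]) if len(sys.argv) > 1 else SAMPLE_LDD_OUTPUT
    for soname, path in find_openmp_runtimes(text):
        print(f"{soname} -> {path or 'NOT FOUND'}")
```

Because `ldd` resolves libraries using the current environment, it should be run with the same modules loaded as the failing job. Seeing two different OpenMP runtimes, or a path from a different software stack than the one used at compile time, would be consistent with the suspicion above, though it would not by itself prove that this causes the memory access fault.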