Set HSA_XNACK for sollve/OpenMP_VV for more GPUs

jtb20 commented 3 weeks ago

At present, several SOLLVE_VV/OpenMP_VV tests for unified shared memory (USM) require HSA_XNACK=1 to work correctly on GPUs that support the feature but are running in "xnack-" mode.

The run_sollve.sh/run_OpenMP_VV.sh scripts currently set the environment variable for gfx90a devices only. This patch also lets the USM tests run on additional GPU types, by querying the list of GPUs that support USM in the test/Makefile.defs fragment.

The situation on APUs is a bit more complicated. If we enable HSA_XNACK on those, the USM tests pass, but several SOLLVE_VV/OpenMP_VV tests that are written to rely on a distinct address space for target regions (with in/out copies) regress, because we then get "zero-copy" behaviour (i.e. target regions unexpectedly affect the "host versions" of data).

So, for now, we do not set HSA_XNACK on APU devices, perhaps until the above tests are adjusted to allow for "zero-copy" mode.
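A minimal sketch of the decision described above, not the actual patch. The names `USM_GPUS`, `want_xnack`, `GPU_ARCH` and `ON_APU` are hypothetical, and the architecture list is illustrative only; the real list lives in test/Makefile.defs and is not shown in this thread.

```shell
#!/bin/sh
# Hypothetical list of USM-capable architectures (illustrative values only;
# the real list is queried from test/Makefile.defs).
USM_GPUS="gfx90a gfx942"

# Print "1" if HSA_XNACK should be set for the given device, else "0".
# $1: GPU arch (e.g. gfx90a); $2: "1" if the device is an APU, else "0".
want_xnack() {
    arch=$1
    on_apu=${2:-0}
    # APUs are skipped for now: zero-copy behaviour regresses other tests.
    if [ "$on_apu" = "1" ]; then
        echo 0
        return
    fi
    for gpu in $USM_GPUS; do
        if [ "$gpu" = "$arch" ]; then
            echo 1
            return
        fi
    done
    echo 0
}

# Example use in a run script (GPU_ARCH and ON_APU are hypothetical variables).
if [ "$(want_xnack "${GPU_ARCH:-unknown}" "${ON_APU:-0}")" = "1" ]; then
    export HSA_XNACK=1
fi
```

Keeping the decision in one function means the APU exclusion can be dropped later in a single place, once the affected tests allow for "zero-copy" mode.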

doru1004 commented 3 weeks ago

I am not sure whether you are worried about detecting the APU, but for the check-offload tests I use an environment variable, IS_APU, which is set when the tests are run. When the GPU arch is gfx942 (the only case in which the device may or may not be an APU), it tells the test suite whether I am running on an APU. This avoids some complicated scheme to detect the APU.

This is because either the user running the tests or the CI running the tests will know if the underlying arch is an APU or not.
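As a rough illustration of that scheme (not the check-offload implementation), a run script could consult IS_APU only in the ambiguous gfx942 case. The function name `device_kind`, the convention that IS_APU=1 means "APU", and the treatment of every other arch as discrete are all assumptions made for this sketch.

```shell
#!/bin/sh
# Print "apu" or "discrete" for the given GPU arch.
# $1: GPU arch. IS_APU is read from the environment, where the user or the CI
# is expected to have set it (assumed convention: IS_APU=1 means APU).
device_kind() {
    case "$1" in
        gfx942)
            # Only gfx942 is ambiguous, so IS_APU is consulted only here.
            if [ "${IS_APU:-0}" = "1" ]; then
                echo apu
            else
                echo discrete
            fi
            ;;
        *)
            # Assumption for this sketch: all other arches are discrete GPUs.
            echo discrete
            ;;
    esac
}
```

For example, `IS_APU=1 ./run_sollve.sh` on a gfx942 APU machine would let the script skip the HSA_XNACK setting without any hardware probing.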

doru1004 commented 3 weeks ago

Have you tried to run the unified memory tests on an APU without the HSA_XNACK=1 flag?

spophale commented 3 weeks ago

Just curious, which tests are failing on APUs with HSA_XNACK enabled?

jtb20 commented 3 weeks ago

@spophale These ones fail on an APU with HSA_XNACK enabled:

julbrown@pp-128-b1-2:~/work$ diff -u testresults-nopatch/sollve.sum testresults/sollve.sum | grep "^\+FAIL"
+FAIL: tests/5.0/application_kernels/lsms_triangular_packing.cpp run
+FAIL: tests/5.0/target/test_target_defaultmap_none.c run
+FAIL: tests/5.0/target/test_target_defaultmap_to_from_tofrom.c run
+FAIL: tests/5.0/target_teams_distribute_parallel_for/test_target_teams_distribute_parallel_for_collapse.c run
+FAIL: tests/5.0/target_teams_distribute_parallel_for_simd/test_target_teams_distribute_parallel_for_simd_atomic.F90 run
+FAIL: tests/5.0/target_teams_distribute/test_target_teams_distribute_reduction_and.F90 run
+FAIL: tests/5.0/target_teams_distribute/test_target_teams_distribute_reduction_bitand.F90 run
+FAIL: tests/5.0/target_teams_distribute/test_target_teams_distribute_reduction_bitor.F90 run
+FAIL: tests/5.0/target_teams_distribute/test_target_teams_distribute_reduction_bitxor.F90 run
+FAIL: tests/5.0/target_teams_distribute/test_target_teams_distribute_reduction_eqv.F90 run
+FAIL: tests/5.0/target_teams_distribute/test_target_teams_distribute_reduction_multiply.F90 run
+FAIL: tests/5.0/target_teams_distribute/test_target_teams_distribute_reduction_neqv.F90 run
+FAIL: tests/5.0/target_teams_distribute/test_target_teams_distribute_reduction_or.F90 run
+FAIL: tests/5.0/teams_loop/test_target_teams_loop_defaultmap.c run
+FAIL: tests/5.1/memory_routines/test_get_mapped_ptr.c run
+FAIL: tests/5.1/target/test_target_declare_indirect.c run
+FAIL: tests/5.1/target/test_target_defaultmap_present.c run
+FAIL: tests/5.1/target_update/test_target_update_to_present.c run
+FAIL: tests/5.2/target_enter_data/test_target_enter_data_map.c run
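The comparison above can be wrapped in a small helper for reuse. This is a sketch: the function name `new_failures` is hypothetical, and it assumes the two .sum files list tests in the same order, so that a changed result appears as a "+FAIL" line in the unified diff.

```shell
#!/bin/sh
# Print tests that FAIL in the new summary but not in the baseline.
# $1: baseline .sum file; $2: new .sum file.
new_failures() {
    # In a basic regular expression a bare "+" is literal, so no backslash
    # is needed. The "+++" diff header does not match "+FAIL".
    diff -u "$1" "$2" | grep '^+FAIL' | sed 's/^+//'
}
```

Usage: `new_failures testresults-nopatch/sollve.sum testresults/sollve.sum` prints one "FAIL: ..." line per newly failing test.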

ROCm / aomp

Set HSA_XNACK for sollve/OpenMP_VV for more GPUs #1052