While investigating the slowdowns from newer HIP versions, I found that the eigensolver miniapp consistently fails immediately on the first iteration with:
terminate called after throwing an instance of 'whip::exception'
what(): invalid argument
srun: error: nid006104: task 0: Segmentation fault
srun: launch/slurm: _step_signal: Terminating StepId=4970716.56
slurmstepd: error: *** STEP 4970716.56 ON nid006104 CANCELLED AT 2023-11-20T22:16:08 ***
srun: error: nid006104: tasks 1-7: Terminated
srun: Force Terminated StepId=4970716.56
I have not investigated this at all. Some miniapps are clearly fine (miniapp_bt_band_to_tridiag), while others fail either for some configurations or always (miniapp_bt_reduction_to_band, miniapp_band_to_tridiag). Others were not tested at this point. The core dumps contain nothing useful, so this will need more thorough investigation.