Donders-Institute / PRESTUS

PREprocessing & Simulations for Transcranial Ultrasound Stimulation package
GNU General Public License v3.0

CUDA errors #50

Status: Open. jkosciessa opened this issue 1 month ago.

jkosciessa commented 1 month ago

I encounter occasional CUDA errors during acoustic simulations. I find this error hard to debug, because a comparable simulation in CPU mode appears to run without problems.

{Error using .*
Encountered unexpected error during CUDA execution. The CUDA error was:
CUDA_ERROR_ILLEGAL_ADDRESS

Error in kspaceFirstOrder3D (line 958)
                source_mat = real(ifftn(source_kappa .* fftn(source_mat)));

Error in run_simulations (line 59)
       sensor_data = kspaceFirstOrder3D(kgrid, medium, source, sensor, input_args_cell{:});

Error in single_subject_pipeline (line 272)
        sensor_data = run_simulations(kgrid, kwave_medium, source, sensor, kwave_input_args, parameters);

Error in tp50cc77e7_df67_41ac_b927_e85810643c23 (line 1)
load /project/2424103.01/thalstim_simulations/thalstim_sim/data/tussim/CTX500-026-010_79.6mm_pCT_60W/sub-002/batch_job_logs/tp55f45bc7_f0a5_4e23_9e69_49e5aece910b.mat; cd /project/2424103.01/thalstim_simulations/thalstim_sim/tools/PRESTUS; single_subject_pipeline(subject_id, parameters); delete /project/2424103.01/thalstim_simulations/thalstim_sim/data/tussim/CTX500-026-010_79.6mm_pCT_60W/sub-002/batch_job_logs/tp55f45bc7_f0a5_4e23_9e69_49e5aece910b.mat; delete /project/2424103.01/thalstim_simulations/thalstim_sim/data/tussim/CTX500-026-010_79.6mm_pCT_60W/sub-002/batch_job_logs/tp50cc77e7_df67_41ac_b927_e85810643c23.m;
}

jkosciessa commented 1 month ago

This may be related to the switch to GPUs with CUDA compute capability 8 (cudacap 8). Switching back to cudacap 5 runs the jobs without CUDA errors, and it may also fix the NaNs that occur during acoustic simulations (https://github.com/Donders-Institute/PRESTUS/issues/48).

Incidentally, the following cudacap 8 GPU gave me errors even during water-only simulations. Perhaps it is faulty? I don't know why it reports DeviceAvailable=False, given that the GPU was allocated to the job.

[Screenshot 2024-08-13 at 10.53.23]
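To see what MATLAB itself reports for each GPU visible inside a batch job, a sketch like the following may help narrow down the DeviceAvailable=False observation. It assumes the Parallel Computing Toolbox is installed. It uses `parallel.gpu.GPUDevice.getDevice`, which queries a device without selecting it, so the query should not itself trigger a selection error on an unavailable GPU. Printing `CUDA_VISIBLE_DEVICES` assumes the cluster scheduler uses that variable to restrict a job's GPUs, which is not confirmed here.

```matlab
% Sketch (assumes Parallel Computing Toolbox): list the GPUs this MATLAB
% session can see and what MATLAB reports about each one.

% Assumption: the scheduler may restrict visible GPUs via this variable (may be empty).
fprintf('CUDA_VISIBLE_DEVICES = "%s"\n', getenv('CUDA_VISIBLE_DEVICES'));

nGPU = gpuDeviceCount;
fprintf('GPUs visible to MATLAB: %d\n', nGPU);

for idx = 1:nGPU
    % getDevice returns a GPUDevice object without making it the current GPU
    g = parallel.gpu.GPUDevice.getDevice(idx);
    fprintf('[%d] %s | compute capability %s | supported=%d | available=%d\n', ...
        g.Index, g.Name, g.ComputeCapability, g.DeviceSupported, g.DeviceAvailable);
end
```

If the allocated GPU already shows available=0 before any simulation starts, that would point to the node or its GPU configuration rather than to the simulation code.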