UCI-CARL / CARLsim4

CARLsim is an efficient, easy-to-use, GPU-accelerated software framework for simulating large-scale spiking neural network (SNN) models with a high degree of biological detail.
http://www.socsci.uci.edu/~jkrichma/CARLsim
MIT License

Compartments CPU (release) vs GPU (release) Spike Times Deviation #45

Open staslist opened 7 years ago

staslist commented 7 years ago

The issue was revealed by the Compartments spikeTimesCPUvsGPU test. It occurs at some, but not all, timesteps.

CPU + release, CPU + debug, and GPU + debug modes all produce the same spike times when running compartment models. The GPU + release mode, however, deviates from the other three modes.

Overview of the issue: The GPU release mode seems to suffer from occasional calculation 'errors' that slowly snowball out of control. The cause of these calculation 'errors' has not been determined.

More in-depth look: The deviation first appears in the voltage values, then in the recovery and current variables. It grows slowly and eventually produces different spike times. The deviation in voltage values begins early on (within the first 100 ms).
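To pin down where two runs first part ways, one option is to dump the per-timestep voltage of one neuron from each mode and scan for the first sample that differs. The following is a minimal sketch of such a scan; `firstDivergence` and its arguments are hypothetical names for illustration and are not part of CARLsim.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a CARLsim function): returns the index of the
// first sample where |a[i] - b[i]| exceeds tol, or -1 if the two traces
// agree within tol over their common length.
std::ptrdiff_t firstDivergence(const std::vector<float>& a,
                               const std::vector<float>& b,
                               float tol) {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fabs(a[i] - b[i]) > tol) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

Calling it with `tol = 0.0f` reports the first bitwise-visible difference, while a larger tolerance shows when the drift becomes big enough to matter for spike timing.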

The GPU release voltage values deviate from GPU debug voltage. The CPU release voltage and CPU debug voltage do not deviate.

Print statements in GPU release mode seem to affect the reported values. This is not the case in GPU debug mode.

Attempted: Equating different optimization flags between debug & release modes for both C/C++ and CUDA C/C++.

https://devtalk.nvidia.com/default/topic/551571/different-results-when-using-gpu-debug-option-g-/?offset=6

https://www.researchgate.net/post/Debug_mode_VS_release_mode_in_visual_studio

https://devtalk.nvidia.com/default/topic/670121/release-and-debug-modes-on-cuda-5-0/

Disabling FMAD: https://stackoverflow.com/questions/14552576/disabling-fused-multiply-add-in-cuda-under-visual-studio-2010
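For context on why FMAD is a suspect: a fused multiply-add rounds once, while a separate multiply and add round twice, so the two can disagree in the last bit. The host-side C++ sketch below illustrates that effect only. It is not CARLsim code, the function names are made up for this example, and since disabling FMAD is listed above as already attempted, it shows a class of effect rather than a confirmed cause.

```cpp
#include <cmath>

// Multiply, round the product to float, then add: two roundings.
// The volatile store keeps the host compiler from contracting the
// expression into a fused multiply-add on its own.
float mulThenAdd(float a, float b, float c) {
    volatile float product = a * b;
    return product + c;
}

// Fused multiply-add: a * b + c computed with a single final rounding.
float fusedMulAdd(float a, float b, float c) {
    return std::fma(a, b, c);
}
```

With `a = b = 1 + 2^-12` and `c = -(1 + 2^-11)`, `mulThenAdd` returns 0 while `fusedMulAdd` returns 2^-24, because the exact product `1 + 2^-11 + 2^-24` loses its last term when rounded to float. On the CUDA side, nvcc's `--fmad=false` option turns this contraction off for device code.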

Potential Causes: 1) Race condition? 2) ???

nmsutton commented 1 year ago

@staslist an update on this is that I am not seeing any spikeTimesCPUvsGPU test fail in "test/compartments.cpp" or "test/stp.cpp" in CARLsim6 using Ubuntu 21.10 with an Intel 12900KS CPU and NVidia RTX 3090 GPU (CUDA 11.4). Perhaps this means this bug has been resolved in CARLsim6? If the test is still failing in CARLsim4 then this issue could stay open for that version.

bainro commented 1 year ago

I'm just going through old CARLsim issues, and by coincidence I'm currently debugging this in CARLsim6. We haven't had compartments.cpp in the test suite since CARLsim4, and if you add it back in, as I'm trying to do in CARLsim6, it will FAIL tests:

image

The tolerance of spike time differences is 1ms, but the worst I can see is 3ms, and most are 2ms... Going to continue looking into it :)
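For reference, a tolerance check of this kind can be sketched as below. This is not the actual test code from compartments.cpp. The helper names are hypothetical, and the sketch assumes spike times are integer milliseconds and that the CPU and GPU spikes are paired by index.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper: largest |cpu[i] - gpu[i]| in ms over index-paired
// spikes. Returns -1 if the spike counts differ, since spikes can then no
// longer be paired one-to-one.
int maxSpikeTimeDeviation(const std::vector<int>& cpu,
                          const std::vector<int>& gpu) {
    if (cpu.size() != gpu.size()) {
        return -1;
    }
    int worst = 0;
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        const int diff = std::abs(cpu[i] - gpu[i]);
        if (diff > worst) {
            worst = diff;
        }
    }
    return worst;
}

// True when the spike counts match and every paired spike is within tolMs.
bool spikeTimesMatch(const std::vector<int>& cpu,
                     const std::vector<int>& gpu,
                     int tolMs) {
    const int worst = maxSpikeTimeDeviation(cpu, gpu);
    return worst >= 0 && worst <= tolMs;
}
```

Logging the worst deviation rather than only pass/fail makes it easier to tell a 2ms drift from a 3ms one when comparing builds.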