SCOREC / pumi-pic

support libraries for unstructured mesh particle in cell simulations on GPUs and CPUs
BSD 3-Clause "New" or "Revised" License

buildSCSTest fails on certain occasions #53

Closed: diamog closed this issue 3 years ago

diamog commented 3 years ago

Error output for test:

6: Test timeout computed to be: 10000000
6: 
6: Beginning DefaultTest
6: Building SCS with C: 2 sigma: 2147483647 V: 2
6: 
6: Particle Structures Sell-C-Sigma C: 2 sigma: 2147483647 V: 2.
6: Number of Elements: 5.
6: Number of Particles: 20.
6: Number of Chunks: 3.
6: Number of Slices: 6.
6:   Chunk 0. Elements: 3 2
6:     Slice 0 | 1 1 | 1 1
6:     Slice 1 | 1 1 | 1 1
6:     Slice 2 | 1 1 | 1 1
6:     Slice 3 | 1 1 | 1 1
6:     Slice 4 | 1 1 | 1 0
6:     Slice 5 | 1 0
6: 
6: Beginning NoSort Test
6: Building SCS with C: 2 sigma: 1 V: 2
6: :0: : block: [1,0,0], thread: [0,0,0] Assertion `View bounds error of view particle_mask` failed.
6: :0: : block: [1,0,0], thread: [0,1,0] Assertion `View bounds error of view particle_mask` failed.
6: terminate called after throwing an instance of 'std::runtime_error'
6:   what():  cudaDeviceSynchronize() error( cudaErrorAssert): device-side assert triggered /home/diamog/Documents/scorec/kokkos/core/src/Cuda/Kokkos_Cuda_Instance.cpp:120
6: Traceback functionality not available
6: 
1/1 Test #6: buildSCS .........................Child aborted***Exception:   0.31 sec

First noticed on commit 61bc5cc0712ca3af9b8527a4c7da9344c76fa5f7

MatthewChristoff commented 3 years ago

I noticed this issue during my own testing. I tracked the bug down to an out-of-bounds access on this reference to the active mask.

diamog commented 3 years ago

I've tracked the issue down: it occurs when the first chunk of the SCS is empty. Working on a fix now.

cwsmith commented 3 years ago

I assume full sorting must be disabled for that to happen?

diamog commented 3 years ago

Yes, the error only occurs in the no-sort test, and only when all particles are placed in the later elements. This is why the error looked like a race condition: it depended on the random assignment of particles.
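To make the condition above concrete, here is a minimal host-side sketch (plain C++, not pumi-pic code) of how chunk widths could come out for the test's shape of 5 elements, 20 particles and C = 2. The function name chunkWidths and the particle counts are assumptions made for illustration; the real SCS construction runs on the device and differs in detail.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: width of each chunk, i.e. the largest particle count
// among the (up to) C elements grouped into that chunk.
// fullSort == true mimics full sorting: elements are ordered by descending
// particle count before chunking. fullSort == false mimics sigma = 1,
// where elements stay in their original order.
std::vector<int> chunkWidths(std::vector<int> counts, int C, bool fullSort) {
  if (fullSort) {
    std::sort(counts.begin(), counts.end(), std::greater<int>());
  }
  std::vector<int> widths;
  for (std::size_t start = 0; start < counts.size();
       start += static_cast<std::size_t>(C)) {
    const std::size_t end =
        std::min(counts.size(), start + static_cast<std::size_t>(C));
    widths.push_back(
        *std::max_element(counts.begin() + start, counts.begin() + end));
  }
  return widths;
}
```

With assumed per-element counts {0, 0, 5, 7, 8}, the unsorted layout gives widths {0, 7, 8}, so the first chunk is empty. Full sorting gives {8, 5, 0}, so any empty chunks end up last, which fits the failure only showing up in the no-sort test.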

diamog commented 3 years ago

The above commit fixes the issue when any number of the first chunks are empty.

We also have a new getFirstValue(view) function... Maybe I should have just made a getIndex function, but oh well.
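As a rough illustration of the trade-off mentioned here, the sketch below uses std::vector as a host-side stand-in for a view. The names getIndexSketch and getFirstValueSketch are hypothetical and are not the pumi-pic functions; it is also an assumption that getFirstValue returns the first entry of the view.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical general accessor: return a copy of entry i.
// at() throws std::out_of_range instead of reading past the end.
template <class T>
T getIndexSketch(const std::vector<T>& view, std::size_t i) {
  return view.at(i);
}

// Hypothetical convenience wrapper: the first entry is index 0
// of the general accessor.
template <class T>
T getFirstValueSketch(const std::vector<T>& view) {
  return getIndexSketch(view, 0);
}
```

A single index-based accessor would cover the first entry, the last entry, and anything in between. That may be the generalization the comment has in mind.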