Hello. I hit a strange behavior that ends up crashing my app, and it took me a while to figure out what's going on.
I think everything starts in the `DOGM::gridCellOccupancyUpdate` function. Here `accumulate(weight_array, weights_accum);` is executed, which calls `thrust::inclusive_scan`. The thrust documentation says:
> inclusive_scan is similar to std::partial_sum in the STL. The primary difference between the two functions is that std::partial_sum guarantees a serial summation order, while inclusive_scan requires associativity of the binary operation to parallelize the prefix sum.
>
> Results are not deterministic for pseudo-associative operators (e.g., addition of floating-point types). Results for pseudo-associative operators may vary from run to run.
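To make the setup concrete, here is a minimal sketch of what I believe the `accumulate` wrapper boils down to (the name `accumulate_sketch` and the exact signature are my assumption; the point is only that it is a parallel prefix sum over floats):

```cpp
#include <thrust/scan.h>
#include <thrust/device_ptr.h>

// Hypothetical sketch of the accumulate() wrapper: an inclusive prefix sum
// over the per-cell weights. Because the parallel scan does not fix the
// summation order of the floats, the result can differ from run to run.
void accumulate_sketch(float* weight_array, float* weights_accum, int count)
{
    thrust::device_ptr<float> in(weight_array);
    thrust::device_ptr<float> out(weights_accum);
    thrust::inclusive_scan(in, in + count, out); // weights_accum[i] = weight_array[0] + ... + weight_array[i]
}
```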
After the inclusive scan it can happen that the `weights_accum` array contains DECREASING steps, i.e. `weights_accum[i] < weights_accum[i - 1]`; in other words, a value can be smaller than the one before it. Later this array is used in the `gridCellPredictionUpdateKernel` kernel. Because `weight_array_accum` contains these decreasing values, the line `m_occ_pred = subtract(weight_array_accum, start_idx, end_idx);` can return a negative value. I can observe this with a simple `printf` in the kernel. The negative `m_occ_pred` can lead to a negative `rho_b` value (also observed by printing the value), which is stored in `born_masses_array`.
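Assuming `subtract` is essentially a difference of two entries of the accumulated array (a sketch on my side, not the actual implementation), it is easy to see how a decreasing pair directly produces a negative mass:

```cpp
// Hypothetical sketch of subtract(): recovering a windowed sum of weights
// from the inclusive prefix sum. With exact arithmetic accum[end] >= accum[start]
// whenever the weights are non-negative, but a non-monotonic accum array
// makes this difference (and therefore m_occ_pred) negative.
__device__ float subtract_sketch(const float* weight_array_accum, int start_idx, int end_idx)
{
    return weight_array_accum[end_idx] - weight_array_accum[start_idx];
}
```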
Later, in `DOGM::initializeNewParticles()`, the array `born_masses_array` is used in an inclusive scan to compute `particle_orders_accum`, which is later used in the `normalize_particle_orders` function. Inside this function it is assumed that the last value of the array is the maximum, and the normalization happens:
```cpp
void normalize_particle_orders(float* particle_orders_array_accum, int particle_orders_count, int v_B)
{
    thrust::device_ptr<float> particle_orders_accum(particle_orders_array_accum);
    float max = 1.0f;
    cudaMemcpy(&max, &particle_orders_array_accum[particle_orders_count - 1], sizeof(float), cudaMemcpyDeviceToHost);
    thrust::transform(
        particle_orders_accum, particle_orders_accum + particle_orders_count, particle_orders_accum,
        GPU_LAMBDA(float x) { return x * (v_B / max); });
}
```
But because `born_masses_array` has negative values, it can happen that the last value of `particle_orders_array_accum` is NOT the maximum. As a result, after the `thrust::transform` the `particle_orders_array_accum` array can contain values greater than `v_B` (which is `new_born_particle_count`).
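One way I convinced myself of this (a debugging sketch on my side, not library code) is to compare the last entry against `thrust::max_element` before the normalization:

```cpp
#include <thrust/extrema.h>
#include <thrust/device_ptr.h>
#include <cstdio>

// Debugging check: does the accumulated array really end at its maximum?
void check_accum_ends_at_max(float* particle_orders_array_accum, int particle_orders_count)
{
    thrust::device_ptr<float> accum(particle_orders_array_accum);
    float last = accum[particle_orders_count - 1];                          // copies one value to the host
    float max  = *thrust::max_element(accum, accum + particle_orders_count); // true maximum of the array
    if (max > last)
    {
        printf("last element is not the maximum: last=%f max=%f\n", last, max);
    }
}
```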
Later the array `particle_orders_array_accum` is passed to `initNewParticlesKernel1`. Inside the kernel the `start_idx` is calculated, which, as mentioned, can be greater than `new_born_particle_count` (observed with `printf` inside the kernel). Next, this wrong index is used to update `birth_particle_array`, which causes an out-of-bounds write: compute-sanitizer complains in this kernel, and later a thrust vector throws an error, because it is designed to throw in its destructor.
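A guard like the following would at least prevent the out-of-bounds write, though it does not address the root cause (this is only a sketch with a placeholder particle type; the real kernel code is different):

```cpp
struct ParticleSketch { float x, y, vx, vy, weight; }; // placeholder for the real particle type

// Hypothetical guard, not the library's kernel: refuse to write when the index
// derived from the normalized accumulated array is out of range.
__device__ void write_birth_particle_guarded(ParticleSketch* birth_particle_array,
                                             int new_born_particle_count, int start_idx)
{
    if (start_idx < 0 || start_idx >= new_born_particle_count)
    {
        return; // index corrupted by the broken normalization, skip the write
    }
    birth_particle_array[start_idx] = ParticleSketch{}; // placeholder initialization
}
```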
The problem is that all of this is difficult to reproduce in a small demo, but our bigger application crashes all the time. At the moment I simply check `rho_b` and set it to 0 in case it is negative.
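For reference, the workaround currently amounts to something like this (the surrounding code is simplified and `cell_idx` is just a placeholder for the grid cell index):

```cpp
// Workaround, not a fix: clamp the born mass so a negative m_occ_pred cannot
// propagate into born_masses_array and the later prefix sum.
__device__ void store_born_mass_clamped(float* born_masses_array, int cell_idx, float rho_b)
{
    born_masses_array[cell_idx] = fmaxf(rho_b, 0.0f);
}
```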