TheCodez / dynamic-occupancy-grid-map

Implementation of "A Random Finite Set Approach for Dynamic Occupancy Grid Maps with Real-Time Application"

Error in calculation that leads to out of bounds memory write. #99

Open nikitanodar opened 1 year ago

nikitanodar commented 1 year ago

Hello. I hit a strange behavior where my app crashes, and it took me a while to figure out what was going on.

I think everything starts in the DOGM::gridCellOccupancyUpdate function. Here accumulate(weight_array, weights_accum); is executed, which calls thrust::inclusive_scan. The thrust documentation says:

> inclusive_scan is similar to std::partial_sum in the STL. The primary difference between the two functions is that std::partial_sum guarantees a serial summation order, while inclusive_scan requires associativity of the binary operation to parallelize the prefix sum.
>
> Results are not deterministic for pseudo-associative operators (e.g., addition of floating-point types). Results for pseudo-associative operators may vary from run to run.
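
For reference, here is a minimal sketch of what accumulate presumably does, assuming it is a thin wrapper around thrust::inclusive_scan over device memory (the actual helper in the repository may look different):

```cpp
#include <thrust/device_vector.h>
#include <thrust/scan.h>

// Minimal sketch, not the repository's code: accumulate as a thin wrapper
// around thrust::inclusive_scan. Float addition is only pseudo-associative,
// so the rounding of the parallel scan, and hence the result, can vary
// from run to run.
void accumulate_sketch(const thrust::device_vector<float>& weights,
                       thrust::device_vector<float>& weights_accum)
{
    weights_accum.resize(weights.size());
    thrust::inclusive_scan(weights.begin(), weights.end(), weights_accum.begin());
}
```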

After the inclusive scan it can happen that the weights_accum array is not non-decreasing, i.e. weights_accum[i] < weights_accum[i - 1]; in other words, a value in the array can be smaller than the one before it. Later this array is used in the gridCellPredictionUpdateKernel kernel. Because weight_array_accum can contain such decreasing values, the line

```cpp
m_occ_pred = subtract(weight_array_accum, start_idx, end_idx);
```

can return a negative value. I can observe this with a simple printf in the kernel. The negative m_occ_pred can lead to a negative rho_b value (also observed by printing the value), which is stored in born_masses_array.
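
To make this concrete, here is an illustrative sketch of the range-sum trick I believe subtract relies on; the exact indexing convention in the repository may differ:

```cpp
// Illustrative sketch only: the predicted occupancy mass of a cell is
// recovered as a difference of two prefix-sum entries. With exact arithmetic
// the difference is always >= 0 for non-negative weights, because the scan
// would be non-decreasing; with floating-point rounding in a parallel scan,
// weight_array_accum[end_idx] can come out smaller than
// weight_array_accum[start_idx], and the difference goes negative.
__device__ float subtract_sketch(const float* weight_array_accum,
                                 int start_idx, int end_idx)
{
    return weight_array_accum[end_idx] - weight_array_accum[start_idx];
}
```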

Later, in DOGM::initializeNewParticles(), the born_masses_array is used in an inclusive scan to update particle_orders_accum, which is then passed to the normalize_particle_orders function. Inside this function it is assumed that the last value of the array is the maximum, and the normalization happens:

```cpp
void normalize_particle_orders(float* particle_orders_array_accum, int particle_orders_count, int v_B)
{
    thrust::device_ptr<float> particle_orders_accum(particle_orders_array_accum);

    // Assumes the last element of the inclusive scan is the maximum.
    float max = 1.0f;
    cudaMemcpy(&max, &particle_orders_array_accum[particle_orders_count - 1], sizeof(float), cudaMemcpyDeviceToHost);
    // Rescale all accumulated orders so that the (assumed) maximum maps to v_B.
    thrust::transform(
        particle_orders_accum, particle_orders_accum + particle_orders_count, particle_orders_accum,
        GPU_LAMBDA(float x) { return x * (v_B / max); });
}
```

But because born_masses_array contains negative values, it can happen that the last value of particle_orders_array_accum is NOT the maximum. As a result, after the thrust::transform, particle_orders_array_accum can contain values greater than v_B (which is new_born_particle_count).
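
A tiny host-side example with made-up numbers shows how a single negative born mass breaks the assumption that the last scanned value is the maximum:

```cpp
#include <cstdio>

int main()
{
    // Hypothetical numbers: one negative rho_b sneaks into the born masses.
    float born_masses[] = {0.4f, 0.6f, -0.3f};
    float accum[3];
    float running = 0.0f;
    for (int i = 0; i < 3; ++i) { running += born_masses[i]; accum[i] = running; }
    // accum = {0.4, 1.0, 0.7}: the last value (0.7) is not the maximum (1.0)

    const int v_B = 100; // new_born_particle_count
    for (int i = 0; i < 3; ++i)
        printf("order[%d] = %f\n", i, accum[i] * (v_B / accum[2]));
    // order[1] comes out around 142.9, i.e. greater than v_B, and later
    // indexes past the end of the birth particle storage.
    return 0;
}
```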

Later the particle_orders_array_accum array is passed to initNewParticlesKernel1. Inside the kernel, start_idx is calculated, which, as mentioned, can be greater than new_born_particle_count (observed with printf inside the kernel). Next, this wrong index is used to update birth_particle_array, which causes an out-of-bounds write: compute-sanitizer complains in this kernel, and later a thrust vector throws an error, because it is designed to throw in its destructor.
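
For illustration, here is a hypothetical version of that indexing pattern with a defensive clamp; the real initNewParticlesKernel1 differs, and birth_cell_idx_array only stands in for whatever the kernel actually writes:

```cpp
// Hypothetical sketch, not the repository's kernel: the scanned particle
// orders are cast to indices into the birth particle storage. If the
// normalization above produced values greater than v_B, end_idx exceeds
// new_born_particle_count and the write below goes out of bounds unless
// it is clamped.
__global__ void init_new_particles_sketch(const float* particle_orders_array_accum,
                                          int* birth_cell_idx_array,
                                          int cell_count,
                                          int new_born_particle_count)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= cell_count)
        return;

    int start_idx = (i > 0) ? static_cast<int>(particle_orders_array_accum[i - 1]) : 0;
    int end_idx = static_cast<int>(particle_orders_array_accum[i]);

    // Defensive clamp: without it, end_idx can exceed new_born_particle_count.
    end_idx = min(end_idx, new_born_particle_count);

    for (int j = start_idx; j < end_idx; ++j)
    {
        birth_cell_idx_array[j] = i;  // hypothetical payload for each new particle
    }
}
```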

The problem is that all of this is difficult to reproduce in a small demo, but in our bigger application it crashes all the time. At the moment, I'm simply checking rho_b and setting it to 0 in case it's negative.
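
Roughly, the workaround looks like the sketch below, where cell_idx, rho_b and born_masses_array stand in for the corresponding kernel-local values:

```cpp
// Sketch of the workaround mentioned above, not the repository's code:
// clamp the born mass to zero before storing it, so the later inclusive
// scan over born_masses_array stays non-decreasing.
__device__ void store_born_mass(float* born_masses_array, int cell_idx, float rho_b)
{
    born_masses_array[cell_idx] = fmaxf(rho_b, 0.0f);
}
```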