The parallel efficiency of the function "cacheNeighborInfo" in "Particle"

Hi everyone,

I have tested the parallel efficiency of the function cacheNeighborInfo, which is used by the function fillNeighborsCPU.

When the number of particles exceeds one million, cacheNeighborInfo takes even longer on 32 cores than on 8 cores. I wonder whether this is because cacheNeighborInfo is parallelized with OpenMP rather than MPI. The same thing happens in the function Redistribute. Here is part of the code in cacheNeighborInfo:

  int thread_num = OpenMP::get_thread_num();
  const int& grid = pti.index();
  const int& tile = pti.LocalTileIndex();
  PairIndex src_index(grid, tile);
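
For anyone who wants to put a number on the 8-core versus 32-core comparison, here is a minimal sketch of the usual relative strong-scaling efficiency at a fixed particle count. The function name relative_efficiency and its parameters are hypothetical, not part of AMReX, and the timings passed to it would have to come from your own runs.

```cpp
#include <cassert>

// Relative strong-scaling efficiency when going from p_ref cores to p cores
// at a fixed problem size:
//     E = (t_ref * p_ref) / (t_p * p)
// E == 1 means ideal scaling. E < p_ref / p means the run on p cores is
// slower in absolute wall-clock time than the run on p_ref cores.
// Hypothetical helper for illustration only; not an AMReX function.
double relative_efficiency(double t_ref, int p_ref, double t_p, int p)
{
    // Timings and core counts must be positive for the ratio to make sense.
    assert(t_ref > 0.0 && t_p > 0.0 && p_ref > 0 && p > 0);
    return (t_ref * p_ref) / (t_p * p);
}
```

With p_ref = 8 and p = 32, any value below 0.25 corresponds to the situation described above, where the 32-core run takes longer than the 8-core run.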

I would appreciate it if you could give me some help.

Best wishes, Yifeng He

AMReX-Codes / amrex

The parallel efficiency of the function "cacheNeighborInfo" in "Particle" #3827