bshoshany / thread-pool

BS::thread_pool: a fast, lightweight, and easy-to-use C++17 thread pool library
MIT License

Performance very slow #127

Closed diffproject closed 12 months ago

diffproject commented 1 year ago

Hello,

I wanted to parallelize some code with BS::thread_pool that is structured something like this:

inline void step(const vector<double>& c, const vector<double>& Kv, vector<double> &dc1)
{
    // updates dc1 based on c and Kv
}

int main()
{
    const int nthread=4;
    BS::thread_pool pool(nthread);
    vector<vector<double>> c1(nthread, vector<double>(chunk*N,0));
    vector<vector<double>> dc(nthread, vector<double>(chunk*N,0));
    vector<vector<double>> kv(nthread, vector<double>(N*N,0));

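    // t and dt must be floating point: an int dt of 0.001 truncates to 0 and the loop never ends
    double t=0; const double dt=0.001;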
    while(t<500)
    {
        // coagulation and fragmentation
        pool.push_loop(nthread, [&](const int a, const int b){
            for(int tt=a; tt<b; tt++)
            {
                step(c1[tt],kv[tt],dc[tt]);
            }
        });
        pool.wait_for_tasks();
        t+=dt;
        // do some other stuff here
    }
}

But this code runs orders of magnitude slower than the OpenMP version. Since step() is very short, I wanted to avoid the overhead of launching OpenMP threads on every iteration. Can you please let me know what might be going wrong here? I can post the full code if needed.

I am running this in Visual Studio 2022 with all the optimization flags recommended in the documentation enabled.

Sorry if this issue has already been posted; I could not find it among the closed issues.

bshoshany commented 12 months ago

Thanks for opening this issue! I'm closing it because it does not appear to be a bug in the thread pool itself, but rather a performance issue with a specific algorithm that uses the thread pool. If you want, I'm happy to take a look at your code - please post a minimal working example here ("working" means it will compile as is without needing to make any changes), including two versions, one using OpenMP and one using the thread pool. When I have time, I will compare the two and let you know what I think.
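For anyone preparing the comparison requested above, here is a minimal sketch of a timing harness that both versions could share. It uses only the standard library. The body of step() is a hypothetical stand-in (the real update rule is not shown in this issue), and mean_time_us and serial_time_step are illustrative names, not part of BS::thread_pool or OpenMP.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the issue's step(): dc1[i] = Kv[i] * c[i].
// The actual computation is not given in the issue.
inline void step(const std::vector<double>& c, const std::vector<double>& Kv, std::vector<double>& dc1)
{
    assert(c.size() == dc1.size() && Kv.size() >= c.size());
    for (std::size_t i = 0; i < c.size(); ++i)
        dc1[i] = Kv[i] * c[i];
}

// Serial baseline: one time step over all blocks, with no threading at all.
inline void serial_time_step(const std::vector<std::vector<double>>& c1,
                             const std::vector<std::vector<double>>& kv,
                             std::vector<std::vector<double>>& dc)
{
    for (std::size_t tt = 0; tt < c1.size(); ++tt)
        step(c1[tt], kv[tt], dc[tt]);
}

// Calls body() `repeats` times and returns the mean wall-clock time
// per call in microseconds, measured with a monotonic clock.
template <typename F>
double mean_time_us(F&& body, const int repeats)
{
    assert(repeats > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        body();
    const std::chrono::duration<double, std::micro> elapsed = std::chrono::steady_clock::now() - start;
    return elapsed.count() / repeats;
}
```

Timing the serial baseline first, e.g. mean_time_us([&] { serial_time_step(c1, kv, dc); }, 1000), gives the cost of one time step without any threading. The thread pool body (push_loop followed by wait_for_tasks) and the OpenMP body can then be passed to the same function, so all three are measured in exactly the same way. If one serial step takes only a few microseconds, the per-step dispatch and synchronization cost is likely to dominate in any threaded version.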