bshoshany / thread-pool

BS::thread_pool: a fast, lightweight, and easy-to-use C++17 thread pool library
MIT License

[REQ] Getting thread ids #126

klytje closed this issue 1 year ago

klytje commented 1 year ago

I think it'd be useful to be able to get the thread ids of the threads currently in the pool. This would allow us to use thread-local data with a simple std::unordered_map<std::thread::id, T>.

For me this would be useful for evaluating and binning Euclidean distances. Currently I am allocating a new result vector for every job, and then summing all of them at the end. If I had the thread ids, I could instead allocate only a single result vector for each thread, and then sum these at the end. This would dramatically reduce the time spent allocating and summing vectors.

Technically this should already be possible by submitting jobs that record std::this_thread::get_id(), but I am not sure how to guarantee that every thread runs such a job at least once without excessive looping or waiting.

This can be implemented by adding the following to both header files:

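    // Returns the std::thread::id of each worker thread currently in the pool.
    [[nodiscard]] std::vector<std::thread::id> get_thread_ids() const {
        std::vector<std::thread::id> thread_ids;
        thread_ids.reserve(thread_count); // one entry per thread, so no reallocations
        for (unsigned int i = 0; i < thread_count; ++i) {
            thread_ids.push_back(threads[i].get_id());
        }
        return thread_ids;
    }
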
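To illustrate the `std::unordered_map<std::thread::id, T>` idea from the request, here is a minimal sketch that uses plain `std::thread` workers as a stand-in for the pool. It does not use the BS::thread_pool API, and `Point` and `bin_distances_by_id` are hypothetical names. Each job bins the distances from one point to all later points into the histogram owned by whichever thread runs it, so only one histogram per thread is allocated and summed. Because the sketch creates the threads itself, it can record each `std::thread::get_id()` while the workers wait on a `std::shared_future` gate, which guarantees every ID is in the map before any job looks one up.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <future>
#include <thread>
#include <unordered_map>
#include <vector>

struct Point { double x, y; };

// Bins all pairwise Euclidean distances into num_bins bins of width bin_width.
// Each thread accumulates into its own histogram, found via std::this_thread::get_id().
std::vector<std::size_t> bin_distances_by_id(const std::vector<Point>& pts, double bin_width,
                                             std::size_t num_bins, unsigned num_workers) {
    assert(bin_width > 0.0);
    std::unordered_map<std::thread::id, std::vector<std::size_t>> per_thread;
    std::atomic<std::size_t> next_row{0};
    std::promise<void> go;
    std::shared_future<void> ready = go.get_future().share();

    // One "job": bin the distances from point i to every later point.
    auto bin_row = [&](std::size_t i) {
        // Concurrent at() calls are safe because the map is no longer modified;
        // each thread then writes only to its own histogram.
        auto& hist = per_thread.at(std::this_thread::get_id());
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            const double pos = std::hypot(pts[i].x - pts[j].x, pts[i].y - pts[j].y) / bin_width;
            if (pos < static_cast<double>(num_bins)) ++hist[static_cast<std::size_t>(pos)];
        }
    };

    std::vector<std::thread> workers;
    workers.reserve(num_workers);
    for (unsigned w = 0; w < num_workers; ++w) {
        workers.emplace_back([&, ready] {
            ready.wait();  // no job runs until every worker's ID is in the map
            for (std::size_t i = next_row++; i < pts.size(); i = next_row++) bin_row(i);
        });
    }
    // All workers are blocked on the gate, so filling the map here is race-free.
    for (const auto& t : workers)
        per_thread.emplace(t.get_id(), std::vector<std::size_t>(num_bins, 0));
    go.set_value();
    for (auto& t : workers) t.join();

    // Sum one histogram per thread instead of one per job.
    std::vector<std::size_t> total(num_bins, 0);
    for (const auto& entry : per_thread)
        for (std::size_t b = 0; b < num_bins; ++b) total[b] += entry.second[b];
    return total;
}
```

For ten points at (0, 0), (1, 0), ..., (9, 0) with `bin_width = 1.0` and 10 bins, bin d for d >= 1 should hold 10 - d pairs and bin 0 stays empty, whatever `num_workers` is. The gate is the part a real pool hides from you, which is why a `get_thread_ids()`-style accessor or a per-thread initialization hook is needed there.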
bshoshany commented 1 year ago

Thanks for opening this issue, @klytje! A similar feature is actually already implemented in v3.6.0, which should be released within a week or two. A new member function get_native_handles() allows you to get a vector of the native handles for each thread. That's generally more useful, because there isn't much you can actually do in standard C++ with the built-in thread IDs.

The downside is that the native handles are not guaranteed to be easily hashable for use in an associative container (although I don't really see why they shouldn't be). However, I am also introducing the option to have a thread initialization function that runs exactly once on each thread when the thread pool is constructed, which should allow you to get the thread IDs manually as you suggested.

For this release I actually considered defining a built-in std::unordered_map for the thread IDs that maps each one to the pool's own thread index, but now I'm thinking that just defining my own version of std::this_thread::get_id() would probably be easier, maybe even using thread_local.

klytje commented 1 year ago

Yes, I read both of those issues before opening this one, actually. I just thought that using handles or manually extracting the thread ids upon creation seemed like a roundabout way of doing something quite simple. I suppose you're right that there's not a whole lot you can do with the thread ids except what I'm trying to accomplish here.

bshoshany commented 1 year ago

In v3.6.0 there will be a function BS::get_thread_index() (I just wrote it today) that allows you to get the index of the current thread in the pool, i.e. a number from 1 to get_thread_count() (I start at 1 because 0 means you're in a thread that's not in the pool, but I'm not sure about that; it might change by the time the new version is released). This is much faster and more convenient than using an std::unordered_map keyed by the value of std::this_thread::get_id(), since you can just use a simple array instead. Therefore I'm not sure there's any point in returning the values of std::this_thread::get_id() for each thread in the pool.
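The array-indexed alternative can be sketched the same way. This assumes a hypothetical `thread_local` variable `this_thread_index` that each worker sets once when it starts (playing the role of an index lookup such as the proposed BS::get_thread_index(), or of a per-thread initialization function), using the 1-based numbering described above with 0 for threads outside the pool. Plain `std::thread` workers again stand in for the pool, and `Point` and `bin_distances_by_index` are illustrative names, not library API.

```cpp
#include <atomic>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

struct Point { double x, y; };

// Hypothetical per-thread index (not the library's API): 0 means "not a worker",
// workers get 1..num_workers, following the 1-based numbering proposed in this thread.
thread_local std::size_t this_thread_index = 0;

std::vector<std::size_t> bin_distances_by_index(const std::vector<Point>& pts, double bin_width,
                                                std::size_t num_bins, unsigned num_workers) {
    assert(bin_width > 0.0);
    // A plain array of histograms indexed directly by this_thread_index:
    // slot 0 for callers outside the workers, slots 1..num_workers for workers.
    std::vector<std::vector<std::size_t>> per_thread(num_workers + 1,
                                                     std::vector<std::size_t>(num_bins, 0));
    std::atomic<std::size_t> next_row{0};

    // One "job": it is not told which worker runs it; it reads this_thread_index.
    auto bin_row = [&](std::size_t i) {
        auto& hist = per_thread[this_thread_index];
        for (std::size_t j = i + 1; j < pts.size(); ++j) {
            const double pos = std::hypot(pts[i].x - pts[j].x, pts[i].y - pts[j].y) / bin_width;
            if (pos < static_cast<double>(num_bins)) ++hist[static_cast<std::size_t>(pos)];
        }
    };

    std::vector<std::thread> workers;
    workers.reserve(num_workers);
    for (unsigned w = 1; w <= num_workers; ++w) {
        workers.emplace_back([&, w] {
            this_thread_index = w;  // set once per thread, like an init function
            for (std::size_t i = next_row++; i < pts.size(); i = next_row++) bin_row(i);
        });
    }
    for (auto& t : workers) t.join();

    // Sum one histogram per slot.
    std::vector<std::size_t> total(num_bins, 0);
    for (const auto& hist : per_thread)
        for (std::size_t b = 0; b < num_bins; ++b) total[b] += hist[b];
    return total;
}
```

Compared with the map version, a job finds its buffer with a single array access and no hashing, and no gate is needed because each worker sets its own index before it takes any job. Reserving slot 0 means a job that happens to run on a non-pool thread still has a valid slot of its own to write to.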

However, if you think there is some other use for it that BS::get_thread_index() cannot provide, please let me know what that use is, and I will consider adding a get_thread_ids() function.