Closed kaiwenhe7 closed 12 months ago
@kaiwenhe7 I left one minor comment on your code; please fix when you get a chance. Also, please use informative branch names rather than making all PRs from Kaiwen-branch
.
If you want to indicate that you "own" a branch then it's best to (1) make a fork of RandBLAS, then (2) set your fork's main
branch to point to the official RandBLAS main
branch, then (3) use informative branch names on your fork, and finally (4) make PRs from a given branch of your work into the main
branch of the official RandBLAS repo.
Regarding
@rileyjmurray Something interesting is that the number of threads used in the for loop is not necessarily equal to omp_get_num_threads(). This is why I have a second array thrd_arr, which is initialized to an array of zeros of size omp_get_num_threads(). Afterwards, I find the largest thread, which may be less than or equal to omp_get_num_threads().
In addition, based on observation, the max counter has always been associated with the largest thread. However, I am not sure if this is absolutely true.
Interesting indeed! I agree that we should not assume the max counter is always associated with the largest thread. That might be true in our implementation but minor changes could break that assumption in the future.
@rileyjmurray I pushed the requested changes. There is one more change I want to make that avoid the need to use an array that keeps track of the number of threads that the for loop uses. It simply initializes all entries of the ctr_arr to have the value of the starting counter. I am just getting an error with the fix not passing the tests. If you would like we can push the current changes and I can make another pull request with the new implementation.
@kaiwenhe7 go ahead and make the changes to this PR. We aren't in a rush to merge it, so better to take care of everything at once.
@rileyjmurray I have the reimplementation completed! In regards to why the number of threads used in the for loop is different than omp_get_num_threads(), when the matrix is small, the overhead cost of using all the threads is larger than just using a few threads. So openmp makes a good decision of using less threads for more speed.
@rileyjmurray Something interesting is that the number of threads used in the for loop is not necessarily equal to omp_get_num_threads(). This is why I have a second array thrd_arr, which is initialized to an array of zeros of size omp_get_num_threads(). Afterwards, I find the largest thread, which may be less than or equal to omp_get_num_threads().
In addition, based on observation, the max counter has always been associated with the largest thread. However, I am not sure if this is absolutely true.