BallisticLA / RandBLAS

A header-only C++ library for sketching in randomized linear algebra
https://randblas.readthedocs.io/en/latest/
Other
64 stars 4 forks source link

Added correctness tests for RNGState #65

Closed kaiwenhe7 closed 12 months ago

kaiwenhe7 commented 1 year ago

@rileyjmurray Something interesting is that the number of threads used in the for loop is not necessarily equal to omp_get_num_threads(). This is why I have a second array thrd_arr, which is initialized to an array of zeros of size omp_get_num_threads(). Afterwards, I find the largest thread, which may be less than or equal to omp_get_num_threads().

In addition, based on observation, the max counter has always been associated with the largest thread. However, I am not sure if this is absolutely true.

rileyjmurray commented 12 months ago

@kaiwenhe7 I left one minor comment on your code; please fix when you get a chance. Also, please use informative branch names rather than making all PRs from Kaiwen-branch.

If you want to indicate that you "own" a branch then it's best to (1) make a fork of RandBLAS, then (2) set your fork's main branch to point to the official RandBLAS main branch, then (3) use informative branch names on your fork, and finally (4) make PRs from a given branch of your work into the main branch of the official RandBLAS repo.

Regarding

@rileyjmurray Something interesting is that the number of threads used in the for loop is not necessarily equal to omp_get_num_threads(). This is why I have a second array thrd_arr, which is initialized to an array of zeros of size omp_get_num_threads(). Afterwards, I find the largest thread, which may be less than or equal to omp_get_num_threads().

In addition, based on observation, the max counter has always been associated with the largest thread. However, I am not sure if this is absolutely true.

Interesting indeed! I agree that we should not assume the max counter is always associated with the largest thread. That might be true in our implementation but minor changes could break that assumption in the future.

kaiwenhe7 commented 12 months ago

@rileyjmurray I pushed the requested changes. There is one more change I want to make that avoid the need to use an array that keeps track of the number of threads that the for loop uses. It simply initializes all entries of the ctr_arr to have the value of the starting counter. I am just getting an error with the fix not passing the tests. If you would like we can push the current changes and I can make another pull request with the new implementation.

rileyjmurray commented 12 months ago

@kaiwenhe7 go ahead and make the changes to this PR. We aren't in a rush to merge it, so better to take care of everything at once.

kaiwenhe7 commented 12 months ago

@rileyjmurray I have the reimplementation completed! In regards to why the number of threads used in the for loop is different than omp_get_num_threads(), when the matrix is small, the overhead cost of using all the threads is larger than just using a few threads. So openmp makes a good decision of using less threads for more speed.