Non-default streams for filling matrix

Tanvi141 commented 4 years ago

References to other Issues or PRs or Relevant literature

Fixes #2

Brief description of what is fixed or changed

Implements matrix filling in adaboost::cuda::core using non-default streams. The number of streams is passed as a parameter to the function, and each row of the matrix is filled by one of the streams. The stream for each row is chosen in round-robin fashion.
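
For illustration only, here is a minimal host-side sketch of the round-robin assignment described above. It is not the code in this PR: the names stream_for_row and fill_round_robin are hypothetical, the matrix is assumed to be stored row-major in a std::vector, and no CUDA calls are made. In an actual CUDA build, the inner loop would presumably be a kernel launch issued on the selected stream.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: index of the stream that fills a given row.
// Rows are dealt out to streams in round-robin order: 0, 1, ..., n-1, 0, 1, ...
inline std::size_t stream_for_row(std::size_t row, std::size_t n_streams)
{
    assert(n_streams > 0);
    return row % n_streams;
}

// Hypothetical host-only model of the fill. It sets every element of a
// rows x cols row-major matrix to `value` and returns, for each row, the
// index of the stream that row was assigned to.
inline std::vector<std::size_t> fill_round_robin(std::vector<double>& data,
                                                 std::size_t rows,
                                                 std::size_t cols,
                                                 std::size_t n_streams,
                                                 double value)
{
    assert(data.size() == rows * cols);
    std::vector<std::size_t> owner(rows);
    for (std::size_t r = 0; r < rows; ++r) {
        owner[r] = stream_for_row(r, n_streams);
        // On the GPU this loop would be replaced by a per-row kernel launch
        // on stream number owner[r]; here it runs on the host.
        for (std::size_t c = 0; c < cols; ++c) {
            data[r * cols + c] = value;
        }
    }
    return owner;
}
```

With 5 rows and 2 streams, rows 0, 2 and 4 go to stream 0 and rows 1 and 3 go to stream 1, so work on consecutive rows can overlap whenever the device is able to run the streams concurrently.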

Other comments

The initial code for filling with n streams was written by @fiza11. @Tanvi141 integrated that code into this code base and implemented the round-robin stream selection.

Tanvi141 commented 4 years ago

Posting the build and test reports here.

-- The CXX compiler identification is GNU 7.5.0
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - /usr/local/cuda-10.2/bin/nvcc
-- The CUDA compiler identification is NVIDIA 10.2.89
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/tanvi/OpenSource/AdaBoost/build-adaboost
Scanning dependencies of target adaboost_utils
Scanning dependencies of target adaboost_cuda_wrappers
Scanning dependencies of target adaboost_core
Scanning dependencies of target adaboost_cuda
[  5%] Building CUDA object adaboost/CMakeFiles/adaboost_cuda_wrappers.dir/cuda/utils/cuda_wrappers_impl.cu.o
[ 11%] Building CXX object adaboost/CMakeFiles/adaboost_core.dir/core/data_structures_impl.cpp.o
[ 16%] Building CXX object adaboost/CMakeFiles/adaboost_utils.dir/utils/utils_impl.cpp.o
[ 22%] Building CXX object adaboost/CMakeFiles/adaboost_core.dir/core/operations_impl.cpp.o
[ 27%] Building CUDA object adaboost/CMakeFiles/adaboost_cuda.dir/cuda/core/cuda_data_structures_impl.cu.o
[ 33%] Linking CXX shared library ../libs/libadaboost_utils.so
[ 38%] Building CUDA object adaboost/CMakeFiles/adaboost_cuda.dir/cuda/core/operations_impl.cu.o
[ 38%] Built target adaboost_utils
[ 44%] Building CUDA object adaboost/CMakeFiles/adaboost_cuda.dir/cuda/utils/cuda_wrappers_impl.cu.o
[ 50%] Linking CUDA device code CMakeFiles/adaboost_cuda_wrappers.dir/cmake_device_link.o
[ 55%] Linking CUDA shared library ../libs/libadaboost_cuda_wrappers.so
[ 55%] Built target adaboost_cuda_wrappers
[ 61%] Building CXX object adaboost/CMakeFiles/adaboost_core.dir/utils/utils_impl.cpp.o
/home/tanvi/OpenSource/AdaBoost/adaboost/adaboost/cuda/core/operations_impl.cu(96): warning: 'long double' is treated as 'double' in device code

/home/tanvi/OpenSource/AdaBoost/adaboost/adaboost/cuda/core/operations_impl.cu(215): warning: 'long double' is treated as 'double' in device code

Warning: 'long double' is treated as 'double' in device code

Warning: 'long double' is treated as 'double' in device code

[ 66%] Linking CXX shared library ../libs/libadaboost_core.so
[ 66%] Built target adaboost_core
Scanning dependencies of target test_core
[ 72%] Building CXX object adaboost/CMakeFiles/test_core.dir/tests/test_core.cpp.o
[ 77%] Linking CXX executable ../bin/test_core
[ 77%] Built target test_core
[ 83%] Linking CUDA device code CMakeFiles/adaboost_cuda.dir/cmake_device_link.o
[ 88%] Linking CUDA shared library ../libs/libadaboost_cuda.so
[ 88%] Built target adaboost_cuda
Scanning dependencies of target test_cuda
[ 94%] Building CXX object adaboost/CMakeFiles/test_cuda.dir/tests/test_cuda.cpp.o
[100%] Linking CXX executable ../bin/test_cuda
[100%] Built target test_cuda
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from Core
[ RUN      ] Core.Vector
[       OK ] Core.Vector (0 ms)
[ RUN      ] Core.Matrices
[       OK ] Core.Matrices (0 ms)
[ RUN      ] Core.Sum
[       OK ] Core.Sum (0 ms)
[ RUN      ] Core.Argmax
[       OK ] Core.Argmax (0 ms)
[----------] 4 tests from Core (0 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (0 ms total)
[  PASSED  ] 4 tests.
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from Cuda
[ RUN      ] Cuda.VectorGPU
[       OK ] Cuda.VectorGPU (51 ms)
[ RUN      ] Cuda.MatrixGPU
[       OK ] Cuda.MatrixGPU (41 ms)
[ RUN      ] Cuda.MatricesGPU
[       OK ] Cuda.MatricesGPU (311 ms)
[----------] 3 tests from Cuda (403 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (403 ms total)
[  PASSED  ] 3 tests.

czgdp1807 commented 4 years ago

Please add docstrings as well. See the existing code for the documentation style, and write similar docs for the new functions.

Tanvi141 commented 4 years ago

@czgdp1807, is this ready to merge?

czgdp1807 commented 4 years ago

[==========] Running 3 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 3 tests from Cuda
[ RUN      ] Cuda.VectorGPU
[       OK ] Cuda.VectorGPU (50 ms)
[ RUN      ] Cuda.MatrixGPU
[       OK ] Cuda.MatrixGPU (3323 ms)
[ RUN      ] Cuda.MatricesGPU
[       OK ] Cuda.MatricesGPU (766 ms)
[----------] 3 tests from Cuda (4139 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test suite ran. (4139 ms total)
[  PASSED  ] 3 tests.
[==========] Running 4 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 4 tests from Core
[ RUN      ] Core.Vector
[       OK ] Core.Vector (1 ms)
[ RUN      ] Core.Matrices
[       OK ] Core.Matrices (0 ms)
[ RUN      ] Core.Sum
[       OK ] Core.Sum (0 ms)
[ RUN      ] Core.Argmax
[       OK ] Core.Argmax (0 ms)
[----------] 4 tests from Core (1 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test suite ran. (1 ms total)
[  PASSED  ] 4 tests.

czgdp1807 commented 4 years ago

Please use https://github.com/codezonediitj/utils/blob/master/create_template.py for creating template instantiations for function prototypes automatically.
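
For readers unfamiliar with the term, an explicit template instantiation in C++ looks roughly like the sketch below. This is a generic illustration, not the output of the script linked above: the function fill_vector is hypothetical and is not part of adaboost.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical function template, used only to illustrate the syntax.
template <class T>
void fill_vector(std::vector<T>& v, T value)
{
    for (std::size_t i = 0; i < v.size(); ++i) {
        v[i] = value;
    }
}

// Explicit instantiations: these force the compiler to emit code for the
// listed types in this translation unit, so that other files can link
// against them while seeing only the prototype.
template void fill_vector<float>(std::vector<float>&, float);
template void fill_vector<double>(std::vector<double>&, double);
```

Writing one such line per function and per supported type by hand gets repetitive, which is the usual reason for generating them with a script.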
