This refactors a number of files to move most of the
GPU-related code into the awnndevicelib. Note that some
device code is still scattered through Tensor.h and a few
other files; these files could be duplicated in the device
lib.
This PR additionally adds the 2D transpose and 2D GEMM
cuBLAS operations, and sets up the CMakeLists.txt file to
work correctly.