fengggli / gpu-computing-materials

A simple deep learning framework that optimizes task scheduling and memory usage on different CPU/GPU architectures.

Tmp broken cublas test #43

Closed: zkSNARK closed this 5 years ago

zkSNARK commented 5 years ago

This refactors a number of files to move most of the GPU-related code into the awnndevicelib. Note that some device code is still scattered through Tensor.h and a few other files. These files could be duplicated in the device lib.

This PR also adds the 2D transpose and 2D GEMM cuBLAS operations, and sets up the CMakeLists.txt file to work correctly.
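
For reviewers who want to sanity-check the new operations, here is a minimal CPU reference sketch of a 2D transpose and a 2D GEMM. It is not the code in this PR: the names `ref_transpose_2d` and `ref_gemm_2d` are hypothetical, and row-major `float` storage is an assumption. cuBLAS itself uses column-major storage, so a test comparing against these references would need to account for the layout difference.

```c
#include <stddef.h>

/* Hypothetical CPU reference (not part of this PR).
 * out (n x m) = transpose of in (m x n), both row-major. */
void ref_transpose_2d(const float *in, float *out, int m, int n) {
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            out[(size_t)j * m + i] = in[(size_t)i * n + j];
        }
    }
}

/* Hypothetical CPU reference (not part of this PR).
 * c (m x n) = a (m x k) * b (k x n), all row-major. */
void ref_gemm_2d(const float *a, const float *b, float *c,
                 int m, int k, int n) {
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                acc += a[(size_t)i * k + p] * b[(size_t)p * n + j];
            }
            c[(size_t)i * n + j] = acc;
        }
    }
}
```

A GPU test could run the cuBLAS-backed operation, copy the result back to the host, and compare it element by element against these references within a small floating-point tolerance.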