fengggli / gpu-computing-materials

A simple deep learning framework that optimizes task scheduling and memory usage on different CPU/GPU architectures.

Im2col inner dev #35

Closed zkSNARK closed 5 years ago

zkSNARK commented 5 years ago

This PR is the first (VERY naive) version of a working im2col on the device.

This version uses one thread to copy each element of a filter.

Note that the purpose of im2col is to spread the filters out into a new, flattened 2D array. The easiest way to do this is to map the outer 4 dimensions onto a single 1D index, so that we can use grid-stride loops to find the target and source indexes. However, this version is very slow as the dimensions shrink and the number of filters decreases.
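For illustration only, here is a rough host-side C++ sketch of the flat-index idea. It is not the kernel in this PR. The `ConvShape` struct, the `im2col_flat` function, the row-major `N x C x H x W` input layout, and the `(C*HH*WW) x (N*out_h*out_w)` output layout are all assumptions made for the example. The outer `tid` loop stands in for the launched threads, and the inner loop has the shape of a grid-stride loop: each iteration decomposes one flat target index into its coordinates and copies a single element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape descriptor; the names are illustrative, not taken from the PR.
struct ConvShape {
    int N, C, H, W;   // input batch, channels, height, width
    int HH, WW;       // filter height and width
    int stride, pad;  // convolution stride and zero padding
    int out_h() const { return (H + 2 * pad - HH) / stride + 1; }
    int out_w() const { return (W + 2 * pad - WW) / stride + 1; }
};

// x    : input, assumed row-major N x C x H x W
// cols : output, assumed row-major (C*HH*WW) x (N*out_h*out_w)
// num_threads emulates the total number of device threads in the grid.
std::vector<float> im2col_flat(const std::vector<float>& x,
                               const ConvShape& s, int num_threads) {
    assert(num_threads > 0 && s.stride > 0);
    const std::size_t OH = static_cast<std::size_t>(s.out_h());
    const std::size_t OW = static_cast<std::size_t>(s.out_w());
    const std::size_t total =
        static_cast<std::size_t>(s.C) * s.HH * s.WW * s.N * OH * OW;
    std::vector<float> cols(total, 0.0f);  // padded positions stay zero

    for (int tid = 0; tid < num_threads; ++tid) {  // one pass per emulated thread
        // Grid-stride loop: each thread handles indices tid, tid + T, tid + 2T, ...
        for (std::size_t i = static_cast<std::size_t>(tid); i < total;
             i += static_cast<std::size_t>(num_threads)) {
            // Decompose the flat target index into (c, fy, fx, n, oy, ox).
            std::size_t r = i;
            const int ox = static_cast<int>(r % OW);   r /= OW;
            const int oy = static_cast<int>(r % OH);   r /= OH;
            const int n  = static_cast<int>(r % s.N);  r /= s.N;
            const int fx = static_cast<int>(r % s.WW); r /= s.WW;
            const int fy = static_cast<int>(r % s.HH); r /= s.HH;
            const int c  = static_cast<int>(r);

            // Source pixel that this filter element sees at this output position.
            const int iy = oy * s.stride + fy - s.pad;
            const int ix = ox * s.stride + fx - s.pad;
            if (iy >= 0 && iy < s.H && ix >= 0 && ix < s.W) {
                const std::size_t src =
                    (static_cast<std::size_t>(n * s.C + c) * s.H + iy) * s.W + ix;
                cols[i] = x[src];
            }
        }
    }
    return cols;
}
```

Because every iteration pays for several integer divisions and modulo operations just to copy one float, a scheme like this does very little useful work per thread, which is one plausible reason a naive version ends up slow.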

zkSNARK commented 5 years ago

No review is really necessary here; I just want you to see this version.