This PR is the first (very naive) version of a working
im2col on the device.
This version uses one thread to copy each element of a
filter.
The purpose of im2col is to spread the filters out into a
new flattened 2D array. The easiest way to do this is to
flatten the outer 4 dimensions into a single 1D index so
that grid-stride loops can compute the source and target
indexes. However, this version becomes very slow as the
dimensions shrink and the number of filters decreases.
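
To make the flat-index idea concrete, here is a minimal host-side C++
sketch of how a grid-stride loop can decode one 1D index into a source
and a target position. It is not the kernel from this PR. The struct
Im2ColShape, the functions im2col_thread and im2col_emulated, the
[C][H][W] input layout, stride 1, no padding, and the choice of
(channel, kernel row, kernel column, output pixel) as the four flattened
dimensions are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical shape descriptor (illustrative names, not from this PR).
// Assumes a single image laid out as [C][H][W], stride 1, no padding.
struct Im2ColShape {
    std::size_t channels, height, width;
    std::size_t kernel_h, kernel_w;

    std::size_t out_h() const { return height - kernel_h + 1; }
    std::size_t out_w() const { return width - kernel_w + 1; }
    // Output matrix: one row per filter element, one column per output pixel.
    std::size_t rows() const { return channels * kernel_h * kernel_w; }
    std::size_t cols() const { return out_h() * out_w(); }
};

// Work done by one emulated thread: a grid-stride loop over the flattened
// (channel, kernel row, kernel column, output pixel) index space.
void im2col_thread(const float* src, float* dst, const Im2ColShape& s,
                   std::size_t thread_id, std::size_t num_threads) {
    const std::size_t cols = s.cols();
    const std::size_t total = s.rows() * cols;
    for (std::size_t i = thread_id; i < total; i += num_threads) {
        // Decode the flat index into the four assumed dimensions.
        const std::size_t out = i % cols;   // output pixel
        const std::size_t row = i / cols;   // filter element
        const std::size_t kx = row % s.kernel_w;
        const std::size_t ky = (row / s.kernel_w) % s.kernel_h;
        const std::size_t c = row / (s.kernel_w * s.kernel_h);
        const std::size_t oy = out / s.out_w();
        const std::size_t ox = out % s.out_w();
        // Each iteration copies exactly one element from source to target.
        dst[i] = src[(c * s.height + oy + ky) * s.width + ox + kx];
    }
}

// Runs every emulated thread in turn; on a device they would run in parallel.
std::vector<float> im2col_emulated(const std::vector<float>& image,
                                   const Im2ColShape& s,
                                   std::size_t num_threads) {
    assert(num_threads > 0);
    assert(s.kernel_h <= s.height && s.kernel_w <= s.width);
    assert(image.size() == s.channels * s.height * s.width);
    std::vector<float> result(s.rows() * s.cols());
    for (std::size_t t = 0; t < num_threads; ++t) {
        im2col_thread(image.data(), result.data(), s, t, num_threads);
    }
    return result;
}
```

In a CUDA kernel, thread_id would typically be
blockIdx.x * blockDim.x + threadIdx.x and num_threads would be
gridDim.x * blockDim.x. The loop body stays the same, so the result does
not depend on how many threads are launched.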