Hello, I was experimenting with Neon and ran into an issue with the convolutional and pooling layers. The task was image classification, so the input data shape was (3, H, W). When an `ArrayIterator` or `HDF5Iterator` is used as the dataset, the input shape values may have numpy datatypes such as `numpy.int64` (for `ArrayIterator` they come from the `lshape` parameter; for `HDF5Iterator` they are read from `file['input'].attrs['lshape']`). When these values are passed to the model's `configure` method as `in_obj`, they are assigned to `layer.in_shape`, and `in_shape` is then used to initialize the layer parameters. During the forward pass, the following errors arise:
conv layer:
```
  File "<user>/neon/backends/nervanagpu.py", line 1990, in fprop_conv
    return self._execute_conv("fprop", layer, layer.fprop_kernels, repeat)
  File "<user>/neon/backends/nervanagpu.py", line 2072, in _execute_conv
    kernels.execute(repeat)
  File "<user>/neon/backends/convolution.py", line 224, in execute
    kernel.prepared_async_call(*self.launch_args, shared_size=self.shared)
  File "<user>/pycuda-2017.1.1-py3.5-linux-x86_64.egg/pycuda/driver.py", line 516, in function_prepared_async_call
    func._launch_kernel(grid, block, arg_buf, shared_size, stream)
TypeError: No registered converter was able to produce a C++ rvalue of type unsigned int from this Python object of type numpy.int64
```
pool layer:
```
  File "<user>/neon/backends/nervanagpu.py", line 2316, in fprop_pool
    layer.fprop_lut_size, repeat)
  File "<user>/neon/backends/nervanagpu.py", line 2349, in _execute_pool
    kernel.prepared_async_call(*params, shared_size=shared)
  File "<user>/pycuda-2017.1.1-py3.5-linux-x86_64.egg/pycuda/driver.py", line 516, in function_prepared_async_call
    func._launch_kernel(grid, block, arg_buf, shared_size, stream)
TypeError: No registered converter was able to produce a C++ rvalue of type unsigned int from this Python object of type numpy.int64
```
memory allocation in conv:
```
  File "<user>/neon/backends/convolution.py", line 1307, in bind_params
    input_data = self.lib.scratch_buffer_offset(self.size)
  File "<user>/neon/backends/nervanagpu.py", line 875, in scratch_buffer_offset
    data = int(_get_scratch_data(self.scratch_size)) + self.scratch_offset
  File "<decorator-gen-62>", line 2, in _get_scratch_data
  File "<user>/pycuda-2017.1.1-py3.5-linux-x86_64.egg/pycuda/tools.py", line 430, in context_dependent_memoize
    result = func(*args)
  File "<user>/neon/backends/nervanagpu.py", line 3287, in _get_scratch_data
    return drv.mem_alloc(scratch_size)
Boost.Python.ArgumentError: Python argument types in
    pycuda._driver.mem_alloc(numpy.int64)
did not match C++ signature:
    mem_alloc(unsigned long)
```
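For reference, the last failure can be reproduced at the pycuda level, entirely outside of Neon. A minimal sketch, assuming a working CUDA device and pycuda 2017.1.1:

```python
# Minimal reproduction of the underlying pycuda behavior, independent of neon.
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates the CUDA context
import pycuda.driver as drv

drv.mem_alloc(1024)            # plain Python int: OK
drv.mem_alloc(np.int64(1024))  # raises Boost.Python.ArgumentError, as above
```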
Layer parameters, defined in `<>/neon/backends/convolution.py`, line 75, in `__init__`:
```
(N, C, K, D, H, W, T, R, S, M, P, Q, pad_d, pad_h, pad_w, str_d, str_h, str_w, dil_d, dil_h, dil_w)
```
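To make the failure mode concrete, here is an illustrative sketch (not from my original run; the array shapes, `gen_backend` arguments, and data are made up) of how numpy integer types can leak into `lshape`:

```python
import numpy as np
from neon.backends import gen_backend
from neon.data import ArrayIterator

be = gen_backend(backend='gpu', batch_size=10)  # assumes a GPU is available

X = np.random.rand(100, 3 * 32 * 32).astype(np.float32)
y = np.random.randint(0, 10, size=(100, 1))

# tuple() over a numpy array yields numpy.int64 scalars, not Python ints:
lshape = tuple(np.array([3, 32, 32]))
train_set = ArrayIterator(X, y, nclass=10, lshape=lshape)
# These numpy.int64 values travel through the model's configure() into
# layer.in_shape, and from there into the CUDA kernel launch arguments.
```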
Casting all of these parameters to `int` in the layer initialization fixes the issue for me, but that does not seem like a proper solution. Casting the elements of `lshape` to `int` also helps (see the sketch below). I think it would be great if the input values were checked, or converted to the expected types, on the library side. The other layer types (linear, batchnorm, recurrent, etc.) and backends (cpu, mkl) that I used did not suffer from this issue.
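The user-side workaround, continuing the illustrative setup above:

```python
# The same setup, with the shape values cast to plain Python ints first:
lshape = tuple(int(v) for v in np.array([3, 32, 32]))
train_set = ArrayIterator(X, y, nclass=10, lshape=lshape)
```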
Environment: Python 3.5.2, neon 2.6.0 (f9d771bbb5f5fa3ae129748596d0ced5389c7f88), CUDA 8.0, GPU K40s, Ubuntu 16.04, Boost 1.58.0, pycuda 2017.1.1, numpy 1.13.1.