Open iandewancker opened 8 years ago
Did you create image batches for the dataset first?
If not, you will need to create them first. If you did, it would help to know the command-line arguments you are supplying to the script.
On Fri, May 6, 2016 at 11:12 AM, Ian Dewancker notifications@github.com wrote:
Hey there, I am playing around with the "cifar10_msra.py" example and ran into a snag running the ImageLoader:
In [15]: train = ImageLoader(set_name='train', shuffle=True, do_transforms=True, **imgset_options)
libdc1394 error: Failed to initialize libdc1394
ArgumentError Traceback (most recent call last)
in ()
----> 1 train = ImageLoader(set_name='train', shuffle=True, do_transforms=True, **imgset_options)
/usr/local/lib/python2.7/dist-packages/neon/data/imageloader.pyc in __init__(self, repo_dir, inner_size, scale_range, do_transforms, rgb, shuffle, set_name, subset_pct, nlabels, macro, contrast_range, aspect_ratio)
105 target_size=1, reshuffle=shuffle,
106 nclasses=self.nclass,
--> 107 subset_percent=subset_pct)
108
109 def configure(self, repo_dir, set_name, subset_pct):
/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.pyc in __init__(self, set_name, repo_dir, media_params, target_size, index_file, shuffle, reshuffle, datum_dtype, target_dtype, onehot, nclasses, subset_percent, ingest_params)
85 self.ingest_params = ingest_params
86 self.load_library()
---> 87 self.alloc()
88 self.start()
89 atexit.register(self.stop)
/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.pyc in alloc(self)
110 return BufferPair(ct_cast(buffers, 0), ct_cast(buffers, 1))
111
--> 112 self.data = alloc_bufs(self.datum_size, self.datum_dtype)
113 self.targets = alloc_bufs(self.target_size, self.target_dtype)
114 self.device_params = DeviceParams(self.be.device_type,
/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.pyc in alloc_bufs(dim0, dtype)
102
103 def alloc_bufs(dim0, dtype):
--> 104 return [self.be.iobuf(dim0=dim0, dtype=dtype) for _ in range(2)]
105
106 def ct_cast(buffers, idx):
/usr/local/lib/python2.7/dist-packages/neon/backends/backend.pyc in iobuf(self, dim0, x, dtype, name, persist_values, shared, parallelism)
549
550 if persist_values and shared is None:
--> 551 out_tsr[:] = 0
552
553 return out_tsr
/usr/local/lib/python2.7/dist-packages/neon/backends/nervanagpu.pyc in __setitem__(self, index, value)
178 def __setitem__(self, index, value):
179
--> 180 self.__getitem__(index)._assign(value)
181
182 def __getitem__(self, index):
/usr/local/lib/python2.7/dist-packages/neon/backends/nervanagpu.pyc in _assign(self, value)
339 if self.dtype.itemsize == 1:
340 drv.memset_d8_async(
--> 341 self.gpudata, unpack_from('B', value)[0], self.size, stream)
342 elif self.dtype.itemsize == 2:
343 drv.memset_d16_async(
ArgumentError: Python argument types in
pycuda._driver.memset_d8_async(NoneType, int, int, NoneType)
did not match C++ signature:
memset_d8_async(unsigned long long dest, unsigned char data, unsigned int size, pycudaboost::python::api::object stream=None)
Any ideas what I could be doing wrong here?
You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/NervanaSystems/ModelZoo/issues/7
Sure, I ran this command :
./neon/neon/data/batch_writer.py --set_type cifar10 --data_dir "data" --macro_size 10000 --target_size 40
from '/home/ubuntu' dir, where the neon repo is also checked out.
Then in an ipython started from the same location I'm running
from neon.initializers import Kaiming, IdentityInit
from neon.layers import Conv, Pooling, GeneralizedCost, Affine, Activation
from neon.layers import MergeSum, SkipNode
from neon.optimizers import GradientDescentMomentum, Schedule
from neon.transforms import Rectlin, Softmax, CrossEntropyMulti, Misclassification
from neon.models import Model
from neon.data import ImageLoader
from neon.callbacks.callbacks import Callbacks, MetricCallback
from neon.backends import gen_backend
import sigopt.interface
import time
gen_backend(backend='gpu')
# load datasets
DATA_DIR_PATH = "/home/ubuntu/data/"
imgset_options = dict(inner_size=32, scale_range=40, aspect_ratio=110,
repo_dir=DATA_DIR_PATH, subset_pct=100)
train = ImageLoader(set_name='train', shuffle=True, do_transforms=True, **imgset_options)
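To rule out the first suspicion in this thread (missing image batches), it may help to list what is actually on disk under the directory passed as repo_dir before constructing the loader. This is a minimal sketch: `summarize_repo_dir` is a hypothetical helper, not part of neon, and it makes no assumption about how the batch writer names its output files.

```python
from __future__ import print_function

import os


def summarize_repo_dir(repo_dir):
    """Return (exists, entries) for the directory handed to the loader.

    Hypothetical helper: it only reports what is on disk and does not
    know anything about neon's batch file layout.
    """
    path = os.path.abspath(os.path.expanduser(repo_dir))
    if not os.path.isdir(path):
        return False, []
    return True, sorted(os.listdir(path))


if __name__ == "__main__":
    # Same path as DATA_DIR_PATH in the session above.
    exists, entries = summarize_repo_dir("/home/ubuntu/data/")
    print("directory exists:", exists)
    print("entries found:", len(entries))
    for name in entries[:10]:
        print("  ", name)
```

An empty or missing directory would point at the batch-writing step; a populated one suggests the problem lies elsewhere.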
hmm... that is a strange one.
could you try changing line 104 on /usr/local/lib/python2.7/dist-packages/neon/data/dataloader.py
to instead return
return [self.be.iobuf(dim0=dim0, dtype=dtype, persist_values=False) for _ in range(2)]
Hmm maybe got further:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-15-c033fd957d22> in <module>()
----> 1 train = ImageLoader(set_name='train', shuffle=True, do_transforms=True, **imgset_options)
/usr/local/lib/python2.7/dist-packages/neon/data/imageloader.pyc in __init__(self, repo_dir, inner_size, scale_range, do_transforms, rgb, shuffle, set_name, subset_pct, nlabels, macro, contrast_range, aspect_ratio)
105 target_size=1, reshuffle=shuffle,
106 nclasses=self.nclass,
--> 107 subset_percent=subset_pct)
108
109 def configure(self, repo_dir, set_name, subset_pct):
/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.py in __init__(self, set_name, repo_dir, media_params, target_size, index_file, shuffle, reshuffle, datum_dtype, target_dtype, onehot, nclasses, subset_percent, ingest_params)
85 self.ingest_params = ingest_params
86 self.load_library()
---> 87 self.alloc()
88 self.start()
89 atexit.register(self.stop)
/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.py in alloc(self)
115 self.device_params = DeviceParams(self.be.device_type,
116 self.be.device_id,
--> 117 cast_bufs(self.data),
118 cast_bufs(self.targets))
119 if self.onehot:
/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.py in cast_bufs(buffers)
109
110 def cast_bufs(buffers):
--> 111 return BufferPair(ct_cast(buffers, 0), ct_cast(buffers, 1))
112
113 self.data = alloc_bufs(self.datum_size, self.datum_dtype)
/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.py in ct_cast(buffers, idx)
106
107 def ct_cast(buffers, idx):
--> 108 return ct.cast(int(buffers[idx].raw()), ct.c_void_p)
109
110 def cast_bufs(buffers):
TypeError: int() argument must be a string or a number, not 'NoneType'
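The two tracebacks look like the same underlying problem surfacing at different points: a buffer whose device pointer is None. The toy model below is only a sketch under that assumption. `FakeTensor`, `iobuf` and `ct_cast` here are stand-ins written for illustration, not neon's classes; the zero-fill raises a plain TypeError rather than pycuda's ArgumentError, and the `shared is None` part of the real condition is left out.

```python
from __future__ import print_function

import ctypes as ct


class FakeTensor(object):
    """Stand-in for a device tensor whose allocation never happened."""

    def __init__(self, gpudata=None):
        self.gpudata = gpudata  # None models the missing device pointer

    def raw(self):
        return self.gpudata

    def fill(self, value):
        # Models the zero-fill step, which needs a real device pointer.
        if self.gpudata is None:
            raise TypeError("cannot memset a buffer with no device pointer")


def iobuf(persist_values=True, gpudata=None):
    """Toy version of the branch shown at backend.pyc lines 550-551."""
    out_tsr = FakeTensor(gpudata)
    if persist_values:
        out_tsr.fill(0)
    return out_tsr


def ct_cast(buf):
    # Same expression as dataloader.py line 108 in the traceback.
    return ct.cast(int(buf.raw()), ct.c_void_p)


if __name__ == "__main__":
    for persist in (True, False):
        try:
            ct_cast(iobuf(persist_values=persist))
        except TypeError as exc:
            print("persist_values=%s failed: %s" % (persist, exc))
```

With persist_values=True the toy fails in the zero-fill; with persist_values=False it gets past that and fails in the cast instead, which matches the "maybe got further" observation without fixing the missing allocation.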
hmm...
have you been able to run any other neon examples (e.g. cifar_conv.py in the examples directory)? which gpu do you have and which version of pycuda?
thanks,
I'm trying to run this on an AWS g2.2xlarge machine, which uses a GK104GL [GRID K520] from NVIDIA. pycuda version looks to be 2016.1:
In [5]: pycuda.VERSION
Out[5]: (2016, 1)
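Since the GPU generation turns out to matter here, a quick way to report it is to read the compute capability through pycuda. This is a sketch, assuming pycuda is importable and a device is visible; `gpu_generation` and `query_device` are hypothetical helper names, and the mapping covers only a few major versions.

```python
from __future__ import print_function


def gpu_generation(major):
    """Map a CUDA compute-capability major version to an architecture name."""
    names = {2: "Fermi", 3: "Kepler", 5: "Maxwell", 6: "Pascal"}
    return names.get(major, "unknown")


def query_device(index=0):
    """Return (name, (major, minor)) for one GPU via pycuda."""
    # Imported lazily so this module still loads on machines without pycuda.
    import pycuda.driver as drv
    drv.init()
    dev = drv.Device(index)
    return dev.name(), dev.compute_capability()


if __name__ == "__main__":
    name, (major, minor) = query_device(0)
    print(name, "compute capability %d.%d" % (major, minor),
          "->", gpu_generation(major))
```

A GK104-based card such as the GRID K520 reports compute capability 3.0, so it would be labelled Kepler by this mapping.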
I get an error trying the cifar10_conv example as well:
ubuntu@ip-172-31-46-136:~/neon/examples$ python cifar10_conv.py
2016-05-06 18:59:26,618 - neon.backends.nervanagpu - WARNING - Neon is highly optimized for Maxwell GPUs. Although you might get speedups over CPUs, note that you are running on a pre-Maxwell GPU and you might not experience the fastest performance. For faster performance using the Nervana Cloud contact info@nervanasys.com
Downloading file: /home/ubuntu/nervana/data/cifar-10-python.tar.gz
Download Progress |██████████████████████████████████████████████████| Download Complete
Traceback (most recent call last):
File "cifar10_conv.py", line 73, in <module>
mlp.fit(train, optimizer=opt_gdm, num_epochs=num_epochs, cost=cost, callbacks=callbacks)
File "/usr/local/lib/python2.7/dist-packages/neon/models/model.py", line 149, in fit
self._epoch_fit(dataset, callbacks)
File "/usr/local/lib/python2.7/dist-packages/neon/models/model.py", line 179, in _epoch_fit
self.bprop(delta)
File "/usr/local/lib/python2.7/dist-packages/neon/models/model.py", line 211, in bprop
return self.layers.bprop(delta)
File "/usr/local/lib/python2.7/dist-packages/neon/layers/container.py", line 207, in bprop
error = l.bprop(error)
File "/usr/local/lib/python2.7/dist-packages/neon/layers/layer.py", line 654, in bprop
alpha=alpha, beta=beta)
File "/usr/local/lib/python2.7/dist-packages/neon/backends/nervanagpu.py", line 1652, in bprop_conv
layer.bprop_kernels.bind_params(E, F, grad_I, alpha, beta, bsum)
File "/usr/local/lib/python2.7/dist-packages/neon/backends/convolution.py", line 293, in bind_params
assert bsum is not None, "must use initialized bsum config"
AssertionError: must use initialized bsum config
This was my install script if that is helpful
sudo apt-get update && sudo apt-get -yq upgrade
sudo apt-get install python-dev
sudo apt-get install -y libopencv-dev python-opencv libhdf5-dev
#sudo apt-get install -yq linux-image-extra-`uname -r`
sudo apt-get -y install git
sudo pip install -q --upgrade pip
sudo pip install -U numpy
sudo pip install -U scipy
sudo pip install scikit-learn==0.17 joblib sigopt pystache awscli
sudo pip install --upgrade pillow
sudo apt-get install libjpeg-dev zlib1g-dev
wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get -yq install cuda
git clone https://github.com/NervanaSystems/neon.git
cd neon && sudo make sysinstall
sudo ln -sf /usr/local/cuda-7.5/bin/nvcc /usr/bin/nvcc
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-7.5/bin:$PATH
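One thing worth checking with a script like this: `export` lines only affect the shell that runs them, so a later ipython session may not see the CUDA paths. The sketch below checks an environment mapping for the two entries the script sets. `check_cuda_env` is a hypothetical helper, and whether this has any bearing on the errors in this thread is an open question.

```python
from __future__ import print_function

import os

CUDA_HOME = "/usr/local/cuda-7.5"  # path taken from the install script above


def check_cuda_env(environ, cuda_home=CUDA_HOME):
    """Report whether PATH and LD_LIBRARY_PATH mention the CUDA install."""

    def has(var, subdir):
        wanted = os.path.join(cuda_home, subdir)
        parts = environ.get(var, "").split(os.pathsep)
        return wanted in [p.rstrip("/") for p in parts]

    return {
        "PATH": has("PATH", "bin"),
        "LD_LIBRARY_PATH": has("LD_LIBRARY_PATH", "lib64"),
    }


if __name__ == "__main__":
    # Run this inside the same session that imports neon.
    for var, ok in sorted(check_cuda_env(os.environ).items()):
        print(var, "ok" if ok else "missing " + CUDA_HOME)
```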
Ah, OK: it's a non-Maxwell card. I guess there are still some issues running dataloader-dependent examples (cifar_msra) on Kepler cards. It seems the device buffers for storing data and targets are not getting allocated as they should. We will take a look at those.
In the meantime, the bsum AssertionError on the cifar_conv example can be fixed by supplying -r 0 on the command line.
Thanks for the help! Any chance an earlier version of neon might work better with the Kepler GPUs?