NervanaSystems / ModelZoo

neon model zoo
Apache License 2.0

Issue loading dataset #7

Open iandewancker opened 8 years ago

iandewancker commented 8 years ago

Hey there, I am playing around with the "cifar10_msra.py" example and ran into a snag when constructing the ImageLoader:

In [15]: train = ImageLoader(set_name='train', shuffle=True, do_transforms=True, **imgset_options)
libdc1394 error: Failed to initialize libdc1394
---------------------------------------------------------------------------
ArgumentError                             Traceback (most recent call last)
<ipython-input-15-c033fd957d22> in <module>()
----> 1 train = ImageLoader(set_name='train', shuffle=True, do_transforms=True, **imgset_options)

/usr/local/lib/python2.7/dist-packages/neon/data/imageloader.pyc in __init__(self, repo_dir, inner_size, scale_range, do_transforms, rgb, shuffle, set_name, subset_pct, nlabels, macro, contrast_range, aspect_ratio)
    105                                           target_size=1, reshuffle=shuffle,
    106                                           nclasses=self.nclass,
--> 107                                           subset_percent=subset_pct)
    108
    109     def configure(self, repo_dir, set_name, subset_pct):

/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.pyc in __init__(self, set_name, repo_dir, media_params, target_size, index_file, shuffle, reshuffle, datum_dtype, target_dtype, onehot, nclasses, subset_percent, ingest_params)
     85         self.ingest_params = ingest_params
     86         self.load_library()
---> 87         self.alloc()
     88         self.start()
     89         atexit.register(self.stop)

/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.pyc in alloc(self)
    110             return BufferPair(ct_cast(buffers, 0), ct_cast(buffers, 1))
    111
--> 112         self.data = alloc_bufs(self.datum_size, self.datum_dtype)
    113         self.targets = alloc_bufs(self.target_size, self.target_dtype)
    114         self.device_params = DeviceParams(self.be.device_type,

/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.pyc in alloc_bufs(dim0, dtype)
    102
    103         def alloc_bufs(dim0, dtype):
--> 104             return [self.be.iobuf(dim0=dim0, dtype=dtype) for _ in range(2)]
    105
    106         def ct_cast(buffers, idx):

/usr/local/lib/python2.7/dist-packages/neon/backends/backend.pyc in iobuf(self, dim0, x, dtype, name, persist_values, shared, parallelism)
    549
    550         if persist_values and shared is None:
--> 551             out_tsr[:] = 0
    552
    553         return out_tsr

/usr/local/lib/python2.7/dist-packages/neon/backends/nervanagpu.pyc in __setitem__(self, index, value)
    178     def __setitem__(self, index, value):
    179
--> 180         self.__getitem__(index)._assign(value)
    181
    182     def __getitem__(self, index):

/usr/local/lib/python2.7/dist-packages/neon/backends/nervanagpu.pyc in _assign(self, value)
    339                 if self.dtype.itemsize == 1:
    340                     drv.memset_d8_async(
--> 341                         self.gpudata, unpack_from('B', value)[0], self.size, stream)
    342                 elif self.dtype.itemsize == 2:
    343                     drv.memset_d16_async(

ArgumentError: Python argument types in
    pycuda._driver.memset_d8_async(NoneType, int, int, NoneType)
did not match C++ signature:
    memset_d8_async(unsigned long long dest, unsigned char data, unsigned int size, pycudaboost::python::api::object stream=None)

Any ideas what I could be doing wrong here?

apark263 commented 8 years ago

Did you create image batches for the dataset first?

If not, you will need to create them before running the example. If you did, it would help to know the command-line arguments you are supplying to the script.

iandewancker commented 8 years ago

Sure, I ran this command: ./neon/neon/data/batch_writer.py --set_type cifar10 --data_dir "data" --macro_size 10000 --target_size 40

from the '/home/ubuntu' dir, where the neon repo is also checked out.

Then, in an ipython session started from the same location, I'm running:

from neon.initializers import Kaiming, IdentityInit
from neon.layers import Conv, Pooling, GeneralizedCost, Affine, Activation
from neon.layers import MergeSum, SkipNode
from neon.optimizers import GradientDescentMomentum, Schedule
from neon.transforms import Rectlin, Softmax, CrossEntropyMulti, Misclassification
from neon.models import Model
from neon.data import ImageLoader
from neon.callbacks.callbacks import Callbacks, MetricCallback
from neon.backends import gen_backend
import sigopt.interface
import time

gen_backend(backend='gpu')

# load datasets
DATA_DIR_PATH = "/home/ubuntu/data/"
imgset_options = dict(inner_size=32, scale_range=40, aspect_ratio=110,
                      repo_dir=DATA_DIR_PATH, subset_pct=100)
train = ImageLoader(set_name='train', shuffle=True, do_transforms=True, **imgset_options)

apark263 commented 8 years ago

hmm... that is a strange one.

Could you try changing line 104 of /usr/local/lib/python2.7/dist-packages/neon/data/dataloader.py so that it instead reads:

return [self.be.iobuf(dim0=dim0, dtype=dtype, persist_values=False) for _ in range(2)]
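
A quick way to confirm whether the backend is handing back tensors without device memory is to inspect the buffers directly. The sketch below is only an illustration: `unallocated_buffers` and `FakeTensor` are hypothetical names, and it assumes the tensors expose the `gpudata` attribute that appears in the traceback above.

```python
def unallocated_buffers(buffers):
    """Return the indices of buffers whose device pointer is missing.

    Assumes each buffer exposes a `gpudata` attribute, as the GPU tensor
    in the traceback does; a missing attribute is treated like None.
    """
    return [i for i, buf in enumerate(buffers)
            if getattr(buf, "gpudata", None) is None]


class FakeTensor(object):
    """Stand-in for a backend tensor, for illustration only."""

    def __init__(self, gpudata):
        self.gpudata = gpudata


# Stand-in data: the first buffer has a device pointer, the second does not.
demo = [FakeTensor(0x7F00), FakeTensor(None)]
print(unallocated_buffers(demo))

# With a real backend, the same check might look like this (untested):
#   be = gen_backend(backend='gpu')
#   bufs = [be.iobuf(dim0=1, dtype='uint8') for _ in range(2)]
#   print(unallocated_buffers(bufs))
```

If every index comes back from the real backend, the allocation itself is failing, and changing `persist_values` would only postpone the error.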

iandewancker commented 8 years ago

Hmm, that maybe got a bit further:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-c033fd957d22> in <module>()
----> 1 train = ImageLoader(set_name='train', shuffle=True, do_transforms=True, **imgset_options)

/usr/local/lib/python2.7/dist-packages/neon/data/imageloader.pyc in __init__(self, repo_dir, inner_size, scale_range, do_transforms, rgb, shuffle, set_name, subset_pct, nlabels, macro, contrast_range, aspect_ratio)
    105                                           target_size=1, reshuffle=shuffle,
    106                                           nclasses=self.nclass,
--> 107                                           subset_percent=subset_pct)
    108
    109     def configure(self, repo_dir, set_name, subset_pct):

/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.py in __init__(self, set_name, repo_dir, media_params, target_size, index_file, shuffle, reshuffle, datum_dtype, target_dtype, onehot, nclasses, subset_percent, ingest_params)
     85         self.ingest_params = ingest_params
     86         self.load_library()
---> 87         self.alloc()
     88         self.start()
     89         atexit.register(self.stop)

/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.py in alloc(self)
    115         self.device_params = DeviceParams(self.be.device_type,
    116                                           self.be.device_id,
--> 117                                           cast_bufs(self.data),
    118                                           cast_bufs(self.targets))
    119         if self.onehot:

/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.py in cast_bufs(buffers)
    109
    110         def cast_bufs(buffers):
--> 111             return BufferPair(ct_cast(buffers, 0), ct_cast(buffers, 1))
    112
    113         self.data = alloc_bufs(self.datum_size, self.datum_dtype)

/usr/local/lib/python2.7/dist-packages/neon/data/dataloader.py in ct_cast(buffers, idx)
    106
    107         def ct_cast(buffers, idx):
--> 108             return ct.cast(int(buffers[idx].raw()), ct.c_void_p)
    109
    110         def cast_bufs(buffers):

TypeError: int() argument must be a string or a number, not 'NoneType'

apark263 commented 8 years ago

hmm...

Have you been able to run any other neon examples (e.g. cifar10_conv.py in the examples directory)? Which GPU do you have, and which version of pycuda?

thanks,

iandewancker commented 8 years ago

I'm trying to run this on an AWS g2.2xlarge machine, which uses a GK104GL [GRID K520] from NVIDIA. The pycuda version looks to be 2016.1:

In [5]: pycuda.VERSION
Out[5]: (2016, 1)
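
For reference, the GPU generation can also be read from the device's compute capability: Kepler parts such as the GRID K520 should report 3.x, while Maxwell parts report 5.x. The following is a minimal sketch, not part of neon; `architecture` and `is_pre_maxwell` are hypothetical helper names, and the pycuda query is skipped when pycuda or a usable GPU is not available.

```python
# Major compute-capability number -> NVIDIA architecture family.
ARCH_BY_MAJOR = {2: "Fermi", 3: "Kepler", 5: "Maxwell", 6: "Pascal"}


def architecture(compute_capability):
    """Map a (major, minor) compute capability to an architecture name."""
    return ARCH_BY_MAJOR.get(compute_capability[0], "unknown")


def is_pre_maxwell(compute_capability):
    """True when the device is older than Maxwell (major version below 5)."""
    return compute_capability[0] < 5


try:
    import pycuda.driver as drv

    drv.init()
    cc = drv.Device(0).compute_capability()  # tuple of (major, minor)
    print("device 0: %s, pre-Maxwell: %s" % (architecture(cc), is_pre_maxwell(cc)))
except Exception as exc:  # pycuda not installed, or no usable GPU
    print("could not query the GPU: %s" % exc)
```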

I get an error when trying the cifar10_conv example as well:

ubuntu@ip-172-31-46-136:~/neon/examples$ python cifar10_conv.py
2016-05-06 18:59:26,618 - neon.backends.nervanagpu - WARNING - Neon is highly optimized for Maxwell GPUs. Although you might get speedups over CPUs, note that you are running on a pre-Maxwell GPU and you might not experience the fastest performance. For faster performance using the Nervana Cloud contact info@nervanasys.com
Downloading file: /home/ubuntu/nervana/data/cifar-10-python.tar.gz
Download Progress |██████████████████████████████████████████████████| Download Complete
Traceback (most recent call last):
  File "cifar10_conv.py", line 73, in <module>
    mlp.fit(train, optimizer=opt_gdm, num_epochs=num_epochs, cost=cost, callbacks=callbacks)
  File "/usr/local/lib/python2.7/dist-packages/neon/models/model.py", line 149, in fit
    self._epoch_fit(dataset, callbacks)
  File "/usr/local/lib/python2.7/dist-packages/neon/models/model.py", line 179, in _epoch_fit
    self.bprop(delta)
  File "/usr/local/lib/python2.7/dist-packages/neon/models/model.py", line 211, in bprop
    return self.layers.bprop(delta)
  File "/usr/local/lib/python2.7/dist-packages/neon/layers/container.py", line 207, in bprop
    error = l.bprop(error)
  File "/usr/local/lib/python2.7/dist-packages/neon/layers/layer.py", line 654, in bprop
    alpha=alpha, beta=beta)
  File "/usr/local/lib/python2.7/dist-packages/neon/backends/nervanagpu.py", line 1652, in bprop_conv
    layer.bprop_kernels.bind_params(E, F, grad_I, alpha, beta, bsum)
  File "/usr/local/lib/python2.7/dist-packages/neon/backends/convolution.py", line 293, in bind_params
    assert bsum is not None, "must use initialized bsum config"
AssertionError: must use initialized bsum config

iandewancker commented 8 years ago

This was my install script, if that is helpful:

sudo apt-get update && sudo apt-get -yq upgrade
sudo apt-get install python-dev
sudo apt-get install -y libopencv-dev python-opencv libhdf5-dev
#sudo apt-get install -yq linux-image-extra-`uname -r`
sudo apt-get -y install git

sudo pip install -q --upgrade pip
sudo pip install -U numpy
sudo pip install -U scipy
sudo pip install scikit-learn==0.17 joblib sigopt pystache awscli
sudo pip install --upgrade pillow
sudo apt-get install libjpeg-dev zlib1g-dev

wget http://developer.download.nvidia.com/compute/cuda/7.5/Prod/local_installers/cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404-7-5-local_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get -yq install cuda

git clone https://github.com/NervanaSystems/neon.git
cd neon && sudo make sysinstall
sudo ln -sf /usr/local/cuda-7.5/bin/nvcc /usr/bin/nvcc
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-7.5/bin:$PATH
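
One thing to keep in mind with a script like this is that the two `export` lines only affect the shell that ran them, so a later ipython session may not see the CUDA directories. The check below is a rough sketch and not necessarily related to the errors above; `missing_cuda_entries` is a hypothetical helper, and the default `cuda_home` simply mirrors the path used in the script.

```python
import os


def missing_cuda_entries(environ, cuda_home="/usr/local/cuda-7.5"):
    """Return the names of variables that lack the expected CUDA directory.

    Assumes a Linux-style environment where entries are separated by ':'.
    """
    expected = {
        "PATH": cuda_home + "/bin",
        "LD_LIBRARY_PATH": cuda_home + "/lib64",
    }
    missing = []
    for var in sorted(expected):
        entries = environ.get(var, "").split(":")
        if expected[var] not in entries:
            missing.append(var)
    return missing


# An empty list means both variables contain the CUDA 7.5 directories.
print(missing_cuda_entries(os.environ))
```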

apark263 commented 8 years ago

Ah ok, it's a non-Maxwell card. I guess there are still some issues running dataloader-dependent examples (cifar10_msra) on Kepler cards. It seems the device buffers for storing data and targets are not getting allocated as they should be. We will take a look at those.

In the meantime, the bsum AssertionError in the cifar10_conv example can be worked around by supplying -r 0 on the command line (e.g. python cifar10_conv.py -r 0).

iandewancker commented 8 years ago

Thanks for the help! Any chance an earlier version of neon might work better with the Kepler GPUs?