IDSIA / brainstorm

Fast, flexible and fun neural networks.

mnist_lstm example fails #27

Closed flukeskywalker closed 9 years ago

flukeskywalker commented 9 years ago

Using PyCudaHandler, I get:

Traceback (most recent call last):
  File "mnist_lstm.py", line 85, in <module>
    trainer.train(network, train_getter, valid_getter=valid_getter)
  File "build/bdist.linux-x86_64/egg/brainstorm/training/trainer.py", line 50, in train
  File "build/bdist.linux-x86_64/egg/brainstorm/training/trainer.py", line 126, in _emit_monitoring
  File "build/bdist.linux-x86_64/egg/brainstorm/training/trainer.py", line 136, in _call_monitor
  File "build/bdist.linux-x86_64/egg/brainstorm/training/monitors.py", line 289, in __call__
  File "build/bdist.linux-x86_64/egg/brainstorm/training/trainer.py", line 169, in run_network
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/network.py", line 62, in provide_external_data
  File "build/bdist.linux-x86_64/egg/brainstorm/handlers/pycuda_handler.py", line 49, in copy_to
pycuda._driver.LogicError: cuMemcpyDtoD failed: invalid argument in <brainstorm.training.monitors.MonitorAccuracy object at 0x7fea2cf61050>

@untom, perhaps you have an idea about what might be causing this?

untom commented 9 years ago

Not at all... looks weird. Are you certain both buffers were large enough? Other than that, no idea. If you can give me the code you ran, I can try to debug it from here.

flukeskywalker commented 9 years ago

Great! It's examples/mnist_lstm.py

untom commented 9 years ago

I haven't looked at the part of the code that deals with buffers (or monitors) before, so I don't fully understand everything that's going on. But here's where the problem lies:

The call that crashes is

 drv.memcpy_dtod(dest.gpudata, src.gpudata, dest.nbytes)

Note that it tries to copy as many bytes as the destination holds. However, when the call fails, dest is much, much bigger than src:

ipdb> print(src.shape)
(784, 10, 1)
ipdb> print(dest.shape)
(784, 10, 784)

Why it is that way, I don't know, because I don't understand what provide_external_data is meant to do.
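
As a side note, a size check before that copy would have turned this into a much clearer error; a minimal sketch (not the actual handler code):

    import pycuda.driver as drv

    def copy_to(dest, src):
        # Guard against size mismatches before the raw device-to-device copy,
        # instead of letting cuMemcpyDtoD fail with "invalid argument".
        if dest.nbytes != src.nbytes:
            raise ValueError("copy_to: size mismatch, dest %s vs src %s"
                             % (dest.shape, src.shape))
        drv.memcpy_dtod(dest.gpudata, src.gpudata, dest.nbytes)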

flukeskywalker commented 9 years ago

Ah, okay, I located the problem; it was in the example itself. To summarize, provide_external_data copies data from the iterator into the output buffer of the network's input layer. Thereafter, all layers receiving input from the input layer have the necessary data in their input buffers.

The problem was that I forgot to set the input layer's feature dimension to 1 instead of 784 when I set it up. The incoming data, however, had a feature dimension of just 1. The code worked fine with the NumpyHandler though :( which made me think it was something to do with PyCUDA.
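
To make the mismatch concrete (just an illustration using the shapes from the traceback above, not code from the example):

    import numpy as np

    src = np.zeros((784, 10, 1), dtype=np.float32)     # data coming from the iterator
    dest = np.zeros((784, 10, 784), dtype=np.float32)  # input buffer set up with 784 features

    # copy_to copies dest.nbytes, which is 784x more than src actually holds
    print(src.nbytes, dest.nbytes)  # 31360 vs 24586240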

Thanks for the help though! I will improve the examples further in the coming days.

untom commented 9 years ago

Okay, thanks for the explanation :)

flukeskywalker commented 9 years ago

Reopening due to the next problem. This one is best handled by @Qwlouse. Running examples/mnist_lstm.py gives me:

Traceback (most recent call last):
  File "mnist_lstm.py", line 72, in <module>
    network = bs.build_net(inp_layer - 'targets' >> 'targets' - out_layer)
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/network.py", line 24, in build_net
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/network.py", line 30, in build_network_from_architecture
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/buffers.py", line 65, in __init__
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/buffers.py", line 103, in resize
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/buffers.py", line 42, in create_buffer_views_from_layout
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/buffers.py", line 42, in create_buffer_views_from_layout
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/buffers.py", line 42, in create_buffer_views_from_layout
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/buffers.py", line 33, in create_buffer_views_from_layout
ValueError: total size of new array must be unchanged

Apparently, the buffer reshaping is incorrect if buffer_type is 2. It seems it might be a problem with context slices, since the error only appears when using RnnLayer or LstmLayer. Additionally, it'd be nice to have a better error message here.
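
For what it's worth, the ValueError itself presumably just bubbles up from the underlying reshape and means the layout asks for a different number of elements than the hub actually holds, e.g.:

    import numpy as np

    np.arange(10).reshape(3, 4)
    # ValueError: total size of new array must be unchanged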

Qwlouse commented 9 years ago

The next error I get is this; I don't know yet why:

Traceback (most recent call last):
  File "/home/greff/Programming/brainstorm/examples/mnist_lstm.py", line 91, in <module>
    trainer.train(network, train_getter, valid_getter=valid_getter)
  File "/home/greff/venv/py3/lib/python3.4/site-packages/brainstorm-0.1.0-py3.4.egg/brainstorm/training/trainer.py", line 50, in train
  File "/home/greff/venv/py3/lib/python3.4/site-packages/brainstorm-0.1.0-py3.4.egg/brainstorm/training/trainer.py", line 126, in _emit_monitoring
  File "/home/greff/venv/py3/lib/python3.4/site-packages/brainstorm-0.1.0-py3.4.egg/brainstorm/training/trainer.py", line 136, in _call_monitor
  File "/home/greff/venv/py3/lib/python3.4/site-packages/brainstorm-0.1.0-py3.4.egg/brainstorm/training/monitors.py", line 290, in __call__
  File "/home/greff/venv/py3/lib/python3.4/site-packages/brainstorm-0.1.0-py3.4.egg/brainstorm/structure/network.py", line 73, in forward_pass
  File "/home/greff/venv/py3/lib/python3.4/site-packages/brainstorm-0.1.0-py3.4.egg/brainstorm/layers/lstm_layer.py", line 98, in forward_pass
  File "/home/greff/venv/py3/lib/python3.4/site-packages/brainstorm-0.1.0-py3.4.egg/brainstorm/handlers/pycuda_handler.py", line 170, in tanh
  File "/home/greff/venv/py3/lib/python3.4/site-packages/pycuda/cumath.py", line 32, in f
    raise RuntimeError("only contiguous arrays may "
RuntimeError: only contiguous arrays may be used as arguments to this operation in <brainstorm.training.monitors.MonitorAccuracy object at 0x7f1272c42e48>

flukeskywalker commented 9 years ago

Okay, I fixed that by switching to an elementwise kernel (a sketch is at the end of this comment). The next error, which probably affects all recurrent networks, is:

Traceback (most recent call last):
  File "examples/mnist_lstm.py", line 73, in <module>
    network.set_memory_handler(PyCudaHandler())
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/network.py", line 258, in set_memory_handler
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/buffers.py", line 128, in set_memory_handler
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/buffers.py", line 106, in resize
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/buffers.py", line 42, in create_buffer_views_from_layout
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/buffers.py", line 42, in create_buffer_views_from_layout
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/buffers.py", line 42, in create_buffer_views_from_layout
  File "build/bdist.linux-x86_64/egg/brainstorm/structure/buffers.py", line 33, in create_buffer_views_from_layout
  File "/home/arkade/venv/py2/local/lib/python2.7/site-packages/pycuda/gpuarray.py", line 681, in reshape
    raise RuntimeError("only contiguous arrays may "
RuntimeError: only contiguous arrays may be used as arguments to this operation

It seems that you cannot slice and then reshape a GPUArray? :( CC: @untom
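
For reference, the elementwise tanh replacement is roughly along these lines (a sketch, not the exact handler code):

    from pycuda.elementwise import ElementwiseKernel

    # element-wise tanh as a custom kernel instead of pycuda.cumath.tanh
    tanh_kernel = ElementwiseKernel(
        "float *y, float *x",
        "y[i] = tanhf(x[i])",
        "tanh_kernel")

    # usage: tanh_kernel(out_gpuarray, in_gpuarray)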

untom commented 9 years ago

Slicing and reshaping should be possible, as long as you slice along the 0-th axis. I'll investigate.
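
Roughly speaking (illustrated with numpy here, but these are the contiguity rules GPUArray.reshape cares about):

    import numpy as np

    a = np.zeros((785, 10, 1000), dtype=np.float32)

    # slicing only the outermost axis keeps the data contiguous in memory...
    print(a[2:5].flags['C_CONTIGUOUS'])          # True

    # ...but slicing the last axis does not, and GPUArray.reshape refuses
    # non-contiguous arrays (hence the RuntimeError above)
    print(a[:, :, 0:100].flags['C_CONTIGUOUS'])  # False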

untom commented 9 years ago

Slightly off-topic, but since I stumbled over it in a stack trace: what are hubs, and what does hub.btype store? (Also, are you guys on any IRC, ICQ or Jabber channel for short questions like this?)

untom commented 9 years ago

FWIW, brainstorm.structure.create_buffer_views_from_layout slices arrays in a non-contiguous manner (i.e., it slices more than just the outer-most axis):

    else:  # buffer_type == 2
        full_buffer = buffers[buffer_nr][t_slice, :, start:stop]

I assume this is the reason, but again, this is a piece of code whose purpose is beyond me ;)

Qwlouse commented 9 years ago

When allocating the memory, we first chunk it up into hubs. Each hub represents a memory region to which some layer writes and from which some (other) layer reads (when it represents a connection). There is one hub for all parameters, one for each connection between layers, and one for each internal buffer.

The btype just says how that hub scales with time and batch size. So btype=0 is constant (parameters), btype=1 scales only with batch size, and btype=2 scales with both time and batch size.
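
As a sketch (not the actual brainstorm code, just the scaling rule), the buffer shape of a hub for T time steps and batch size B is roughly:

    def hub_shape(btype, size, T, B):
        """Illustrative only: how a hub's buffer shape scales with its btype."""
        if btype == 0:    # constant, e.g. all parameters
            return (size,)
        elif btype == 1:  # scales with batch size only
            return (B, size)
        else:             # btype == 2: scales with time and batch size
            return (T, B, size)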

If your architecture doesn't merge layers (i.e. a >> c and b >> c), then in the code snippet you found, start:stop should always cover the full last axis. Which, BTW, means that you can only use layer merging on the CPU.

(No IRC, ICQ or Jabber yet, but I'd be willing to. We could also set up a Gitter for brainstorm. Although their support for private repos requires write access ಠ_ಠ)

untom commented 9 years ago

That's a quaint requirement :-D

Thanks for the explanation. But then I don't understand the following. I've added a little print statement to the code I showed above:

    else:  # buffer_type == 2
        print(buffers[buffer_nr].shape, start, stop)
        full_buffer = buffers[buffer_nr][t_slice, :, start:stop]

And the output I got when running mnist_lstm.py was:

....
(785, 10, 1000) 0 100
(785, 10, 1000) 100 200
(785, 10, 1000) 200 300
(785, 10, 1000) 300 400
(785, 10, 1000) 400 500
(785, 10, 1000) 500 600
(785, 10, 1000) 600 700
(785, 10, 1000) 700 800
(785, 10, 1000) 800 900
(785, 10, 1000) 900 1000
....

To me this looks like some slicing of the last axis is happening, since start and stop clearly don't span the full last axis.

flukeskywalker commented 9 years ago

(I'll move the discussion about communication medium to email)

Qwlouse commented 9 years ago

That's a quaint requirement :-D

I know :-). I didn't have the heart to remove all the fancy layer-merging magic from brainstorm when we discovered that slicing is restricted in PyCUDA, so I only changed it and partially removed it. The official way to do layer merges on the GPU is now to use a MergeLayer with two named inputs (not implemented yet). But you can still use the old (magic) way when you are working on the CPU only...

I've added a little print statement to the code I showed above:

Ohh, I found the problem, thanks. I'll fix it after lunch.

untom commented 9 years ago

The quaint requirement was meant for Gitter ;-)

Qwlouse commented 9 years ago

OK, I fixed that issue and another one in the handler, and now it seems to work. It is a bit sluggish, though. I didn't run it with the full set, but I estimate around 8 hours per epoch on my machine. :(

flukeskywalker commented 9 years ago

How does this work for you now? I get an error (with mnist_lstm.py) because cumisc.sum() returns a numpy.ndarray, so copy_to cannot copy it (no gpudata) :( This was a huge problem earlier when I was trying to fix the mnist example: "arrays" of just one element.

untom commented 9 years ago

try updating your skcuda version

flukeskywalker commented 9 years ago

That was it. I forgot to update from git instead of pip.

untom commented 9 years ago

The current pip version should work, too (I think)

flukeskywalker commented 9 years ago

It's taking about 25 minutes per epoch on my GTX 980 with a batch size of 100.

flukeskywalker commented 9 years ago

@untom Unfortunately, I was not testing mult_mv with column vectors earlier. But since the loss was zero when using a mask layer (which uses mult_mv), I added a test for it, and it fails :(

untom commented 9 years ago

In the tests you provided, the error only occurs with square matrices. For those, we had hardcoded that the addition should always happen along axis 1 (because of a bug in skcuda). That bug has since been fixed in recent versions of skcuda; once I remove our workaround, all tests seem to pass.

(Tested with skcuda 0.5.0)

flukeskywalker commented 9 years ago

Ah, this just highlights the need for more extensive testing. The mask multiplication was still incorrect, so I added a couple of shapes to the tests, and the op fails when a and b are both column vectors. I just committed the additional tests.
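
For reference, the behaviour I expect here is plain numpy broadcasting (just an illustration of the newly added column-vector shape, not the handler code itself):

    import numpy as np

    a = np.random.rand(5, 1)  # column vector
    b = np.random.rand(5, 1)  # column vector

    # element-wise multiply with broadcasting; result has shape (5, 1)
    expected = a * b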

flukeskywalker commented 9 years ago

Progress! We just fixed a nasty bug in the LSTM layer which prevented training from working on the GPU. Now I'm investigating why infs and nans sometimes crop up during training.

untom commented 9 years ago

Awesome!

flukeskywalker commented 9 years ago

Looks like a simple gradient explosion, so we can probably finally close this issue (fingers crossed).