Closed xchoo closed 11 years ago
done and done, I think I had just thrown that in there earlier at some desperate point of trying to debug.
Righto. I'm going to do some proper benchmarking tests comparing the CPU and GPU implementations. The preliminary results don't look good, though: the GPU implementation in Theano seems to run about twice as slow as the CPU implementation. Bigger models or longer run times might still see a benefit from the GPU, however.
I don't know exactly what you're running as a benchmark, but I would say, probably don't worry too much if the initial results are disappointing. There are a lot of things that affect speed, and there may well be a few not-too-difficult but important changes to make in Theano to get good performance for this type of computation.
Part of the story is the NEF itself - the dot products that produce the semantic vectors are not necessarily a good use of a GPU, especially the way Theano currently represents them. It might be better to merge encoders and decoders into full weight matrices, unless we can organize the computation to involve the computation of a few thousand semantic vector components in parallel, in which case the NEF becomes a big win again. We can talk about these things in the context of a particular model to get a sense of what the bottlenecks are and how to get past them.
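To make the factored-vs-merged trade-off concrete, here is a small NumPy sketch (sizes and random data are illustrative, not from this issue). The NEF's factored form does two small dot products through a low-dimensional semantic vector; merging encoders and decoders gives one big neuron-to-neuron weight matrix that produces the same currents but maps onto a single large matrix-vector product, which is often a better fit for a GPU kernel:

```python
import numpy as np

# Hypothetical sizes, chosen for illustration only.
n_neurons, dims = 1000, 16

rng = np.random.default_rng(0)
encoders = rng.standard_normal((n_neurons, dims))   # semantic vector -> input currents
decoders = rng.standard_normal((dims, n_neurons))   # activities -> semantic vector
activities = rng.standard_normal(n_neurons)

# Factored NEF form: two small dot products per connection.
x = decoders @ activities           # decode a dims-long semantic vector
currents_factored = encoders @ x    # re-encode into the next population

# Merged form: one big (n_neurons x n_neurons) weight matrix, one big product.
weights = encoders @ decoders
currents_merged = weights @ activities

# Both forms compute the same thing; they differ only in FLOPs and memory layout.
assert np.allclose(currents_factored, currents_merged)
```

The factored form does far fewer multiplies (O(n·d) vs O(n²)), so on a CPU it usually wins; the merged form only pays off when the hardware can keep the big matrix product busy.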
Yup, sounds like a plan. I'm currently just running the test code in the nef-py directory, but those simulations are only 1-2 seconds long. I figure that as the simulation time gets longer, the GPU might win over the CPU. And, as you mentioned, re-tweaking the NEF implementation to work with Theano and the GPU should give us better performance.
I don't think it's going to be simulation time so much as simulation size. The GPU excels at doing tons of computations in parallel. So if we can have all our neurons do their updates each time step as part of one huge op, that's where the GPU will win.
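As a sketch of what "one huge op" means, here is a toy leaky-integrator update in NumPy (the dynamics and parameters are made up for illustration). A per-neuron Python loop issues thousands of tiny operations, which neither Theano's graph nor a GPU can batch; a single fused array expression over all neurons is the shape of update that parallel hardware rewards:

```python
import numpy as np

# Toy leaky-integrator step with assumed parameters (not from the issue).
n_neurons, dt, tau = 5000, 0.001, 0.02
rng = np.random.default_rng(1)
voltage = rng.random(n_neurons)
current = rng.random(n_neurons)

# Per-neuron loop: many tiny ops, no parallelism to exploit.
v_loop = voltage.copy()
for i in range(n_neurons):
    v_loop[i] += dt * (current[i] - v_loop[i]) / tau

# One fused array op covering every neuron in the time step.
v_batched = voltage + dt * (current - voltage) / tau

assert np.allclose(v_loop, v_batched)
```

The same idea applies across populations: concatenating all neurons' state into one array, so each tick is a handful of big elementwise ops and matrix products, is where the GPU should start to win.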
The following error is thrown when trying to use the GPU (CUDA) backend of Theano.
```
Using gpu device 0: GeForce GTX 280
starting simulation
Traceback (most recent call last):
  File "c:\Program_Files\Python27\lib\runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "c:\Program_Files\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "d:\fchoo\Documents\GitHub\nef-py\nef\test\test_array.py", line 43, in <module>
    net.run(timesteps*dt_step)
  File "nef\nef_theano\network.py", line 496, in run
    self.theano_tick = self.make_theano_tick()
  File "nef\nef_theano\network.py", line 482, in make_theano_tick
    return theano.function([], [], updates=updates)
  File "c:\Program_Files\Python27\lib\site-packages\theano\compile\function.py", line 221, in function
    profile=profile)
  File "c:\Program_Files\Python27\lib\site-packages\theano\compile\pfunc.py", line 484, in pfunc
    no_default_updates=no_default_updates)
  File "c:\Program_Files\Python27\lib\site-packages\theano\compile\pfunc.py", line 202, in rebuild_collect_shared
    update_val = store_into.type.filter_variable(update_val)
  File "c:\Program_Files\Python27\lib\site-packages\theano\sandbox\cuda\type.py", line 147, in filter_variable
    return theano.sandbox.cuda.basic_ops.GpuFromHost()(other)
  File "c:\Program_Files\Python27\lib\site-packages\theano\gof\op.py", line 401, in __call__
    raise ValueError('Cannot compute test value: input %i (%s) of Op %s missing default value' % (i, ins, node))
ValueError: Cannot compute test value: input 0 (Elemwise{Cast{float32}}.0) of Op GpuFromHost(Elemwise{Cast{float32}}.0) missing default value
```
This error is only thrown when using the GPU option (i.e. `device = gpu` in the `.theanorc` file).
The cause of this error is the `theano.config.compute_test_value = 'raise'` line in network.py. If this line is not needed, it should be removed.
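A minimal sketch of the fix, assuming the line lives in network.py as described. `'off'` is Theano's default; `'raise'` is a debugging aid that demands a test value for every input, which is what trips up `GpuFromHost` here:

```python
import theano

# 'off' (the default) avoids the GpuFromHost "missing default value" error;
# 'raise' is only useful while debugging shapes, so set it conditionally if kept.
theano.config.compute_test_value = 'off'
```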