chrischoy / 3D-R2N2

Single/multi view image(s) to voxel reconstruction using a recurrent neural network
MIT License

Out of memory? GTX 780M - 17.04 #19

Open quintendewilde opened 6 years ago

quintendewilde commented 6 years ago

Hey,

I'm trying to train your cool project with new images.

So I tried the training example, but got these errors.

Error allocating 33554432 bytes of device memory (out of memory). Driver report 17367040 bytes free and 4231200768 bytes total 
Wait until the dataprocesses to end
Signal processes
Traceback (most recent call last):
  File "/home/quinten/Documents/3D-R2N2/lib/train_net.py", line 21, in func_wrapper
    return func(*args, **kwargs)
  File "/home/quinten/Documents/3D-R2N2/lib/train_net.py", line 38, in train_net
    net = NetClass()
  File "/home/quinten/Documents/3D-R2N2/models/net.py", line 37, in __init__
    self.setup()
  File "/home/quinten/Documents/3D-R2N2/models/net.py", line 40, in setup
    self.network_definition()
  File "/home/quinten/Documents/3D-R2N2/models/res_gru_net.py", line 70, in network_definition
    t_x_s_update = FCConv3DLayer(prev_s, fc7, (n_deconvfilter[0], n_deconvfilter[0], 3, 3, 3))
  File "/home/quinten/Documents/3D-R2N2/lib/layers.py", line 478, in __init__
    fan_out=self._output_shape[2])
  File "/home/quinten/Documents/3D-R2N2/lib/layers.py", line 74, in __init__
    self.val = theano.shared(value=self.np_values)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/compile/sharedvalue.py", line 268, in shared
    allow_downcast=allow_downcast, **kwargs)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/sandbox/cuda/var.py", line 188, in float32_shared_constructor
    deviceval = type_support_filter(value, type.broadcastable, False, None)
MemoryError: ('Error allocating 33554432 bytes of device memory (out of memory).', "you might consider using 'theano.shared(..., borrow=True)'")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 118, in <module>
    main()
  File "main.py", line 109, in main
    train_net()
  File "/home/quinten/Documents/3D-R2N2/lib/train_net.py", line 24, in func_wrapper
    kill_processes(train_queue, train_processes)
  File "/home/quinten/Documents/3D-R2N2/lib/data_process.py", line 178, in kill_processes
    for p in processes:
TypeError: 'NoneType' object is not iterable
[INFO/MainProcess] process shutting down
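
The second traceback above looks like a side effect of the first: the cleanup in `func_wrapper` calls `kill_processes` while `train_processes` is still `None`, presumably because building the network failed before any data process was started. A minimal sketch of a guard against that, assuming a simplified `kill_processes` (the body below and the `FakeProcess` class are hypothetical stand-ins, not the repository's code):

```python
def kill_processes(queue, processes):
    """Hypothetical, simplified cleanup helper; returns how many processes were stopped."""
    if processes is None:
        # Nothing was started (e.g. setup failed first), so there is nothing to stop.
        return 0
    stopped = 0
    for p in processes:
        p.terminate()
        stopped += 1
    return stopped


class FakeProcess:
    """Hypothetical stand-in exposing only terminate(), used to exercise the helper."""

    def __init__(self):
        self.terminated = False

    def terminate(self):
        self.terminated = True


procs = [FakeProcess(), FakeProcess()]
print(kill_processes(None, None))   # no processes were ever created
print(kill_processes(None, procs))  # two fake processes are stopped
```

With such a guard the original `MemoryError` would be the only error reported, instead of being followed by the `TypeError`.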

I have a GTX 780M. Shouldn't this be enough to train on the samples?
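
For scale, the byte counts in the first error line can be converted to MiB. This is a small sketch using only the numbers from that log line; the variable names are illustrative. It shows that the card reports roughly 4 GiB in total, but only about 16.6 MiB was free when the 32 MiB allocation was requested. The log does not say what was holding the rest.

```python
MIB = 2 ** 20  # bytes per mebibyte

# Values copied from the error line in the log.
requested = 33554432
free = 17367040
total = 4231200768

requested_mib = requested / MIB
free_mib = free / MIB
total_mib = total / MIB
fits = requested <= free  # whether the request could have been satisfied

print("requested: {:.2f} MiB".format(requested_mib))
print("free:      {:.2f} MiB".format(free_mib))
print("total:     {:.2f} MiB".format(total_mib))
print("fits:", fits)
```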

More of the same.

/home/quinten/Documents/3D-R2N2/lib/layers.py:354: UserWarning: DEPRECATION: the 'ds' parameter is not going to exist anymore as it is going to be replaced by the parameter 'ws'.
  padding=self._padding)
/home/quinten/Documents/3D-R2N2/lib/layers.py:354: UserWarning: DEPRECATION: the 'padding' parameter is not going to exist anymore as it is going to be replaced by the parameter 'pad'.
  padding=self._padding)
lib/data_io.py: model paths from ./experiments/dataset/shapenet_1000.json
[INFO/ReconstructionDataProcess-1] child process calling self.run()
lib/data_io.py: model paths from ./experiments/dataset/shapenet_1000.json
Set the learning rate to 0.000100.
[INFO/ReconstructionDataProcess-2] child process calling self.run()
Compiling training function
2017-11-19 10:48:51.435496 Iter: 0 Loss: 0.407328
Compiling testing function
Problem occurred during compilation with the command line below:
/usr/bin/g++ -shared -g -O3 -fno-math-errno -Wno-unused-label -Wno-unused-variable -Wno-write-strings -march=haswell -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mmovbe -maes -mno-sha -mpclmul -mpopcnt -mabm -mno-lwp -mfma -mno-fma4 -mno-xop -mbmi -mbmi2 -mno-tbm -mavx -mavx2 -msse4.2 -msse4.1 -mlzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 -mno-clflushopt -mno-xsavec -mno-xsaves -mno-avx512dq -mno-avx512bw -mno-avx512vl -mno-avx512ifma -mno-avx512vbmi -mno-clwb -mno-mwaitx -mno-clzero -mno-pku --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=8192 -mtune=haswell -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -m64 -fPIC -I/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/numpy/core/include -I/usr/include/python3.5m -I/home/quinten/Documents/3D-R2N2/py3/include/python3.5m -I/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/gof -L/usr/lib -fvisibility=hidden -o /home/quinten/.theano/compiledir_Linux-4.10--generic-x86_64-with-Ubuntu-17.04-zesty-x86_64-3.5.3-64/tmpvcuds4gb/m9124b60ae7623786b4d02a7f8ac06738.so /home/quinten/.theano/compiledir_Linux-4.10--generic-x86_64-with-Ubuntu-17.04-zesty-x86_64-3.5.3-64/tmpvcuds4gb/mod.cpp -lpython3.5m
ERROR (theano.gof.cmodule): [Errno 12] Cannot allocate memory
Wait until the dataprocesses to end
Signal processes
Empty queue
kill processes
Signal processes
Empty queue
kill processes
Traceback (most recent call last):
  File "main.py", line 118, in <module>
    main()
  File "main.py", line 109, in main
    train_net()
  File "/home/quinten/Documents/3D-R2N2/lib/train_net.py", line 21, in func_wrapper
    return func(*args, **kwargs)
  File "/home/quinten/Documents/3D-R2N2/lib/train_net.py", line 71, in train_net
    solver.train(train_queue, val_queue)
  File "/home/quinten/Documents/3D-R2N2/lib/solver.py", line 170, in train
    _, val_loss, _ = self.test_output(batch_img, batch_voxel)
  File "/home/quinten/Documents/3D-R2N2/lib/solver.py", line 218, in test_output
    *self.net.activations])
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/compile/function.py", line 326, in function
    output_keys=output_keys)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/compile/pfunc.py", line 486, in pfunc
    output_keys=output_keys)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/compile/function_module.py", line 1795, in orig_function
    defaults)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/compile/function_module.py", line 1661, in create
    input_storage=input_storage_lists, storage_map=storage_map)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/gof/link.py", line 699, in make_thunk
    storage_map=storage_map)[:3]
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/gof/vm.py", line 1047, in make_all
    impl=impl))
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/gof/op.py", line 935, in make_thunk
    no_recycling)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/gof/op.py", line 839, in make_c_thunk
    output_storage=node_output_storage)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/gof/cc.py", line 1190, in make_thunk
    keep_lock=keep_lock)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/gof/cc.py", line 1131, in __compile__
    keep_lock=keep_lock)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/gof/cc.py", line 1586, in cthunk_factory
    key=key, lnk=self, keep_lock=keep_lock)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/gof/cmodule.py", line 1159, in module_from_key
    module = lnk.compile_cmodule(location)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/gof/cc.py", line 1489, in compile_cmodule
    preargs=preargs)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/gof/cmodule.py", line 2294, in compile_str
    p_out = output_subprocess_Popen(cmd)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/misc/windows.py", line 77, in output_subprocess_Popen
    p = subprocess_Popen(command, **params)
  File "/home/quinten/Documents/3D-R2N2/py3/lib/python3.5/site-packages/theano/misc/windows.py", line 43, in subprocess_Popen
    proc = subprocess.Popen(command, startupinfo=startupinfo, **params)
  File "/usr/lib/python3.5/subprocess.py", line 676, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.5/subprocess.py", line 1221, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
[INFO/MainProcess] process shutting down
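
Note that this second failure differs from the first one. `[Errno 12]` is raised inside `subprocess.Popen` while Theano launches `g++`, so it most likely means the operating system could not provide host memory for the new process, rather than the GPU running out of device memory. The errno value can be looked up with the standard library:

```python
import errno
import os

code = 12  # the value shown in "OSError: [Errno 12]"

name = errno.errorcode[code]   # symbolic name of the error number
message = os.strerror(code)    # platform-specific description

print(name)
print(message)
```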

quintendewilde commented 6 years ago

I've changed the Theano flags for CUDA and shut down all other software (Google Chrome) to run it, and now I'm getting this.

Compiling testing function
2017-11-19 12:03:16.399917 Test loss: 0.665769
param 0 : 0.114151
param 1 : 0.100100
param 2 : 0.222605
param 3 : 0.100100
param 4 : 0.199780
param 5 : 0.100100
param 6 : 0.182799
param 7 : 0.100100
param 8 : 0.579801
param 9 : 0.100100
param 10 : 0.156189
param 11 : 0.100100
param 12 : 0.134776
param 13 : 0.100100
param 14 : 0.475140
param 15 : 0.100100
param 16 : 0.142268
param 17 : 0.100100
param 18 : 0.136335
param 19 : 0.100100
param 20 : 0.132680
param 21 : 0.100100
param 22 : 0.136117
param 23 : 0.100100
param 24 : 0.371115
param 25 : 0.100100
param 26 : 0.139044
param 27 : 0.100100
param 28 : 0.146845
param 29 : 0.100100
param 30 : 0.166665
param 31 : 0.100100
param 32 : 0.110215
param 33 : 0.110708
param 34 : 0.100100
param 35 : 0.117502
param 36 : 0.111097
param 37 : 0.100000
param 38 : 0.111521
param 39 : 0.108941
param 40 : 0.100100
param 41 : 0.112705
param 42 : 0.100100
param 43 : 0.121206
param 44 : 0.100100
param 45 : 0.109536
param 46 : 0.100100
param 47 : 0.120517
param 48 : 0.100100
param 49 : 0.124972
param 50 : 0.100100
param 51 : 0.159994
param 52 : 0.100100
param 53 : 0.730088
param 54 : 0.100100
param 55 : 0.170752
param 56 : 0.100100
param 57 : 0.202829
param 58 : 0.100100
param 59 : 0.194471
param 60 : 0.100100
param 61 : 0.231402
param 62 : 0.100100
Wait until the dataprocesses to end
Signal processes
Empty queue
kill processes
Signal processes
Empty queue

Which I guess means that it is working. It now stays 'stuck' at 'Empty queue', though; maybe this just takes a long while.
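
The comment above does not say which Theano flags were changed. As a hedged sketch only: Theano reads its configuration from the `THEANO_FLAGS` environment variable, and `device`, `floatX` and `lib.cnmem` (a memory-pool option of the old `theano.sandbox.cuda` backend seen in the traceback) are real options. The helper name `build_theano_flags` and the value `0.8` are assumptions for illustration, not the settings used here.

```python
import os


def build_theano_flags(device="gpu", floatx="float32", cnmem=None):
    """Hypothetical helper: assemble a THEANO_FLAGS string from a few options."""
    parts = ["device=" + device, "floatX=" + floatx]
    if cnmem is not None:
        # Fraction of GPU memory to preallocate (old CUDA backend only).
        parts.append("lib.cnmem={}".format(cnmem))
    return ",".join(parts)


flags = build_theano_flags(cnmem=0.8)

# The variable must be set before `import theano` for it to take effect.
os.environ["THEANO_FLAGS"] = flags
print(flags)
```

The same string can also be passed on the command line by prefixing the training command with `THEANO_FLAGS=...`.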

ChengXu1995 commented 6 years ago

Hi, I also got this problem. Can you tell me how you solved this issue?