josephdanielchang closed this issue 4 years ago
Are you able to run testing with the provided weights? For training, 11GB of memory should be enough for up to 8x upsampling, but might not be enough for 16x.
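If you want to see how close a run gets to that limit on your own card, torch.cuda can report the peak allocation; a minimal sketch, assuming device 0:

import torch

device = torch.device('cuda:0')

# ... run one training or test iteration of the model here ...

# Peak memory seen by the caching allocator so far, in GiB; compare against the 11GB card.
print('peak allocated:', torch.cuda.max_memory_allocated(device) / 1024**3)
print('peak cached:', torch.cuda.max_memory_cached(device) / 1024**3)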
That's odd. When I run testing, it gives a very similar error.
Command:
CUDA_VISIBLE_DEVICES=4 python -m task_jointUpsampling.main --load-weights weights_flow/x8_pac_weights_epoch_5000.pth --download --factor 8 --model PacJointUpsample --dataset Sintel --data-root data/sintel
Output:
TEST LOADER START
TEST LOADER END
Model weights initialized from: weights_flow/x8_pac_weights_epoch_5000.pth
TEST START
BEFORE APPLY MODEL
BEFORE NET
AFTER NET
AFTER APPLY MODEL
BEFORE APPLY MODEL
BEFORE NET
Traceback (most recent call last):
File "/home/joseph/anaconda3/envs/pac/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/joseph/anaconda3/envs/pac/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 362, in <module>
main()
File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 335, in main
log_test = test(model, test_loader, device, last_epoch, init_lr, args.loss, perf_measures, args) # TEST
File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 89, in test
output = apply_model(model, lres, guide, args.factor)
File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 23, in apply_model
out = net(lres, guide)
File "/home/joseph/anaconda3/envs/pac/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/joseph/pacnet-master/task_jointUpsampling/models.py", line 245, in forward
x = self.up_convts[i](x, guide_cur)
File "/home/joseph/anaconda3/envs/pac/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
result = self.forward(*input, **kwargs)
File "/home/joseph/pacnet-master/pac.py", line 786, in forward
self.output_padding, self.dilation, self.shared_filters, self.native_impl)
File "/home/joseph/pacnet-master/pac.py", line 498, in pacconv_transpose2d
shared_filters)
File "/home/joseph/pacnet-master/pac.py", line 252, in forward
output = torch.einsum('ijklmn,jokl->iomn', (in_mul_k, weight))
File "/home/joseph/anaconda3/envs/pac/lib/python3.6/site-packages/torch/functional.py", line 211, in einsum
return torch._C._VariableFunctions.einsum(equation, operands)
RuntimeError: CUDA out of memory. Tried to allocate 2.69 GiB (GPU 0; 10.92 GiB total capacity; 5.74 GiB already allocated; 1.86 GiB free; 2.78 GiB cached)
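(For scale: the einsum in the last frame takes in_mul_k with shape (batch, in_channels, kernel_h, kernel_w, H, W), as implied by the 'ijklmn,jokl->iomn' equation, so that intermediate grows quickly with resolution and kernel size. A back-of-envelope sketch with made-up Sintel-like numbers, not values read from the model:)

# Rough size of a float32 tensor shaped like in_mul_k in 'ijklmn,jokl->iomn':
# (batch, in_channels, kernel_h, kernel_w, H, W).
def gib(*shape, bytes_per_elem=4):  # float32
    n = 1
    for s in shape:
        n *= s
    return n * bytes_per_elem / 1024**3

# Hypothetical example: 1 image, 32 channels, 5x5 kernel, 436x1024 spatial size.
print(gib(1, 32, 5, 5, 436, 1024))  # ~1.33 GiB for this one intermediate alone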
This is indeed odd ... which PyTorch version are you using? The code was originally developed for 0.4, but there is an experimental branch for 1.4 which you might try out.
Checking with python >>> import torch >>> print(torch.__version__), mine is 1.1.0. I am running the optical flow test on Sintel data with weights_flow/x8_pac_weights_epoch_5000.pth
python -m task_jointUpsampling.main --load-weights weights_flow/x8_pac_weights_epoch_5000.pth --download --factor 8 --model PacJointUpsample --dataset Sintel --data-root data/sintel
How many GB of GPU would you estimate is necessary to run the test program?
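For reference, a quick way to gather the version and GPU memory info in one place (device index 0 is an assumption):

import torch

print(torch.__version__)                                           # 1.1.0
print(torch.version.cuda)                                          # CUDA toolkit the wheel was built against
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_device_properties(0).total_memory / 1024**3)  # total GPU memory in GiB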
11GB GPUs should be enough for both training (w/ the exception of some 16x models) and testing. Versions >1.0 are not supported by the master branch (I expect some test cases to fail as well). The th14 branch is to be used with version 1.4, but has not been thoroughly tested.
So, 1.0 should work then, correct? Should I downgrade and test again, or do you have other suggestions?
You can downgrade to 1.0 or upgrade to 1.4 (and use the th14 branch).
I downgraded to 1.0.0 and it still gives the GPU out-of-memory error when testing flow. Is the data root supposed to be --data-root data/sintel? There are a lot of folders under the data root; should I specify a particular folder?
@josephdanielchang I just tested on an 11GB GPU and found that indeed the 8x and 16x flow tests won't work. Sorry that I didn't provide clear information before. With an 11GB GPU, you are able to run all depth experiments but only the 4x flow experiments.
The data path is correct as is.
Thanks, it does work with only 4x for flow. Follow-up question: where do I find the results for these "upsampled" flows after running the flow test on the Sintel flow data? I only find a folder exp/sintel with test.log and train.log, but no .flo files generated anywhere. Is there supposed to be no output?
Right, the code is for quantitative evaluation only and does not save results (for the semantic segmentation code though we do have a "--eval pred" option for this purpose).
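If you do want to dump the predictions yourself, below is a minimal sketch of writing a flow array to a Middlebury-style .flo file; the helper name, the tensor layout, and the output path are placeholders, not part of this repo:

import numpy as np

def write_flo(path, flow):
    # flow: numpy array of shape (H, W, 2) holding (u, v) components, float32.
    # .flo layout: float32 magic 202021.25, int32 width, int32 height,
    # then H*W*2 float32 values in row-major (u, v) order.
    h, w = flow.shape[:2]
    with open(path, 'wb') as f:
        np.array([202021.25], dtype=np.float32).tofile(f)
        np.array([w, h], dtype=np.int32).tofile(f)
        flow.astype(np.float32).tofile(f)

# e.g. inside test(), something along the lines of:
#   flo = output[0].permute(1, 2, 0).cpu().numpy()   # (2, H, W) -> (H, W, 2)
#   write_flo('exp/sintel/pred_0000.flo', flo)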
Hi, I'm a bit confused how to deal with this error. Can you help?
/home/joseph/pacnet-master/task_jointUpsampling/main.py:122: UserWarning: genfromtxt: Empty input file: "exp/sintel/train.log"
log = np.genfromtxt(log_path, delimiter=',', skip_header=1, usecols=(0,))
/home/joseph/pacnet-master/task_jointUpsampling/main.py:122: UserWarning: genfromtxt: Empty input file: "exp/sintel/test.log"
log = np.genfromtxt(log_path, delimiter=',', skip_header=1, usecols=(0,))
Traceback (most recent call last):
File "/home/joseph/anaconda3/envs/pac/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/joseph/anaconda3/envs/pac/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 348, in <module>
main()
File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 322, in main
log_test = test(model, test_loader, device, last_epoch, init_lr, args.loss, perf_measures, args)
File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 86, in test
output = apply_model(model, lres, guide, args.factor)
File "/home/joseph/pacnet-master/task_jointUpsampling/main.py", line 22, in apply_model
out = net(lres, guide)
File "/home/joseph/anaconda3/envs/pac/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, *kwargs)
File "/home/joseph/pacnet-master/task_jointUpsampling/models.py", line 245, in forward
x = self.up_convts[i](x, guide_cur)
File "/home/joseph/anaconda3/envs/pac/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(input, **kwargs)
File "/home/joseph/pacnet-master/pac.py", line 795, in forward
self.output_padding, self.dilation, self.shared_filters, self.native_impl)
File "/home/joseph/pacnet-master/pac.py", line 507, in pacconv_transpose2d
shared_filters)
File "/home/joseph/pacnet-master/pac.py", line 261, in forward
output = torch.einsum('ijklmn,jokl->iomn', (in_mul_k, weight))
File "/home/joseph/anaconda3/envs/pac/lib/python3.6/site-packages/torch/functional.py", line 211, in einsum
return torch._C._VariableFunctions.einsum(equation, operands)
RuntimeError: CUDA out of memory. Tried to allocate 2.69 GiB (GPU 0; 10.92 GiB total capacity; 5.74 GiB already allocated; 1.86 GiB free; 2.78 GiB cached)
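(Side note on the two UserWarning lines at the top: with the same arguments as main.py line 122, np.genfromtxt only warns on an empty log file and keeps going, so those warnings are separate from the RuntimeError at the bottom. A small reproduction, using a throwaway file name:)

import numpy as np

# Stand-in for exp/sintel/train.log before any epochs have been logged.
open('empty.log', 'w').close()

# Same call pattern as main.py line 122: prints
# "UserWarning: genfromtxt: Empty input file" but does not raise.
log = np.genfromtxt('empty.log', delimiter=',', skip_header=1, usecols=(0,))
print(log.shape)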