fab-jul / RC-PyTorch

PyTorch code for the CVPR'20 paper "Learning Better Lossless Compression Using Lossy Compression"
GNU General Public License v3.0

CUDA: out of memory running test #3

Open CrhistyanSilva opened 3 years ago

CrhistyanSilva commented 3 years ago

Hi! I'm trying to run the test for the CLIC.mobile dataset, but it fails with "RuntimeError: CUDA out of memory." Is there any option or change that would help with this? Here is the command I'm running:

# Note: depending on your environment, adapt CUDA_VISIBLE_DEVICES.
CUDA_VISIBLE_DEVICES=0 python -u run_test.py \
    "$MODELS_DIR" 1109_1715 \
    "AUTOEXPAND:$DATASET_DIR/mobile_valid" \
    --restore_itr 1000000 \
    --tau \
    --clf_p "$MODELS_DIR/1115_1729*/ckpts/*.pt" \
    --qstrategy CLF_ONLY

I was able to run with Open Images 500 without problems; maybe my GPU is not enough for this other test? The GPU is a GeForce RTX 2070 with Max-Q Design with 8192 MB of memory. OS: Linux Mint 2021.
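For intuition (the numbers below are illustrative assumptions, not taken from the RC-PyTorch code): activation memory in a convolutional network scales with H*W, so full-resolution CLIC images cost far more per layer than 500px Open Images crops, which would explain why the smaller dataset runs fine on 8 GiB:

```python
# Back-of-the-envelope activation-memory estimate. A float32 feature map of
# C channels over an H x W image costs H * W * C * 4 bytes.

def feature_map_bytes(h, w, channels, bytes_per_elem=4):
    """Memory for one float32 feature map of shape (channels, h, w)."""
    return h * w * channels * bytes_per_elem

GIB = 1024 ** 3

# Hypothetical sizes: a 500x500 Open Images crop vs. a ~1500x2000 CLIC image,
# with 128 channels (a typical width for the intermediate layers).
small = feature_map_bytes(500, 500, 128)
large = feature_map_bytes(1500, 2000, 128)

print(f"500x500   map: {small / GIB:.2f} GiB")
print(f"1500x2000 map: {large / GIB:.2f} GiB")
# Memory grows linearly with H*W: a ~12x larger image needs ~12x the memory
# per layer, which quickly exhausts an 8 GiB GPU at test time.
```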

Here is the full output:

WRAN: using Agg backend linux
*** AC_NEEDS_CROP_DIM = 3000000
*** AUTOEXPAND ->
9: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q9
10: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q10
11: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q11
12: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q12
13: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q13
14: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q14
15: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q15
16: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q16
17: /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q17
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00040
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q9/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00030
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q9_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00029
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q10/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q10_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00029
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q11/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00029
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q11_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00030
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q12/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q12_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00030
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q13/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q13_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q14/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00027
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q14_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00029
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q15/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q15_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00029
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q16/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q16_None_dS=False_professional_valid_None_dS=False)...
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00029
Getting lock for /home/crhistyan/proyecto_grado/bpg/datasets/professional_valid_bpg_q17/cached_glob.pkl: .cached_glob.lock [reset: False]...
>>> filter [min_size=None; discard_s=False]: 0.00028
Sorting...
Subsampling to use 2 imgs of ResidualDataset(41 images, id=Cprofessional_valid_bpg_q17_None_dS=False_professional_valid_None_dS=False)...
Starting Q 13
Got 1 datasets.
Testing 1109_1715 at -1 ---
Got keep test.
After filter:
GlobalConfig()
*** global_config fw_s=3
*** global_config long_means
*** global_config long_pis
*** global_config long_sigma
*** global_config gdn
*** global_config no_norm_final
*** global_config lr.initial=5e-05
*** global_config down_up=deconv
Updating config.lr.initial = 5e-05
Using global_config: GlobalConfig(
    down_up=deconv
    fw_s=3
    gdn
    long_means
    long_pis
    long_sigma
    lr.initial=5e-05
    no_norm_final
    unet_skip)
*** no norm for final
*** DownUp, adding DeconvUp()
filter_width for sigma = 3
Did set tail_networks.sigmas
Did set tail_networks.means
Did set tail_networks.pis
Setting tail_networks[ dict_keys(['sigmas', 'means', 'pis']) ]
EB: self.cin_style = None
******************************
*** Padding by a factor 2
******************************
*** Setting classifier...
Using classifier with config configs/ms/clf/down2_nonorm_down.cf
ClassifierNetwork(
  (head): Sequential(
    (0): Conv2d(3, 64, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
    (1): IdentityModule()
    (2): ReLU(inplace)
    (3): Conv2d(64, 128, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
    (4): IdentityModule()
    (5): ReLU(inplace)
  )
  (model): Sequential(
    (0): ResBlock(Conv(128x3)/N(128)/ReLU(inplace)/Conv(128x3)/N(128))
    (1): ResBlock(Conv(128x3)/N(128)/ReLU(inplace)/Conv(128x3)/N(128))
    (2): ResBlock(Conv(128x3)/N(128)/ReLU(inplace)/Conv(128x3)/N(128))
    (3): ResBlock(Conv(128x3)/N(128)/ReLU(inplace)/Conv(128x3)/N(128))
    (4): Conv2d(128, 256, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
    (5): ResBlock(Conv(256x3)/N(256)/ReLU(inplace)/Conv(256x3)/N(256))
    (6): ResBlock(Conv(256x3)/N(256)/ReLU(inplace)/Conv(256x3)/N(256))
    (7): ResBlock(Conv(256x3)/N(256)/ReLU(inplace)/Conv(256x3)/N(256))
    (8): ResBlock(Conv(256x3)/N(256)/ReLU(inplace)/Conv(256x3)/N(256))
    (9): ChannelAverage()
  )
  (tail): Sequential(
    (0): Linear(in_features=256, out_features=7, bias=True)
  )
)
Restoring /home/crhistyan/proyecto_grado/bpg/models/1115_1729 clf@down2_nonorm_down clf@model1715 exp_min=6.25e-06 lr.initial=0.0001 lr.schedule=exp_0.25_i50000 n_resblock=4/ckpts/ckpt_0000106000.pt
Loaded!
*** Enabling QSTRATEGY=CLF_ONLY...
*** Ignoring 0 ckpts after 1612829659.0823963
Restoring /home/crhistyan/proyecto_grado/bpg/models/1109_1715 gdn_wide_deep3 new_oi_q12_14_128 unet_skip/ckpts/ckpt_0000998500.pt... (strict=True)
Testing <dataloaders.compressed_images_loader.MetaResidualDataset object at 0x7f9e40b37690>
*** MetaResidualDataset professional_valid_None_dS=False_m2_multi_q9_10_11_12_13_14_15_16_17
Traceback (most recent call last):
  File "run_test.py", line 303, in <module>
    main()
  File "run_test.py", line 84, in main
    results += tester.test_all(datasets)
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/test/multiscale_tester.py", line 309, in test_all
    return self._get_results(datasets, self.test)
  File "/home/crhistyan/anaconda3/envs/bpg/lib/python3.7/site-packages/fjcommon/functools_ext.py", line 32, in composed
    return f1(f2(*args_c, **kwargs_c))
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/test/multiscale_tester.py", line 291, in _get_results
    results = [fn(ds) for ds in datasets]
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/test/multiscale_tester.py", line 291, in <listcomp>
    results = [fn(ds) for ds in datasets]
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/test/multiscale_tester.py", line 333, in test
    results = self._test(ds)
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/test/multiscale_tester.py", line 420, in _test
    out = self.blueprint.forward(x_n_crop, bpps)  # Note: bpps only used for conditional IN!
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/blueprints/enhancement_blueprint.py", line 236, in forward
    network_out: prob_clf.NetworkOutput = self.net(x_l, side_information)
  File "/home/crhistyan/anaconda3/envs/bpg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/crhistyan/proyecto_grado/bpg/RC-PyTorch/src/modules_enh/enhancement_network.py", line 288, in forward
    x = self.unet_skip_conv(torch.cat((x, x_after_head), dim=1))
  File "/home/crhistyan/anaconda3/envs/bpg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/crhistyan/anaconda3/envs/bpg/lib/python3.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/crhistyan/anaconda3/envs/bpg/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/crhistyan/anaconda3/envs/bpg/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 338, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 1.33 GiB (GPU 0; 7.80 GiB total capacity; 5.56 GiB already allocated; 1.14 GiB free; 17.20 MiB cached)

Thanks!!
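One generic workaround for test-time OOM on large images (not implemented in run_test.py, and only exact for per-pixel operations; for a convolutional model like this one, naive tiling changes the result near tile borders unless tiles overlap by at least the receptive field) is to process the image in crops and stitch the outputs, which bounds peak memory by the tile size. A minimal NumPy sketch of the idea:

```python
import numpy as np

def process_in_tiles(img, fn, tile=512):
    """Apply fn to non-overlapping tiles of an (H, W, C) array and stitch
    the outputs back together, bounding peak memory by the tile size.
    Caveat: exact only for per-pixel fn; convolutional models need
    overlapping tiles with margins covering the receptive field."""
    h, w = img.shape[:2]
    out = np.empty_like(img)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            out[y:y + tile, x:x + tile] = fn(img[y:y + tile, x:x + tile])
    return out

# For a per-pixel fn, the tiled result matches the full-image result exactly.
img = np.random.rand(1080, 1920, 3).astype(np.float32)
full = img * 0.5
tiled = process_in_tiles(img, lambda t: t * 0.5, tile=512)
print(np.allclose(full, tiled))  # True
```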

habbisify commented 3 years ago

I'm currently having the same problem: I was able to get bpsp on Open Images Validation 500, but got CUDA out of memory on professional_valid in "Get bpsp on all evaluation datasets from the paper". I'll continue looking for solutions, but would appreciate help here!

CaiShilv commented 2 years ago

When the "--write_to_files $output_dir" argument is used, the run fails with "'EnhancementLosses' object has no attribute 'tau_optim'". Has anyone run into this?