DianCh / AdaContrast

Official repository of the CVPR '22 paper "Contrastive Test-Time Adaptation"

CUDA out of memory and Waiting for W&B process to finish... (failed 1) #5

Open · zwbx opened this issue 1 year ago

zwbx commented 1 year ago

I set up an environment following the instructions, but training fails with a CUDA out-of-memory error. Full log:

```
(Adatta) root@053d4d94eab6:/code/AdaContrast# bash train_VISDA-C_target.sh /code/AdaContrast/
main_adacontrast.py:24: UserWarning: The version_base parameter is not specified. Please specify a compatability version level, or None. Will assume defaults for version 1.1
  @hydra.main(config_path="configs", config_name="root")
/opt/conda/envs/Adatta/lib/python3.8/site-packages/hydra/_internal/defaults_list.py:251: UserWarning: In 'root': Defaults list is missing `_self_`. See https://hydra.cc/docs/1.2/upgrades/1.0_to_1.1/default_composition_order for more information
  warnings.warn(msg, UserWarning)
/opt/conda/envs/Adatta/lib/python3.8/site-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default. See https://hydra.cc/docs/1.2/upgrades/1.1_to_1.2/changes_to_job_working_dir/ for more information.
  ret = run_job(
[... the same version_base UserWarning repeated by each spawned worker process ...]
[W Context.cpp:69] Warning: torch.set_deterministic is in beta, and its design and functionality may change in the future. (function operator())
[... repeated once per worker ...]
[INFO] 2023-10-04 06:33:10 main_adacontrast.py:97 Dataset: VISDA-C, Source domains: ['train'], Target domains: ['validation'], Pipeline: target
wandb: Currently logged in as: wenbo-zhang01 (wenbozhang). Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.15.11
wandb: Run data is saved locally in /code/AdaContrast/output/VISDA-C/target/wandb/run-20231004_063312-6t2re0hg
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run seed_2020
wandb: ⭐️ View project at https://wandb.ai/wenbozhang/VISDA-C
wandb: 🚀 View run at https://wandb.ai/wenbozhang/VISDA-C/runs/6t2re0hg
[INFO] 2023-10-04 06:33:17 target.py:241 Start target training on train-validation...
[INFO] 2023-10-04 06:33:19 classifier.py:54 Loaded from /code/AdaContrast/best_train_2020.pth.tar; missing params: []
[INFO] 2023-10-04 06:33:21 classifier.py:54 Loaded from /code/AdaContrast/best_train_2020.pth.tar; missing params: []
[INFO] 2023-10-04 06:33:21 target.py:271 1 - Created target model
[INFO] 2023-10-04 06:33:21 target.py:44 Eval and labeling...
  0%|          | 0/55 [00:00<?, ?it/s]
[... version_base UserWarning repeated by the dataloader worker processes ...]
100%|██████████| 55/55 [00:44<00:00,  1.23it/s]
[INFO] 2023-10-04 06:34:08 target.py:79 Accuracy of direct prediction: 45.41
[INFO] 2023-10-04 06:34:08 utils.py:291 Accuracy per class: [45.72  8.95 37.4  63.31 48.8   2.75 82.64 18.6  48.71 25.43 88.69  7.26], mean: 39.85
[INFO] 2023-10-04 06:34:16 utils.py:291 Accuracy per class: [50.05  6.36 40.26 65.78 51.93  0.39 85.49 16.68 46.36 21.44 92.07  5.25], mean: 40.17
[INFO] 2023-10-04 06:34:18 target.py:114 Collected 55388 pseudo labels.
[INFO] 2023-10-04 06:34:18 target.py:289 2 - Computed initial pseudo labels
[INFO] 2023-10-04 06:34:18 target.py:311 3 - Created train/val loader
[INFO] 2023-10-04 06:34:18 target.py:315 4 - Created optimizer
[INFO] 2023-10-04 06:34:18 target.py:317 Start training...
[... version_base UserWarning repeated by the dataloader worker processes ...]
wandb: Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
wandb: / 0.004 MB of 0.004 MB uploaded (0.000 MB deduped)
wandb: Run history:
wandb:      Test Acc ▁
wandb:      Test Avg ▁
wandb: Test Post Acc ▁
wandb: Test Post Avg ▁
wandb:
wandb: Run summary:
wandb:      Test Acc 45.41236
wandb:      Test Avg 39.855
wandb: Test Post Acc 46.27536
wandb: Test Post Avg 40.17167
wandb:
wandb: 🚀 View run seed_2020 at: https://wandb.ai/wenbozhang/VISDA-C/runs/6t2re0hg
wandb: ️⚡ View job at https://wandb.ai/wenbozhang/VISDA-C/jobs/QXJ0aWZhY3RDb2xsZWN0aW9uOjEwMzkzMTk1OQ==/version_details/v2
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./wandb/run-20231004_063312-6t2re0hg/logs
Error executing job with overrides: ['seed=2020', 'port=10001', 'memo=target', 'project=VISDA-C', 'data.data_root=/code/AdaContrast/datasets', 'data.workers=8', 'data.dataset=VISDA-C', 'data.source_domains=[train]', 'data.target_domains=[validation]', 'model_src.arch=resnet101', 'model_tta.src_log_dir=/code/AdaContrast/', 'optim.lr=2e-4']
Traceback (most recent call last):
  File "main_adacontrast.py", line 42, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
    while not context.join():
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception:

-- Process 0 terminated with the following error:
Traceback (most recent call last):
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/code/AdaContrast/main_adacontrast.py", line 141, in main_worker
    train_target_adacontrast(args)
  File "/code/AdaContrast/target.py", line 323, in train_target_domain
    train_epoch(train_loader, model, banks, optimizer, epoch, args)
  File "/code/AdaContrast/target.py", line 371, in train_epoch
    _, logits_q, logits_ins, keys = model(images_q, images_k)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/code/AdaContrast/moco/builder.py", line 173, in forward
    k, _ = self.momentum_model(im_k, return_feats=True)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/code/AdaContrast/classifier.py", line 37, in forward
    feat = self.encoder(x)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torchvision/models/resnet.py", line 220, in forward
    return self._forward_impl(x)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torchvision/models/resnet.py", line 208, in _forward_impl
    x = self.layer1(x)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torchvision/models/resnet.py", line 116, in forward
    identity = self.downsample(x)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 532, in forward
    return sync_batch_norm.apply(
  File "/opt/conda/envs/Adatta/lib/python3.8/site-packages/torch/nn/modules/_functions.py", line 52, in forward
    out = torch.batch_norm_elemt(input, weight, bias, mean, invstd, eps)
RuntimeError: CUDA out of memory. Tried to allocate 98.00 MiB (GPU 0; 10.91 GiB total capacity; 8.39 GiB already allocated; 21.19 MiB free; 8.76 GiB reserved in total by PyTorch)
```

zwbx commented 1 year ago

I'm running it on 4× 1080Ti GPUs. However, it fails with the same error when moved to a 4090. What could the reason be? Thanks~

DianCh commented 1 year ago

Hi, thank you for your interest in our work!

We trained our model on 8× 16 GB GPUs; it looks like your per-GPU batch size is too large to fit in memory.
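
If it helps, here's a quick way to see what each of your GPUs actually has allocated versus reserved (standard `torch.cuda` introspection calls; just a diagnostic sketch, not part of this repo):

```python
# Print, for each visible GPU, what PyTorch has allocated vs. reserved
# against the card's total capacity. Uses standard torch.cuda calls.
import torch

gib = 1024 ** 3
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(
        f"GPU {i} ({props.name}): "
        f"{torch.cuda.memory_allocated(i) / gib:.2f} GiB allocated, "
        f"{torch.cuda.memory_reserved(i) / gib:.2f} GiB reserved, "
        f"{props.total_memory / gib:.2f} GiB total"
    )
```

The simplest fix is to lower the batch size. If the Hydra config exposes it, you can append an override to the launch command, e.g. something like `data.batch_size=32` (the exact key name is a guess based on your other `data.*` overrides; check the files under `configs/` for the actual field).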

Can you try gradient accumulation, or multi-node training?
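
For gradient accumulation, the idea is to run several small forward/backward passes and take one optimizer step, so the effective batch size is preserved while peak memory follows the micro-batch. A minimal generic sketch of the pattern (this is not the repo's actual `train_epoch`, which also drives the momentum encoder, instance logits, and pseudo-label banks):

```python
# Generic gradient-accumulation pattern: accumulate gradients over
# `accum_steps` micro-batches, then take a single optimizer step.
# Effective batch size = micro-batch size * accum_steps, while peak
# activation memory is set by the micro-batch alone.
import torch

def train_epoch_with_accum(loader, model, criterion, optimizer, accum_steps=4):
    model.train()
    optimizer.zero_grad()
    for step, (images, labels) in enumerate(loader):
        images, labels = images.cuda(), labels.cuda()
        loss = criterion(model(images), labels)
        # Scale so the accumulated gradient averages over the micro-batches.
        (loss / accum_steps).backward()
        if (step + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```

One caveat for this codebase: the MoCo-style model updates its momentum encoder and key queue on every forward pass, so naive accumulation changes those update rates; treat the sketch as a starting point only.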