AngelosNal / Vision-DiffMask

Official PyTorch implementation of Vision DiffMask, a post-hoc interpretation method for vision models.

Run Error #2

Closed. Raxio-Z closed this issue 6 months ago.

Raxio-Z commented 7 months ago

Hi, thanks for your wonderful work, but I have some questions about the code. I set up the environment as you described, but when I run the command you provided, `python code/main.py --enable_progress_bar --num_epochs 20 --base_model ViT --dataset CIFAR10 --from_pretrained tanlq/vit-base-patch16-224-in21k-finetuned-cifar10`, I get the following error:

File "/data2/liuzheng/thesis/Vision-DiffMask/code/main.py", line 60, in setup_sample_image_logs train_iter_loader = iter(dm.train_loader()) AttributeError: 'CIFAR10DataModule' object has no attribute 'train_loader'

Therefore, I changed `dm.train_loader()` to `dm.train_dataloader()` in code/main.py, but then I ran into another error that I cannot fix. The full error message is:

(dl2) liuzheng@101:~/thesis/Vision-DiffMask$ python code/main.py --enable_progress_bar --num_epochs 20 --base_model ViT --dataset CIFAR10 --from_pretrained tanlq/vit-base-patch16-224-in21k-finetuned-cifar10
Global seed set to 123
Files already downloaded and verified
Files already downloaded and verified
wandb: Currently logged in as: raxio. Use wandb login --relogin to force relogin
wandb: wandb version 0.16.3 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.12.16
wandb: Run data is saved locally in /data2/liuzheng/thesis/Vision-DiffMask/wandb/run-20240305_114206-2t4lvlaa
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run add_activation=8.0-alpha=20.0-dataset=CIFAR10-diffmask_checkpoint=None-eps=0.1-grid_size=3-lr=2e-05-lr_alpha=0.3-lr_placeholder=0.001-mul_activation=15.0-no_placeholder=False-weighted_layer_distribution=False
wandb: ⭐️ View project at https://wandb.ai/raxio/Patch-DiffMask
wandb: 🚀 View run at https://wandb.ai/raxio/Patch-DiffMask/runs/2t4lvlaa
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Traceback (most recent call last):
File "/data2/liuzheng/thesis/Vision-DiffMask/code/main.py", line 227, in
main(args)
File "/data2/liuzheng/thesis/Vision-DiffMask/code/main.py", line 157, in main
trainer.fit(diffmask, dm)
File "/data2/liuzheng/anaconda3/envs/dl2/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
self._call_and_handle_interrupt(
File "/data2/liuzheng/anaconda3/envs/dl2/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 719, in _call_and_handle_interrupt
return self.strategy.launcher.launch(trainer_fn, *args, trainer=self, **kwargs)
File "/data2/liuzheng/anaconda3/envs/dl2/lib/python3.9/site-packages/pytorch_lightning/strategies/launchers/spawn.py", line 78, in launch
mp.spawn(
File "/data2/liuzheng/anaconda3/envs/dl2/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/data2/liuzheng/anaconda3/envs/dl2/lib/python3.9/site-packages/torch/multiprocessing/spawn.py", line 189, in start_processes
process.start()
File "/data2/liuzheng/anaconda3/envs/dl2/lib/python3.9/multiprocessing/process.py", line 121, in start
self._popen = self._Popen(self)
File "/data2/liuzheng/anaconda3/envs/dl2/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
return Popen(process_obj)
File "/data2/liuzheng/anaconda3/envs/dl2/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in init
super().init(process_obj)
File "/data2/liuzheng/anaconda3/envs/dl2/lib/python3.9/multiprocessing/popen_fork.py", line 19, in init
self._launch(process_obj)
File "/data2/liuzheng/anaconda3/envs/dl2/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
reduction.dump(process_obj, fp)
File "/data2/liuzheng/anaconda3/envs/dl2/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
File "/data2/liuzheng/anaconda3/envs/dl2/lib/python3.9/site-packages/torch/multiprocessing/reductions.py", line 143, in reduce_tensor
raise RuntimeError("Cowardly refusing to serialize non-leaf tensor which requires_grad, "
RuntimeError: Cowardly refusing to serialize non-leaf tensor which requires_grad, since autograd does not support crossing process boundaries. If you just want to transfer the data, call detach() on the tensor before serializing (e.g., putting it on the queue).

I tried some of the fixes mentioned online, such as setting num_workers to 0 and setting requires_grad to False on the pretrained model used with the dataloader, but none of them worked. Have you encountered this problem? It really troubles me, and I hope you can help. Looking forward to your reply, and thank you very much!
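For anyone who hits the same RuntimeError: it is raised when torch.multiprocessing has to pickle a non-leaf tensor that still requires grad, which happens here because PyTorch Lightning's spawn-based launcher pickles the module and datamodule for the worker process. Two workarounds that commonly apply are sketched below, under the assumption that a tensor cached on the module for image logging (or a similar attribute) is the offending object:

```python
# Illustrative sketch, not the repository's exact code.
import pytorch_lightning as pl

# (1) Detach any tensor cached on the LightningModule / LightningDataModule before trainer.fit(),
#     so pickling it for the spawned worker does not try to carry an autograd graph:
# diffmask.sample_images = sample_images.detach()  # hypothetical attribute used for logging

# (2) Avoid the spawn-based launcher when building the Trainer:
trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,          # single process: the module is never pickled across processes
    # strategy="ddp",   # for multi-GPU, regular DDP relaunches the script instead of pickling the module
    max_epochs=20,
)
```

Either way, the key point is that no tensor with an attached autograd graph should need to be serialized when training starts.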

Raxio-Z commented 6 months ago

I have solved my problem, thanks to the author's great help!