YijinHuang / SSiT

SSiT: Saliency-guided Self-supervised Image Transformer for Diabetic Retinopathy Grading

def issue: rotation_with_mask #5

Closed Wadha-Almattar closed 1 year ago

Wadha-Almattar commented 1 year ago

Command: python main.py --n-gpus 1 --num-workers 5 --arch ViT-S-p16 --batch-size 128 --epochs 10 --warmup-epochs 2 --data-index /home/guser1/SSiT/data_index/pretraining_dataset.pkl --save-path /home/guser1/SSiT/checkpoints

The code generates this error, which comes from the function [rotation_with_mask]:

Single GPU mode

=============================== Number of training samples: 999

Total Epochs:

10

current Epochs No:

0
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "main.py", line 140, in <module>
    main()
  File "main.py", line 88, in main
    worker(0, n_gpus, args)
  File "main.py", line 121, in worker
    train(
  File "/home/guser1/SSiT/train.py", line 41, in train
    for step, train_data in progress:
  File "/home/guser1/anaconda3/envs/ssit/lib/python3.8/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/guser1/anaconda3/envs/ssit/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 633, in __next__
    data = self._next_data()
  File "/home/guser1/anaconda3/envs/ssit/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1345, in _next_data
    return self._process_data(data)
  File "/home/guser1/anaconda3/envs/ssit/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 1371, in _process_data
    data.reraise()
  File "/home/guser1/anaconda3/envs/ssit/lib/python3.8/site-packages/torch/_utils.py", line 644, in reraise
    raise exception
AttributeError: Caught AttributeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/guser1/anaconda3/envs/ssit/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 308, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/guser1/anaconda3/envs/ssit/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/guser1/anaconda3/envs/ssit/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/guser1/SSiT/data.py", line 67, in __getitem__
    img_stu, img_tea, mask_stu, mask_tea = self.transform(img, mask)
  File "/home/guser1/SSiT/data.py", line 152, in __call__
    img_stu, mask_stu = self.rotation_with_mask(self.rotation_stu, img_stu, mask_stu, self.p_rotation_stu)
  File "/home/guser1/SSiT/data.py", line 185, in rotation_with_mask
    img = F.rotate(img, angle, tf.resample, tf.expand, tf.center, tf.fill)
  File "/home/guser1/anaconda3/envs/ssit/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'RandomRotation' object has no attribute 'resample'

Wadha-Almattar commented 1 year ago

When I remove these two lines from the function [rotation_with_mask]:

img = F.rotate(img, angle, tf.resample, tf.expand, tf.center, tf.fill)
mask = F.rotate(mask, angle, tf.resample, tf.expand, tf.center, tf.fill)

and instead pass only two arguments:

img = F.rotate(img, angle)
mask = F.rotate(mask, angle)

the code starts training, but another error pops up at the end of training:

[2023-06-09 12:37:05] epoch: 10/10, cl loss: 3.745282, ss loss: 0.186888, lr: 0.000001, moco_m: 0.999995: : 7it [00:04, 1.46it/s]

Saved checkpoint to /home/guser1/SSiT/checkpoints

Traceback (most recent call last):
  File "main.py", line 140, in <module>
    main()
  File "main.py", line 88, in main
    worker(0, n_gpus, args)
  File "main.py", line 128, in worker
    torch.distributed.barrier()
  File "/home/guser1/anaconda3/envs/ssit/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 3327, in barrier
    default_pg = _get_default_group()
  File "/home/guser1/anaconda3/envs/ssit/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 707, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

Wadha-Almattar commented 1 year ago

I removed this line, and it looks fine:

  File "main.py", line 128, in worker
    torch.distributed.barrier()
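A possible alternative to deleting the call is to skip the barrier only when no process group exists, which is the situation the RuntimeError describes for single-GPU mode. The sketch below is not the repository's code: `safe_barrier` is a hypothetical helper name, and its optional `dist` argument is an assumption added so the guard can be tried without a multi-GPU setup. It relies only on `torch.distributed.is_available()`, `torch.distributed.is_initialized()` and `torch.distributed.barrier()`.

```python
def safe_barrier(dist=None):
    """Synchronise processes only if a default process group has been set up.

    Returns True if barrier() was called, False if it was skipped.
    """
    if dist is None:
        # Imported lazily so the helper can also be exercised with a stand-in module.
        import torch.distributed as dist
    if dist.is_available() and dist.is_initialized():
        dist.barrier()
        return True
    # Single-process run: init_process_group was never called, nothing to wait for.
    return False
```

In `worker`, the bare `torch.distributed.barrier()` call could then become `safe_barrier()`, so the same script would run in both single-GPU and multi-GPU mode.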

But I am still trying to figure out the problem here, in the function [rotation_with_mask], with tf.resample:

img = F.rotate(img, angle, tf.resample, tf.expand, tf.center, tf.fill)
mask = F.rotate(mask, angle, tf.resample, tf.expand, tf.center, tf.fill)

YijinHuang commented 1 year ago

The parameter 'resample' of RandomRotation was removed in version 0.13 of torchvision. You can downgrade torchvision to 0.12 or remove this parameter to solve this issue.
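For anyone who would rather not downgrade or drop the arguments entirely, here is a minimal sketch of a third option. It assumes the transform still carries the rotation settings under the attribute name `interpolation`, which newer torchvision releases use in place of `resample`. `rotate_args` is a hypothetical helper, not part of SSiT or torchvision, and the objects at the bottom are plain stand-ins so the snippet runs without torchvision installed.

```python
from types import SimpleNamespace


def rotate_args(tf):
    """Return (interpolation, expand, center, fill) from a RandomRotation-like object."""
    # Newer torchvision stores the resampling mode as `interpolation`;
    # older releases stored it as `resample`. Try the new name first.
    mode = getattr(tf, "interpolation", None)
    if mode is None:
        mode = getattr(tf, "resample", None)
    if mode is None:
        raise AttributeError("transform has neither 'interpolation' nor 'resample'")
    return mode, tf.expand, tf.center, tf.fill


# Hypothetical use inside rotation_with_mask (not the repository's actual code):
#     img = F.rotate(img, angle, *rotate_args(tf))
#     mask = F.rotate(mask, angle, *rotate_args(tf))

# Stand-in objects that mimic the two attribute layouts.
new_style = SimpleNamespace(interpolation="nearest", expand=False, center=None, fill=0)
old_style = SimpleNamespace(resample="nearest", expand=False, center=None, fill=0)
print(rotate_args(new_style))
print(rotate_args(old_style))
```

Compared with calling `F.rotate(img, angle)` alone, this keeps whatever `expand`, `center` and `fill` settings the RandomRotation was constructed with, so the image and its mask are still rotated with the configured options.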