Hi
Thank you for reaching out. Unfortunately the codebase currently works only on Pascal VOC. I am planning to update the codebase very soon. In the meantime, please find the answers to your questions below:
1- Yes, I think so.
2- About the crop size: I had hardware limitations, so I tried 1024x512 instead of 1024x1024. I hard-coded this in dataloaders/__init__.py and set the crop size to [512, 1024] (a sketch of such an override follows after this list). This is actually why I haven't uploaded the code yet; it is not clean, and I want to tidy it up before committing the new changes.
3- I believe this parameter is not needed when using Cityscapes. Please have a look at dataloaders/datasets/cityscapes.py.
4- It should be something like this: python train_kd.py --backbone resnet18 --dataset cityscapes --nesterov --epochs 50 --batch-size 6 --attn_lambda 15
5- I don't think so. Again please have a look at dataloaders/datasets/cityscapes.py
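For what it's worth, here is a minimal sketch of the kind of override I mean in answer 2. The helper name and the Namespace usage are purely illustrative assumptions; the actual dataloaders/__init__.py looks different:

from argparse import Namespace

def apply_hardware_limited_crop(args):
    # Illustrative helper: shrink the Cityscapes crop when GPU memory is limited,
    # i.e. use a 512 x 1024 crop instead of the full 1024 x 1024 one.
    if args.dataset == 'cityscapes':
        args.crop_size = [512, 1024]  # [height, width]
    return args

# Example usage with a dummy args object:
args = apply_hardware_limited_crop(Namespace(dataset='cityscapes', crop_size=[1024, 1024]))
print(args.crop_size)  # [512, 1024]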
I will update the repo soon. It will then be ready to test on Cityscapes, and the pretrained teacher for Cityscapes will be available. Your final result depends heavily on your teacher weights, and minor differences may be observed depending on the GPU you are using. I used a single RTX 3090 for my runs.
Best, Amir
Hello Amir, thank you for your fast response. In the meantime, I was able to significantly improve my accuracy and my mIoU, and I noticed a few things that I'd like to share with you.
base_size: I found that in dataloaders/datasets/cityscapes.py the base_size is indeed used:
def transform_tr(self, sample):
    composed_transforms = transforms.Compose([
        tr.RandomHorizontalFlip(),
        tr.RandomScaleCrop(base_size=self.args.base_size, crop_size=self.args.crop_size, fill=255),
        tr.RandomGaussianBlur(),
        tr.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
        tr.ToTensor()])
If no base_size is specified during training, it defaults to 513. The problem is that Cityscapes images have a resolution of 1024x2048 pixels and would require a base_size of 1024, not 513. The consequence is visible in the code for RandomScaleCrop:
short_size = random.randint(int(self.base_size * 0.5), int(self.base_size * 2.0))
If the base_size is only half the actual image size, the effective scaling range is (0.25, 1) rather than (0.5, 2) as stated in the paper. Changing the scaling in my framework to (0.25, 1) improved my accuracy by more than 3%.
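A quick numeric check makes the effective range visible (this is just an illustrative script I wrote, not code from the repo; it only mimics the short_size line above):

import random

def effective_scale_range(base_size, image_short_side, n_samples=100_000):
    # Mimics RandomScaleCrop's short_size sampling and reports the resulting
    # min/max scale factor relative to the image's short side.
    scales = []
    for _ in range(n_samples):
        short_size = random.randint(int(base_size * 0.5), int(base_size * 2.0))
        scales.append(short_size / image_short_side)
    return min(scales), max(scales)

# Default base_size = 513 on 1024-pixel-high Cityscapes images:
print(effective_scale_range(513, 1024))   # approximately (0.25, 1.0)
# base_size = 1024 recovers the (0.5, 2.0) range described in the paper:
print(effective_scale_range(1024, 1024))  # approximately (0.5, 2.0)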
RandomGaussianBlur: It was indeed used (see the transform_tr snippet above).
Unwanted constant in CrossEntropyLoss: Your calculation of the cross-entropy loss looks like this:
def CrossEntropyLoss(self, logit, target):
    n, c, h, w = logit.size()
    criterion = nn.CrossEntropyLoss(weight=self.weight, ignore_index=self.ignore_index,
                                    size_average=self.size_average)
    if self.cuda:
        criterion = criterion.cuda()

    loss = criterion(logit, target.long())

    if self.batch_average:
        loss /= n

    return loss
If I'm not mistaken, nn.CrossEntropyLoss already averages over the batch, so with loss /= n the averaging happens a second time. In the gradient calculation this introduces an extra factor of 1/n (1/4 for batch_size = 4), which has the same effect as reducing the learning rate to a quarter of its original value. The second problem is that the loss then depends on the batch size.
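To make the double averaging concrete, here is a small self-contained check (again not code from the repo, just PyTorch's default behavior):

import torch
import torch.nn as nn

torch.manual_seed(0)
n, c, h, w = 4, 19, 8, 8                      # batch of 4, 19 Cityscapes classes
logit = torch.randn(n, c, h, w)
target = torch.randint(0, c, (n, h, w))

criterion = nn.CrossEntropyLoss()             # default reduction='mean'
loss = criterion(logit, target)               # already averaged over all pixels and samples

print(loss)      # per-element mean, independent of the batch size
print(loss / n)  # dividing by n again rescales the loss (and its gradients) by 1/4 here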
Thanks again for your fast response and your work. It was very helpful to me! If my comments are incorrect, please feel free to correct me.
Best, Jan
Hello Amir,
thanks for your great work.
I am currently trying to recreate the experiments for Cityscapes in my own framework. I used your work and your paper as a template for the transformation steps because they were well described.
Unfortunately, my results still differ significantly from yours, so I'm still looking for a bug in my code.
Maybe you have some time and can answer a few questions so that I can find my bug faster. That would be wonderful!
Best regards, Jan