NVlabs / SegFormer

Official PyTorch implementation of SegFormer
https://arxiv.org/abs/2105.15203

Bucket size warning #73

Open rahulagrawal048 opened 2 years ago

rahulagrawal048 commented 2 years ago
[W reducer.cpp:347] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed.  This is not an error, but may impair performance.
grad.sizes() = [19, 256, 1, 1], strides() = [256, 1, 256, 256]
bucket_view.sizes() = [19, 256, 1, 1], strides() = [256, 1, 1, 1] (function operator())

I trained the SegFormer-B0 model on the Cityscapes dataset using 4 GPUs with 2 samples per GPU and got the above warning. Has anyone faced a similar problem, or does anyone know where the issue might be?

This caused a drop in performance to about 63 mIoU on the val set, compared to the 76.2 stated in the paper.

TheoPis commented 2 years ago

I have also seen a similar warning when using B0, B1, and B5 on both Cityscapes and ADE20K. In my case, with B0 on Cityscapes I get a single-scale mIoU of 75.3 instead of the 76.2 stated in the paper. Per-batch training time also seems to be slowed down significantly by this warning. I would be grateful for any suggestions as to what may be causing this.

rahulagrawal048 commented 2 years ago

With B0 on Cityscapes, what batch size per GPU did you use to get 75.3? Did you change anything else?

TheoPis commented 2 years ago

@rahulagrawal048: I used a total batch size of 8 (4 GPUs, 2 per GPU). It's also important to mention that I do not use mmseg; instead, I carefully ported the MiT and SegFormer implementations from this repository into my own codebase. I also closely followed the config files in local_configs/ for B0. I thought the warning you mentioned was somehow related to my not using mmseg, but it may be a more general issue than that. Do you use mmseg in your reproduction?

rahulagrawal048 commented 2 years ago

Yes, I am using mmseg throughout, and that might be the reason for the low mIoU I get.

harshm121 commented 2 years ago

Warning: Grad strides do not match bucket view strides.

Were you able to figure out what led to this warning?
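
For reference, the sizes and strides printed in the warning at the top of this thread can be checked by hand. The sketch below is plain Python and does not touch PyTorch or DDP. It assumes the usual PyTorch conventions for row-major (contiguous) strides and for channels-last strides of a 4-D NCHW tensor. The helper names are hypothetical and not part of any library. Under those assumptions, the grad strides [256, 1, 256, 256] match the channels-last formula and the bucket view strides [256, 1, 1, 1] match the contiguous formula. Because the last two dimensions have size 1, both stride sets address the same elements in the same order. This only illustrates what the warning is comparing; it does not explain the mIoU gap or the slowdown reported above.

```python
from itertools import product


def contiguous_strides(sizes):
    """Row-major strides (in elements) for the given sizes."""
    strides = []
    step = 1
    for size in reversed(sizes):
        strides.append(step)
        step *= max(size, 1)
    return list(reversed(strides))


def channels_last_strides(sizes):
    """Channels-last strides for a 4-D tensor with sizes (N, C, H, W)."""
    n, c, h, w = sizes
    return [h * w * c, 1, w * c, c]


def offset(index, strides):
    """Linear memory offset of one multi-dimensional index."""
    return sum(i * s for i, s in zip(index, strides))


def same_memory_order(sizes, strides_a, strides_b):
    """True if every index maps to the same offset under both stride sets."""
    indices = product(*(range(s) for s in sizes))
    return all(offset(i, strides_a) == offset(i, strides_b) for i in indices)


# Values copied from the warning message in this thread.
sizes = [19, 256, 1, 1]
grad_strides = [256, 1, 256, 256]
bucket_view_strides = [256, 1, 1, 1]

print(channels_last_strides(sizes) == grad_strides)
print(contiguous_strides(sizes) == bucket_view_strides)
print(same_memory_order(sizes, grad_strides, bucket_view_strides))
```

The shape [19, 256, 1, 1] looks like the weight of a 1x1 convolution with 19 output channels, which matches the number of Cityscapes classes, but that reading is a guess from the shape alone.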