czczup / ViT-Adapter

[ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
https://arxiv.org/abs/2205.08534
Apache License 2.0

Reproducing on ViT-Adapter-T #81

Closed CA-TT-AC closed 1 year ago

CA-TT-AC commented 1 year ago

Hello,

I am having difficulty reproducing the mIoU scores for the ViT-Adapter-T model that was pre-trained using the AugReg-T method. I would like to inquire whether there could be any issues with my reproduction method.

Attached to this issue is a log of my reproduction attempt. Could you please help me investigate and identify any potential issues that could be causing the discrepancy between my results and the original mIoU scores? 20230323_141446.log

Thank you for your assistance.

czczup commented 1 year ago

I noticed that you trained this model on a single GPU. It should be trained with a total batch size of 16, so if you use only one GPU, you should set samples_per_gpu=16 in the config, or use more GPUs.

PS: The GPU memory consumption shown in the log is not accurate; check nvidia-smi for the actual value.

# By default, models are trained on 8 GPUs with 2 images per GPU
data=dict(samples_per_gpu=2,
          val=dict(pipeline=test_pipeline),
          test=dict(pipeline=test_pipeline))
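
For reference, a minimal sketch of the single-GPU adjustment described above, assuming the same mmsegmentation-style config file (test_pipeline is defined elsewhere in that config):

# Single-GPU alternative (sketch): keep the effective batch size at 16
# by raising samples_per_gpu, since only one GPU contributes to the total.
data = dict(samples_per_gpu=16,  # 1 GPU x 16 images = 16 total
            val=dict(pipeline=test_pipeline),
            test=dict(pipeline=test_pipeline))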
CA-TT-AC commented 1 year ago

I am sorry that I did not consider this problem. Thank you for your prompt reply!