Closed momo1986 closed 1 year ago
Hi, @momo1986, thanks for your interest in our work. Could you share the complete log from your evaluation? That should help me better understand the issue.
We evaluate our models on 8 GPUs, and you use 1 GPU. Different numbers of GPUs should not be the issue, but still could you try evaluating with 8 GPUs if possible?
Hi @momo1986, I tried evaluating our Swin-L OneFormer on a single GPU (--num-gpus 1) and it gives the expected result. You can find my evaluation log here.
classes IoU nIoU
--------------------------------
road : 0.985 nan
sidewalk : 0.869 nan
building : 0.940 nan
wall : 0.668 nan
fence : 0.695 nan
pole : 0.723 nan
traffic light : 0.767 nan
traffic sign : 0.854 nan
vegetation : 0.933 nan
terrain : 0.659 nan
sky : 0.959 nan
person : 0.870 0.738
rider : 0.728 0.621
car : 0.965 0.885
truck : 0.903 0.640
bus : 0.931 0.772
train : 0.847 0.692
motorcycle : 0.697 0.616
bicycle : 0.773 0.689
--------------------------------
Score Average : 0.830 0.707
--------------------------------
categories IoU nIoU
--------------------------------
flat : 0.988 nan
construction : 0.943 nan
object : 0.781 nan
nature : 0.936 nan
sky : 0.959 nan
human : 0.876 0.764
vehicle : 0.950 0.876
--------------------------------
Score Average : 0.919 0.820
--------------------------------
[06/13 12:57:42 d2.evaluation.testing]: copypaste: Task: sem_seg
[06/13 12:57:42 d2.evaluation.testing]: copypaste: IoU,iIoU,IoU_sup,iIoU_sup
[06/13 12:57:42 d2.evaluation.testing]: copypaste: 82.9802,70.6712,91.9019,81.9933
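When comparing two evaluation runs, it can help to pull the summary numbers out of the `copypaste:` lines rather than reading them by eye. The following is a minimal sketch, assuming the log contains the three `copypaste:` lines in the layout shown above (a `Task:` line, a line of metric names, then a line of values); `parse_copypaste` is a hypothetical helper written for this illustration, not part of detectron2 or OneFormer.

```python
def parse_copypaste(log_text: str) -> dict:
    """Pair metric names with their values from 'copypaste:' log lines."""
    # Keep only the text that follows the 'copypaste:' marker on each line.
    payloads = [
        line.split("copypaste:", 1)[1].strip()
        for line in log_text.splitlines()
        if "copypaste:" in line
    ]
    # Drop the 'Task: ...' line; the remaining two rows are names and values.
    rows = [p for p in payloads if not p.startswith("Task:")]
    names, values = rows[0].split(","), rows[1].split(",")
    return {name: float(value) for name, value in zip(names, values)}


# Log excerpt copied from the evaluation output above.
log = """\
[06/13 12:57:42 d2.evaluation.testing]: copypaste: Task: sem_seg
[06/13 12:57:42 d2.evaluation.testing]: copypaste: IoU,iIoU,IoU_sup,iIoU_sup
[06/13 12:57:42 d2.evaluation.testing]: copypaste: 82.9802,70.6712,91.9019,81.9933
"""

metrics = parse_copypaste(log)
print(metrics)
```

Running the same helper on a second log gives two dictionaries that can be compared key by key to see which metrics differ between runs.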
Hi @praeclarumjj3.
Thanks for your kind reply.
I am currently working on this issue. It always reports the error log "error in ms_deformable_im2col_cuda".
I suspect that this error causes the performance gap.
Here is the evaluation log.
https://drive.google.com/file/d/1Kgf_NYITtZTkpx_6EilNjWZFO_s2hEpO/view?usp=sharing
I work on an NVIDIA 3090 machine. Its default CUDA toolkit is 11.1. However, I installed the PyTorch version and CUDA toolkit by following the official OneFormer installation guide.
Thanks & Regards! Momo
Hi, @momo1986, thanks for the log. You have installed a PyTorch build for CUDA 11.3, but the CUDA version on your local machine is 11.1. I suggest installing a PyTorch build for CUDA <= 11.1.
I noticed you already opened an issue about this in #67. I am closing this issue. Let's have a further conversation about this under that issue. Feel free to re-open this if you face any other issues.
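The version mismatch described above can be checked before running an evaluation. Below is a minimal sketch, assuming the local toolkit is the one reported by the `nvcc` found on `PATH` and that the rule of thumb is the one suggested in this thread (PyTorch CUDA build no newer than the local toolkit). The helper names `parse_nvcc_release`, `parse_version`, and `build_not_newer_than_toolkit` are hypothetical and introduced only for illustration.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output: str):
    """Extract (major, minor) from the 'release X.Y' text in `nvcc --version` output."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_version(text: str):
    """Turn a string such as '11.3' into a (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def build_not_newer_than_toolkit(torch_cuda: str, toolkit) -> bool:
    """True when the PyTorch CUDA build is <= the local toolkit version."""
    return parse_version(torch_cuda) <= toolkit


if __name__ == "__main__":
    try:
        import torch  # only needed for the live check
    except ImportError:
        torch = None
    nvcc = shutil.which("nvcc")
    # Run the live check only when PyTorch (with CUDA) and nvcc are both available.
    if torch is not None and torch.version.cuda and nvcc:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
        toolkit = parse_nvcc_release(out)
        print("PyTorch CUDA build:", torch.version.cuda)
        print("Local toolkit:", toolkit)
        if toolkit is not None:
            print("Build <= toolkit:", build_not_newer_than_toolkit(torch.version.cuda, toolkit))
```

With the versions reported in this thread, `build_not_newer_than_toolkit("11.3", (11, 1))` evaluates to `False`, which flags the setup discussed here.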
Script:
python train_net.py --num-gpus 1 --config-file configs/cityscapes/swin/oneformer_swin_large_bs16_90k.yaml --eval-only MODEL.IS_TRAIN False MODEL.WEIGHTS 250_16_swin_l_oneformer_cityscapes_90k.pth MODEL.TEST.TASK semantic
CUDA: 11.1, PyTorch: 1.10.1
The ideal result is:
However, my result is very weird:
Your sharing is great. It is my honor to apply OneFormer. However, this reproduction gap is an issue that I need to address.
I am sorry to bother you guys.
Thanks & Regards! Momo