cure-lab / MagicDrive

[ICLR24] Official implementation of the paper “MagicDrive: Street View Generation with Diverse 3D Geometry Control”
https://gaoruiyuan.com/magicdrive/
GNU Affero General Public License v3.0

about performance at resolution 272x736 #92

Closed: RYHSmmc closed this issue 4 days ago

RYHSmmc commented 1 month ago

Hello, I retrained the model and reproduced the AP, NDS, and FID results from your paper at 224x400 resolution, but failed at 272x736. Could you give me more details about the 272x736 experiment?

flymin commented 1 month ago

Besides changing the image resolution, we changed the conv module for the BEV map. I will get back to you today; if not, please remind me.

flymin commented 1 month ago

Hi,

In our 272x736 setting, we do not change the map size, which is 200x200. However, the map encoder has to be changed so that it outputs a 34x92 latent, the same spatial size as the image latents. Basically, we change the padding of all convs to (1, 1), change the first stride=2 conv to stride=1, and add an AdaptiveAvgPool2d. FYI:

(
  (conv_in): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (blocks): ModuleList(
    (0): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (3): Conv2d(32, 96, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (4): Conv2d(96, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (5): Conv2d(96, 256, kernel_size=(3, 3), stride=(2, 1), padding=(1, 1))
    (6): AdaptiveAvgPool2d(output_size=[34, 92])
  )
  (conv_out): Conv2d(256, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
)
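
For anyone who wants to sanity-check the shapes, here is a minimal PyTorch sketch of an encoder with exactly the layers printed above. The class name and forward pass are my own placeholders (the real MagicDrive module likely interleaves activations, which a module repr does not show); only the layer hyperparameters are copied from the printout:

import torch
import torch.nn as nn

class MapEncoderSketch(nn.Module):
    # Layer sizes follow the printout above; the class name and forward
    # logic are placeholders, not MagicDrive's actual implementation
    # (activations, if any, are omitted because a repr does not show them).
    def __init__(self):
        super().__init__()
        self.conv_in = nn.Conv2d(8, 16, kernel_size=3, stride=1, padding=1)
        self.blocks = nn.ModuleList([
            nn.Conv2d(16, 16, kernel_size=3, stride=1, padding=1),
            nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1),
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),
            nn.Conv2d(32, 96, kernel_size=3, stride=2, padding=1),        # 200x200 -> 100x100
            nn.Conv2d(96, 96, kernel_size=3, stride=1, padding=1),
            nn.Conv2d(96, 256, kernel_size=3, stride=(2, 1), padding=1),  # 100x100 -> 50x100
            nn.AdaptiveAvgPool2d(output_size=(34, 92)),                   # 50x100 -> 34x92
        ])
        self.conv_out = nn.Conv2d(256, 320, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        x = self.conv_in(x)
        for block in self.blocks:
            x = block(x)
        return self.conv_out(x)

encoder = MapEncoderSketch()
bev_map = torch.randn(1, 8, 200, 200)   # 8-channel BEV map at 200x200
print(encoder(bev_map).shape)           # torch.Size([1, 320, 34, 92])

Running it on a (1, 8, 200, 200) map tensor gives a (1, 320, 34, 92) output, i.e. the same spatial size as the 272x736 image latents.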
RYHSmmc commented 1 month ago

@flymin Thank you very much, I will retrain the MagicDrive as your setting and report the results here.

RYHSmmc commented 1 month ago

@flymin Sorry for the urgent follow-up: I retrained the 272x736 model (350 epochs, about 5 days), but the mAP from BEVFusion is only 13.5. Is there anything else that needs to be modified in the inference process to adapt to this resolution?

flymin commented 1 month ago

Did you use CFG for inference? The default should be 2, and we used 2.5 for the results reported in the table.

Another thing I can think of: could you verify the mAP of BEVFusion on the real data?

According to my old logs, 150 epochs should give you ~18 mAP. This took us 62.5h of training on 8*V100.
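
For readers unfamiliar with the term, the CFG scale here is the usual classifier-free guidance weight applied at every denoising step; raising it from 2.0 to 2.5 pushes the prediction further toward the conditional branch. A generic sketch of that combination (not MagicDrive's actual sampling code; the function name is hypothetical):

import torch

def apply_cfg(eps_uncond: torch.Tensor, eps_cond: torch.Tensor,
              guidance_scale: float = 2.0) -> torch.Tensor:
    # Standard classifier-free guidance: scale = 1.0 disables guidance;
    # the thread uses 2.0 by default and 2.5 for the reported numbers.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)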

RYHSmmc commented 1 month ago

@flymin When we verify BEVFusion on real data, we get mAP 35.30 and NDS 42.25. When setting CFG to 2.5, we get mAP 13.59 and NDS 24.50. Besides, in the dataset section of the config we set

dataset:
  back_resize: [899, 1600]  # (h, w)
  back_pad: [0, 1, 0, 0]
  augment2d:
    resize: [[0.3023, 0.46]]
    rotate: null

and other settings are the same as the defaults. Can you check this for me? Thanks again!

flymin commented 1 month ago

Yes, you also need to change them, but the parameters are not correct.

First, you should have

dataset:
  image_size: [272, 736]
  augment2d:
    resize: [[0.5, 0.5]]
    rotate: null

These come from BEVFusion and should be set before training MagicDrive. Then, for testing:

dataset:
  back_resize: [544, 1472]
  back_pad: [64, 356, 64, 0]

They are aligned with the training transforms.
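
To see how these numbers line up (my own reading of the config, not something stated in the repo): training downsamples the 900x1600 nuScenes frames by 0.5 and crops them to 272x736, so at test time back_resize upsamples the generated 272x736 views back to 544x1472, and back_pad, if read as (left, top, right, bottom), restores the full 900x1600 canvas (1472 + 64 + 64 = 1600, 544 + 356 + 0 = 900) before the images are fed to BEVFusion. A small sketch of that inverse transform, with the padding order as an assumption:

import torch
import torch.nn.functional as F

def back_transform(generated, back_resize=(544, 1472), back_pad=(64, 356, 64, 0)):
    # generated: (N, 3, 272, 736) images from the diffusion model.
    # Undo the 0.5x training resize: 272x736 -> 544x1472.
    x = F.interpolate(generated, size=back_resize, mode="bilinear", align_corners=False)
    # Assumed padding order: (left, top, right, bottom);
    # F.pad wants (left, right, top, bottom) for a 4D tensor.
    left, top, right, bottom = back_pad
    return F.pad(x, (left, right, top, bottom))  # -> (N, 3, 900, 1600)

print(back_transform(torch.rand(1, 3, 272, 736)).shape)  # torch.Size([1, 3, 900, 1600])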
