cure-lab / MagicDrive

[ICLR24] Official implementation of the paper “MagicDrive: Street View Generation with Diverse 3D Geometry Control”
https://gaoruiyuan.com/magicdrive/
GNU Affero General Public License v3.0

Generated Data for Augmentation #93

Open mrabiabrn opened 1 month ago

mrabiabrn commented 1 month ago

Hi,

I noticed the code for generating the validation set, but I didn't find any code for training-data augmentation. Should we follow the same procedure to generate training data? Could you provide more details on this, or share a link to the generated data for BEV perception? That would be greatly appreciated.

flymin commented 1 month ago

Yes, you can refer to the procedure for the validation set; the two are similar.
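
For concreteness, the main practical change is which nuScenes split the generation pipeline reads. Below is a minimal sketch using the nuScenes devkit; how the scene list is wired into MagicDrive's own config is repo-specific and not shown here.

```python
from nuscenes.utils.splits import create_splits_scenes

# nuScenes ships fixed scene splits; the generation procedure reads boxes,
# maps, and camera poses the same way regardless of which split it is given.
splits = create_splits_scenes()
train_scenes = splits["train"]  # 700 scenes
val_scenes = splits["val"]      # 150 scenes

# Point the generation pipeline's dataset config at `train_scenes` instead of
# `val_scenes`; the conditioning and sampling steps are unchanged.
```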

mrabiabrn commented 1 month ago

Hi, I tried to generate new samples from the training data using the provided validation-set generation script. However, I noticed that for training instances the generations are not diverse and are quite similar to the original data (vehicle colors, road shape, background, etc.). This is not the case for validation samples, where I can see diverse generations for the same bounding boxes. I added examples of the training and validation generation results below. What do you think could be the reason?

[Image: validation, original vs. generated]

[Image: training, original vs. generated]

flymin commented 1 month ago

In some cases, it may happen. However, using such data to augment the original training set leads to improvements in downstream tasks.

If it is severe in your case, you can try editing the scene condition to generate more varied data for augmentation.
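
One lightweight way to edit the condition is to vary the scene description (weather, time of day) per sample, since MagicDrive conditions on text as well as geometry. A hypothetical sketch; the function and phrase list are illustrative, not part of MagicDrive's API:

```python
import random

# Illustrative variations; appending a different weather/time phrase to the
# scene description changes the text condition and diversifies generations.
VARIATIONS = ["a sunny day", "a rainy day", "night time", "an overcast afternoon"]

def perturb_prompt(base_prompt: str, rng: random.Random) -> str:
    """Return the scene description with one randomly chosen variation appended."""
    return f"{base_prompt}, {rng.choice(VARIATIONS)}"

rng = random.Random(0)  # fixed seed so the augmentation set is reproducible
prompts = [perturb_prompt("a driving scene on a city street", rng) for _ in range(4)]
```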

mrabiabrn commented 1 month ago

Training generations are generally like this in my case. Augmenting the training data with them doesn't improve CVT performance; it even hurts it. I can try editing the scene and text conditions, but to reproduce your results it would be great if you could share your training- and validation-set generations so I can identify any discrepancies.

flymin commented 1 month ago

We have already released the model weights, so you can sample from our model and see for yourself.

I cannot share the data. However, I have to admit that our generations on the training set are similar to yours. We did not modify the code of the perception models; we only added more generated data, as described in our paper. Maybe you can also try BEVFusion and see whether it behaves the same way.
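
For reference, the paper's recipe amounts to appending the generated samples to the real training set and leaving the perception code untouched. A minimal PyTorch sketch; the placeholder dataset stands in for whatever dataset classes your CVT or BEVFusion pipeline actually uses:

```python
from torch.utils.data import ConcatDataset, DataLoader, Dataset

class PlaceholderSet(Dataset):
    """Stand-in for the real/generated nuScenes datasets; both must return
    identically structured samples (the generated data reuses the original
    annotations, so only the images differ)."""
    def __init__(self, num_samples: int):
        self.num_samples = num_samples
    def __len__(self):
        return self.num_samples
    def __getitem__(self, idx):
        return {"idx": idx}

real_train_set = PlaceholderSet(28130)       # nuScenes train keyframes
generated_train_set = PlaceholderSet(28130)  # one generated copy per sample

augmented = ConcatDataset([real_train_set, generated_train_set])
loader = DataLoader(augmented, batch_size=8, shuffle=True, num_workers=4)
```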

RYHSmmc commented 2 weeks ago

@flymin Hello author, the road segmentation performance of CVT in Table 1 is 61, which is confirmed to be 59.3 in Table 4. We also reproduced 59.x. How was 61 obtained, and what is the difference between these two numbers?

flymin commented 2 weeks ago

> which is confirmed to be 59.3 in Table 4

This is not true. Please also see Figure 7. I think the problem lies in $M = \{0\}$ (using the zero map as the unconditional input).
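
For context, $M = \{0\}$ refers to replacing the BEV map with an all-zero map in the unconditional branch of classifier-free guidance (the `use_zero_map_as_unconditional` setting discussed below). A minimal sketch of that computation; `model` and the condition objects are placeholders, not MagicDrive's actual interfaces:

```python
def cfg_epsilon(model, x_t, t, cond, cond_zero_map, guidance_scale=2.0):
    """Classifier-free guidance where the unconditional branch keeps the other
    conditions but replaces the BEV map with an all-zero map (M = {0})."""
    eps_cond = model(x_t, t, cond)             # conditioned on the real map
    eps_uncond = model(x_t, t, cond_zero_map)  # map zeroed out
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```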

mrabiabrn commented 6 days ago

To confirm: the results in Table 1 were generated with `use_zero_map_as_unconditional = True` and a guidance scale of 2. Is this correct?

flymin commented 21 hours ago

Yes. And sorry for the late reply.
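
Putting the thread together, the Table 1 sampling settings might be expressed as follows; the key name comes from this discussion, while the surrounding structure is just illustrative:

```python
# Sampling settings behind the Table 1 numbers, as confirmed in this thread.
table1_sampling = {
    "use_zero_map_as_unconditional": True,  # CFG null branch zeroes the BEV map
    "guidance_scale": 2.0,
}
```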

mrabiabrn commented 21 hours ago

No problem at all, and thanks for clarifying!