cure-lab / MagicDrive

[ICLR24] Official implementation of the paper “MagicDrive: Street View Generation with Diverse 3D Geometry Control”
https://gaoruiyuan.com/magicdrive/
GNU Affero General Public License v3.0
419 stars · 22 forks

The generated images are abnormal when I use the nuScenes dataset #9

Closed zhangxiao696 closed 3 months ago

zhangxiao696 commented 4 months ago

Following your readme, I can run the demo successfully. But when I train and test with the nuScenes mini dataset, the generated images are abnormal. Can you help us see where the problem lies?

1. Create data from the mini nuScenes dataset:

   ```bash
   python tools/create_data.py nuscenes --root-path ./data/nuscenes \
     --out-dir ./data/nuscenes_mmdet3d_2 --extra-tag nuscenes --version v1.0-mini
   ```

2. Train the model with the debug config on 1x V100:

   ```bash
   accelerate launch --mixed_precision fp16 --gpu_ids all --num_processes 1 \
     tools/train.py +exp=224x400 runner=debug runner.validation_before_run=true --version
   ```

3. Test:

   ```bash
   python tools/test.py resume_from_checkpoint=magicdrive-log/debug/SDv1.5mv-rawbox_2024-02-27_09-57_224x400
   ```

[Image: 0_gen0]

[Screenshot 2024-02-27 20:04:43]
flymin commented 4 months ago

I don’t think the mini split is sufficient to train the model, but if you test the pretrained model, it should be fine.

Besides, the debug config is only used for testing the training flow; it updates the model for only a few steps, so the model is far from convergence.

zhangxiao696 commented 4 months ago

Okay. Another question: can this project run on my own dataset, which has no road map?

flymin commented 4 months ago

Yes. Our pre-trained model also supports generation when setting the map to all zeros, but the generated road structure is not correct. For example, you may see cars in bushes.

Training another model without the road map may be better.

zhangxiao696 commented 3 months ago

> Yes. Our pre-trained model also supports generation when setting the map to all zeros, but the generated road structure is not correct. For example, you may see cars in bushes.
>
> Training another model without the road map may be better.

I would like to ask again: if I use my own dataset, which has no map and no annotation results, how can I use BEVFusion to detect objects in your project? Is this method feasible?

flymin commented 3 months ago

If you are asking about using pseudo-label to train MagicDrive, it is possible but we didn’t try it. If not, please specify your question again.

zhangxiao696 commented 3 months ago

> If you are asking about using pseudo-label to train MagicDrive, it is possible but we didn’t try it. If not, please specify your question again.

In fact, my dataset has no road map and no 3D bounding boxes, and I want to test your pre-trained model. I don't know if that's possible. If it is, how can I generate the 3D bounding boxes first?

flymin commented 3 months ago

Our method is used to generate camera views by providing boxes, maps, etc. If you want to test our pre-trained model, basically, all you need would be boxes and maps. For example in our paper, we use the boxes and maps from the dataset. We do not investigate how to "generate" boxes or maps.

shubham8899 commented 3 months ago

> Yes. Our pre-trained model also supports generation when setting the map to all zeros, but the generated road structure is not correct. For example, you may see cars in bushes.
>
> Training another model without the road map may be better.

Could you please share an example to perform inference generation while setting all maps to zero? An input example for the StableDiffusionBEVControlNetPipeline class would be really helpful, thanks!

flymin commented 3 months ago

You only need to change one line of code. I will not add the change to the current repo. If it is needed for any application, please consider opening a PR.
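A minimal sketch of the kind of change described here, for anyone who lands on this thread: zero out the BEV map conditioning before it reaches the pipeline, keeping its shape and dtype. The key name `bev_map_with_aux` and the tensor shape below are hypothetical placeholders, not the repo's actual API; check MagicDrive's dataloader for the real names.

```python
import torch

def zero_road_map(batch: dict) -> dict:
    """Replace the BEV map conditioning with an all-zero tensor of the
    same shape and dtype, removing road-structure guidance.

    NOTE: the key "bev_map_with_aux" is a hypothetical placeholder;
    the key used by MagicDrive's dataloader may differ.
    """
    batch["bev_map_with_aux"] = torch.zeros_like(batch["bev_map_with_aux"])
    return batch

# Usage sketch with a dummy batch (the shape is illustrative only).
batch = {"bev_map_with_aux": torch.rand(1, 8, 200, 200)}
batch = zero_road_map(batch)
print(batch["bev_map_with_aux"].abs().sum().item())  # 0.0
```

The generated road structure will not be meaningful without the map, as noted above, but the model still runs.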

zhangxiao696 commented 3 months ago

> You only need to change one line of code. I will not add the change to the current repo. If it is needed for any application, please consider opening a PR.

Sorry, I didn't reply in a timely manner; I am currently researching this. Thanks!

flymin commented 3 months ago

Sure, no problem. I closed this due to inactivity. If you run into the same issue, please feel free to reopen.

FudongGe commented 2 weeks ago

Hi,

Thanks for your excellent work! You mentioned here that "We do not investigate how to 'generate' boxes or maps". If so, how can you perform 3D object detection or BEV segmentation on the generated new-perspective images without boxes or maps?

Perhaps I misunderstand or am missing something. Could you give me some advice?

Thanks for your attention!

> Our method is used to generate camera views by providing boxes, maps, etc. If you want to test our pre-trained model, basically, all you need would be boxes and maps. For example in our paper, we use the boxes and maps from the dataset. We do not investigate how to "generate" boxes or maps.

flymin commented 2 weeks ago

> we use the boxes and maps from the dataset

FYI