cure-lab / MagicDrive

[ICLR24] Official implementation of the paper “MagicDrive: Street View Generation with Diverse 3D Geometry Control”
https://gaoruiyuan.com/magicdrive/
GNU Affero General Public License v3.0

Trying to generate without any conditions #19

Closed: shubham8899 closed this issue 8 months ago

shubham8899 commented 8 months ago

I am having a hard time installing and running some of the libraries required for the demo, so I want to perform inference with no conditioning (generation from the pretrained weights, with no map or boxes whatsoever). I wrote this script:

import torch
from typing import List

from PIL import Image
from diffusers import UniPCMultistepScheduler

from magicdrive.pipeline.pipeline_bev_controlnet import (
    StableDiffusionBEVControlNetPipeline,
    BEVStableDiffusionPipelineOutput,
)
from magicdrive.networks.unet_addon_rawbox import BEVControlNetModel  # the ControlNet class
from magicdrive.networks.unet_2d_condition_multiview import UNet2DConditionModelMultiview

pipe_param = {}

# ControlNet branch that injects the BEV map and 3D-box conditioning
controlnet = BEVControlNetModel.from_pretrained(
    'magicdrive_weights/SDv1.5mv-rawbox_2023-09-07_18-39_224x400/controlnet',
    bbox_embedder_cls="magicdrive.networks.bbox_embedder.ContinuousBBoxWithTextEmbedding",
)
controlnet.eval()  # from_pretrained will set eval mode by default
pipe_param["controlnet"] = controlnet

# multi-view UNet with cross-view attention
unet = UNet2DConditionModelMultiview.from_pretrained(
    'magicdrive_weights/SDv1.5mv-rawbox_2023-09-07_18-39_224x400/unet'
)
unet.eval()
pipe_param["unet"] = unet

pipe = StableDiffusionBEVControlNetPipeline.from_pretrained(
    'magicdrive_weights/runwayml:stable-diffusion-v1-5',
    **pipe_param,
    safety_checker=None,
    feature_extractor=None,
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_xformers_memory_efficient_attention()

generator = None  # optionally seed, e.g. torch.Generator().manual_seed(42), for reproducible samples

pipeline_param = {
    "guidance_scale": 2,  # if > 1, enable classifier-free guidance
    "num_inference_steps": 20,
    "eta": 0.0,
    "controlnet_conditioning_scale": 1.0,
    "guess_mode": False,
    "use_zero_map_as_unconditional": True,
    "bbox_max_length": None,
}

output: BEVStableDiffusionPipelineOutput = pipe(
    prompt='a vehicle driving in the rain in bright daylight',
    image=torch.zeros(1, 8, 200, 200),      # all-zero BEV map: no semantic layout
    camera_param=torch.zeros(1, 6, 3, 7),   # all-zero camera parameters for the 6 views
    height=224,
    width=400,
    generator=generator,
    bev_controlnet_kwargs={"bboxes_3d_data": None},  # no 3D boxes
    **pipeline_param,
)
images: List[List[Image.Image]] = output.images
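
To inspect the outputs I save them like this (a minimal sketch, assuming images[0] holds one PIL image per camera view, which matches the List[List[Image.Image]] annotation above):

# sketch: save each generated view to disk; assumes the first (and only)
# batch entry is a list of one PIL image per camera view
for view_idx, view_img in enumerate(images[0]):
    view_img.save(f"view_{view_idx}.png")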

Are these the right steps to reproduce inference? I am getting pretty much the same image for all views. [generated images attached]

flymin commented 8 months ago

As there are no boxes and the map is all zeros, the result you got looks fine (only background is generated, without any semantic area). If you can run this code, you should be able to run our GUI demo. You can check it out.

shubham8899 commented 8 months ago

All 6 views look pretty much the same. Shouldn't they be different from each other?

flymin commented 8 months ago

Since neither the boxes nor the map differentiates the views, they look very similar to each other.
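
For illustration, here is a rough sketch of how per-view camera parameters might be built so that each view carries a distinct pose. It assumes (inferred only from the (1, 6, 3, 7) shape in your script, so please verify against our dataset code) that each 3x7 block is the 3x3 intrinsics concatenated with a 3x4 camera-to-ego transform:

import torch

# hypothetical helper: pack per-view intrinsics and extrinsics into (1, 6, 3, 7);
# the K | [R|t] layout is an assumption inferred from the tensor shape
def make_camera_param(intrinsics, extrinsics):
    # intrinsics: 6 tensors of shape (3, 3); extrinsics: 6 tensors of shape (3, 4)
    views = [torch.cat([K, Rt], dim=1) for K, Rt in zip(intrinsics, extrinsics)]
    return torch.stack(views).unsqueeze(0)

# dummy but distinct poses, just so the six views are no longer identical
Ks = [torch.eye(3) for _ in range(6)]
Rts = [torch.cat([torch.eye(3), torch.tensor([[0.0], [0.0], [float(i)]])], dim=1)
       for i in range(6)]
camera_param = make_camera_param(Ks, Rts)  # shape (1, 6, 3, 7)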

shubham8899 commented 8 months ago

Sure. I can't seem to process the dataset and bring it to the final form compatible with the inference pipeline, due to dependency issues. Would you be able to provide a small subset of the dataset (even one example works) in the final structure (containing val_input["bev_map_with_aux"], val_input["kwargs"], val_input["camera_param"], etc.)? It would be of great help in understanding your code. Thanks 🙏
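
In the meantime, I mocked up the kind of batch I imagine (shapes copied from my script above; the real keys and shapes would of course come from your dataset pipeline, so this is only a hypothetical placeholder):

import torch

# hypothetical mock-up of one validation sample; shapes taken from the
# inference script above, NOT from the real dataset code
val_input = {
    "bev_map_with_aux": torch.zeros(1, 8, 200, 200),  # BEV map + auxiliary channels
    "camera_param": torch.zeros(1, 6, 3, 7),          # per-view camera parameters
    "kwargs": {"bboxes_3d_data": None},               # 3D box conditioning (none here)
}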

flymin commented 8 months ago

As I said here:

If you can run this code, you should be able to run our GUI demo. You can check it out.

Could you please check it out first? If you have difficulty with any dependencies for the GUI demo, please let me know.