Tangshitao / MVDiffusion

MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion, NeurIPS 2023 (spotlight)

Content in outputs gradually disappears when training depth-conditioned model #47

Open OrangeSodahub opened 3 months ago

OrangeSodahub commented 3 months ago

Hi, I'm trying to train the depth-conditioned model from scratch on custom data, and I'm confused about the results:

val at step=70: [image]

val at step=140: [image]

val at step=210: [image]

val at step=280: [image]

val at step=350: [image]

val at step=420: [image]

As the examples show, the predicted outputs gradually become blurred until no content remains. Is this expected behavior, or have you encountered this issue before? Thanks.

OrangeSodahub commented 3 months ago

@Tangshitao I'm a bit unsure whether this codebase can be trained correctly, though I'm confident that inference works fine. Has anyone else trained it successfully?

Tangshitao commented 3 months ago

I think your camera motion is too large, which could cause inconsistent generation. Can you try data with smaller motions?

OrangeSodahub commented 3 months ago

@Tangshitao Thanks. But with small motions the issue still exists:

[image]

Even if overly large motion causes inconsistent generation, the quality of the generations shouldn't degrade. Right now the generations are totally meaningless.

YunzeMan commented 3 months ago

@Tangshitao I observe exactly the same problem. I'm very curious about the reason.

Tangshitao commented 3 months ago

I suspect there might be issues with the camera extrinsics and intrinsics. Can you try training with ScanNet data?

OrangeSodahub commented 3 months ago

I'm sure the cameras have no problems. I found that the method of sampling frames to form a batch is crucial. How do you produce the key_frames_0.6.txt files?

Tangshitao commented 3 months ago

I compute the overlap between each pair of frames within a video, and record the frame pairs whose overlap is larger than 0.6.
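For anyone else generating their own key-frame files, the overlap check above can be sketched roughly as follows: back-project one frame's depth map, reproject the points into the other view, and measure the fraction that land inside the image. This is only an illustrative sketch, not the repo's actual script; the function name `frame_overlap` and the assumption of shared intrinsics and camera-to-world poses are mine.

```python
import numpy as np

def frame_overlap(depth0, K, pose0, pose1):
    """Fraction of frame-0 pixels that reproject inside frame 1.

    depth0: (H, W) depth map of frame 0.
    K: (3, 3) camera intrinsics, assumed shared by both frames.
    pose0, pose1: (4, 4) camera-to-world extrinsics.
    """
    H, W = depth0.shape
    v, u = np.mgrid[0:H, 0:W]
    # Back-project all frame-0 pixels into camera-0 space.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, N)
    pts_cam0 = (np.linalg.inv(K) @ pix) * depth0.reshape(1, -1)
    # camera 0 -> world -> camera 1.
    pts_h = np.vstack([pts_cam0, np.ones((1, pts_cam0.shape[1]))])
    pts_cam1 = np.linalg.inv(pose1) @ (pose0 @ pts_h)
    # Project into frame 1 and count pixels that fall inside the image.
    z = pts_cam1[2]
    proj = K @ pts_cam1[:3]
    with np.errstate(divide="ignore", invalid="ignore"):
        u1, v1 = proj[0] / z, proj[1] / z
    inside = (z > 1e-6) & (u1 >= 0) & (u1 < W) & (v1 >= 0) & (v1 < H)
    return inside.mean()
```

A pair of frames would then be recorded into a file like key_frames_0.6.txt whenever `frame_overlap(...) > 0.6` in both directions.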

OrangeSodahub commented 1 month ago

@Tangshitao Hi, I'm quite confused. I tried your codebase, and I also integrated your (depth) training script into another diffusers pipeline and tried again; this issue always exists -- the content gradually disappears during training.

I really want to know whether anyone has trained the depth version successfully, because I can't find the problem. Many thanks!

Tangshitao commented 1 month ago

Have you tried to train the model with the released code, instead of integrating my code into another codebase?

OrangeSodahub commented 1 month ago

Yes, I tried that first, then I tried the integrated one. Both gave similar results: the outputs gradually disappear.

OrangeSodahub commented 1 month ago

By the way, can I train your depth-version model but replace SD-2-Depth with SD 1.5? I'm not sure whether it will impact the performance of the correspondence-aware attention layers.