MVDream


About 2D&3D joint training #13

Open lizhiqi49 opened 9 months ago

lizhiqi49 commented 9 months ago

Very nice work!

I have a question about the 2D&3D joint training. It seems intuitive that training only on the synthetic 3D dataset would degrade the quality of the generated images and easily overfit to the synthetic data, so introducing high-quality 2D data into training should help. But since you didn't show a with/without-2D comparison, I'd like to know how much it improved generation quality in your experience. Thanks.

seasonSH commented 9 months ago

Here are some examples. Empirically, we found that the joint training leads to better quality and text-image consistency.

The examples below compare "a bulldog wearing a black pirate hat" and "an astronaut riding a horse", trained with no 2D data vs. 2D+3D joint training.

[comparison images: "an astronaut riding a horse" (DDIM 50 steps) and "a bulldog wearing a black pirate hat" (DDIM 50 steps), no-2D vs. 2D+3D]
lizhiqi49 commented 9 months ago

Thank you! The quality really is improved a lot. I have another question:

You mentioned in your paper that you sample a data batch from the LAION image dataset with a 30% chance. When training with a multi-view batch, the batch size is 4096 (1024x4); what is the batch size for a 2D batch (1024 or 4096)?

seasonSH commented 9 months ago

We train the model with 32 A100 GPUs distributed across 4 nodes, and each node has a batch size of 256. At each step, each node samples either a 2D batch or a multi-view (3D) batch, so the mode can differ between nodes at the same step.
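The per-node sampling described above could be sketched roughly as follows. This is a hypothetical illustration, not the authors' code: the 30% 2D probability, per-node batch size of 256, and 4 views come from the thread, while the function and return structure are made up for clarity.

```python
import random

P_2D = 0.3        # probability of drawing a 2D (LAION) batch, per the paper
NODE_BATCH = 256  # per-node batch size, per the thread
NUM_VIEWS = 4     # views per object in a multi-view batch

def sample_node_batch(rng=random):
    """Each node independently picks its mode every training step."""
    if rng.random() < P_2D:
        # 2D mode: 256 independent single-view images
        return {"mode": "2d", "images": NODE_BATCH}
    # 3D mode: 256 images arranged as 64 objects x 4 views
    return {"mode": "3d", "objects": NODE_BATCH // NUM_VIEWS, "views": NUM_VIEWS}
```

Because each node samples independently, different nodes can be in different modes at the same global step, which is what the answer above points out.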

lizhiqi49 commented 9 months ago

OK, thanks a lot.