About 2D&3D joint training

MV-Dream / MVDream

About 2D&3D joint training #13

Open lizhiqi49 opened 9 months ago

lizhiqi49 commented 9 months ago

Very nice work!

I have a question about 2D&3D joint training: it seems intuitive that training only on the synthetic 3D dataset will degrade the quality of generated images and easily overfit to the synthetic 3D data, so introducing high-quality 2D data into training should help. But since you didn't show a comparison of training with and without 2D data, I want to know how much it improved the generation quality in your practice. Thanks.

seasonSH commented 9 months ago

Here are some examples. Empirically, we found that joint training leads to better quality and text-image consistency.

Examples shown (image comparison, "No 2D data" vs. "2D+3D Training"): "a bulldog wearing a black pirate hat" and "an astronaut riding a horse".

lizhiqi49 commented 9 months ago

Thank you! The performance really improved a lot. I have another question:

You mentioned in your paper that you sample a data batch from the LAION image dataset with a 30% chance. When training with a multi-view batch, the batch size is 4096 (1024x4). What is the size of a 2D batch (1024 or 4096)?

seasonSH commented 9 months ago

We train the model with 32 A100 GPUs distributed on 4 nodes. Each node has a batch size of 256. So for each node:

70% chance a batch is a multi-view batch, which has 256x4 images with 256 text descriptions.
30% chance a batch is an image batch, which has 1024 image+text pairs.

The mode could be different for each node at the same step.
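
The per-node scheme above can be sketched roughly as follows. This is only an illustration of the arithmetic, not the actual training code: all names are hypothetical, and it assumes each node draws its mode independently at every step.

```python
import random

# Figures taken from the comment above; all names here are hypothetical.
NUM_NODES = 4
TEXTS_PER_NODE = 256      # per-node batch size
VIEWS_PER_OBJECT = 4      # views per multi-view sample
P_IMAGE_BATCH = 0.3       # chance that a node draws a 2D image batch


def sample_mode(rng):
    """Pick the batch mode for one node at one step."""
    return "image" if rng.random() < P_IMAGE_BATCH else "multiview"


def batch_counts(mode):
    """Return (images, texts) in one node's batch for the given mode."""
    images = TEXTS_PER_NODE * VIEWS_PER_OBJECT  # 1024 images in either mode
    if mode == "multiview":
        return images, TEXTS_PER_NODE  # 4 views share one text description
    return images, images  # each 2D image comes with its own text


def simulate_step(rng):
    """Sample a mode per node (assumed independent) and total the counts."""
    modes = [sample_mode(rng) for _ in range(NUM_NODES)]
    counts = [batch_counts(m) for m in modes]
    total_images = sum(c[0] for c in counts)
    total_texts = sum(c[1] for c in counts)
    return modes, total_images, total_texts


if __name__ == "__main__":
    rng = random.Random(0)
    for step in range(3):
        modes, images, texts = simulate_step(rng)
        print(step, modes, images, texts)
```

Under these assumptions, every step processes 4 x 1024 = 4096 images in total regardless of the mix of modes; only the number of text descriptions varies, between 1024 (all nodes multi-view) and 4096 (all nodes 2D).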

lizhiqi49 commented 9 months ago

OK, thanks a lot.