pgn-dev opened this issue 1 year ago

I was doing a small experiment on MVDream to evaluate consistency across generations with a small piece of code (sketched below). TL;DR: the code freezes the noise for two generations with different sets of camera angles.

The output (image grids omitted here) shows that although the styles for the two different sets of camera angles are similar, they are not the same. So I would not be able to create different views for one pair of `(prompt, start seed)` in a separate, independent generation unless I have the exact same set of camera positions. Is this expected? Does the MVDream training regime introduce camera-dependent styles?
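A minimal reconstruction of that setup might look like this; `run_mvdream_t2i`, the prompt, and the tensor shapes are hypothetical stand-ins, not MVDream's actual API:

```python
import torch

def run_mvdream_t2i(prompt, azimuths, x_T):
    """Hypothetical stand-in for the sampling call in MVDream's t2i.py.

    The real pipeline denoises x_T conditioned on the prompt and on camera
    poses built from `azimuths`; this stub only returns a tensor of the
    right shape so the control flow of the experiment stays visible.
    """
    return torch.zeros_like(x_T)

prompt = "an astronaut riding a horse"  # illustrative prompt

# Freeze the noise: sample the starting latents once, then reuse the exact
# same x_T in both runs, so the only thing that differs between the two
# generations is the set of camera angles.
torch.manual_seed(0)
x_T = torch.randn(4, 4, 32, 32)  # (num_views, latent_channels, h, w)

for azimuth_start in [90, 60]:
    # four views spaced 90 degrees apart, starting from azimuth_start
    azimuths = [(azimuth_start + 90 * i) % 360 for i in range(4)]
    views = run_mvdream_t2i(prompt, azimuths, x_T.clone())
```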
I think you can set a larger batch size to achieve "create different views for one pair of `(prompt, start seed)`". That is, you have to get all the results in a single round of generation (with a batch size of 8). If you instead split that one round into two rounds (batch size 4, run twice), it fails to be consistent, because the two separate generation processes do not share the 3D attention.

For example, do not use `for azimuth_start in [90, 60]:`; just set `--num_frames` in t2i.py to 8, 12, etc. This may address your problem. Or, you can specify the exact azimuths you want.
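To make the contrast concrete, here is a minimal sketch; `sample_views` is a hypothetical placeholder for the generation entry point in t2i.py, not its real signature:

```python
import torch

def sample_views(prompt, num_frames, azimuth_start=0):
    """Placeholder for one round of MVDream generation, not the real API.

    In t2i.py, one call denoises all `num_frames` views in a single batch,
    so the 3D attention couples every view in that round; this stub just
    returns dummy images of a plausible shape.
    """
    return torch.zeros(num_frames, 3, 256, 256)

prompt = "an astronaut riding a horse"  # illustrative prompt

# Two independent rounds of 4 views: the 3D attention only ties together
# views generated inside the same round, so the two batches can drift
# apart in style.
split_views = [sample_views(prompt, 4, azimuth_start=a) for a in [90, 60]]

# One round of 8 views (what --num_frames 8 gives you): all 8 views share
# the 3D attention and come out as a single consistent set.
joint_views = sample_views(prompt, 8)
```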
Besides, I think the MVDream training does not introduce camera-dependent styles. In my opinion, the camera pose only affects the consistency of the generated object, not the generation style.
I believe the public model was only trained to generate 4 views at a time, so I'm not sure how consistent it would be for 8 or 12 views. Moreover, consistent generation across separate processes would be interesting for 3D reconstruction.
I have tried using MVDream to generate more than 4 views, and it works. I think the model actually learns a strong prior from the camera pose. As for consistent generation across separate processes, I currently have no idea. But in 3D reconstruction or text-to-3D, it seems to be provided naturally by the NeRF (I guess). That is, because of the NeRF, even if the generation results across multiple processes are not that consistent, the generated 3D content is still acceptable. The above is based on the experiments I have done, but it may not be completely accurate. Looking forward to more discussion.
@pgn-dev, @yanjk3 Hi guys, I will be starting an exciting five-and-a-half-month internship on 3D generative AI, and I was wondering if you could share some of your experience in that field with me. I'm just a beginner with 3D generation, so I'm looking forward to hearing from you. Best regards