Open liwei0826 opened 1 year ago
Seems like Stable Diffusion has a bias toward showing a face at every angle, even an overhead view (like the image below), so I think a better diffusion model or better prompts are the only solution.
This has been considered in other work for direct generation: https://3d-diffusion.github.io -- it may be possible to apply some of those concepts to Dreamfusion. I think we could fine-tune SD on synthetic images of people, faces, and common objects matching the labels "side view, back view, top view" etc. that are currently used, and this fine-tune could help mitigate Janus issues. We may also be able to apply negative prompt weights for things like faces, front, etc. on the other projected views.
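A minimal sketch of the negative-prompt idea. None of these names come from the Dreamfusion codebase; `negative_for_view` and its angle thresholds are illustrative assumptions:

```python
def negative_for_view(azimuth_deg: float) -> str:
    """Hypothetical helper: return a negative prompt that suppresses
    face-like features whenever the camera is not roughly frontal.

    azimuth_deg: camera azimuth in degrees, 0 = directly in front.
    """
    a = azimuth_deg % 360
    frontal = a < 60 or a > 300  # treat a +/-60 degree cone as "front"
    return "" if frontal else "face, eyes, frontal view"
```

In a diffusers-style pipeline the returned string would be passed as the `negative_prompt` argument when scoring the side/back/top renders, leaving the frontal view free to generate a face.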
One thing that jumps out at me is that the two "side" views currently used in the cube share the same prompt. Even though the distinction between e.g. "viewed from the left/right side" or "viewed from the east/west" might be arbitrary, distinct prompts will probably do better at not duplicating features than having both "sides" of the cube generated from the single label "side view".
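For instance, the shared "side view" label could be split by azimuth into distinct left/right strings. A sketch with assumed angle bins, not the prompts the repo actually uses:

```python
def view_suffix(azimuth_deg: float) -> str:
    """Hypothetical mapping from camera azimuth to a distinct view label,
    so the two side cameras no longer share the same 'side view' text."""
    a = azimuth_deg % 360
    if a < 45 or a >= 315:
        return "front view"
    if a < 135:
        return "viewed from the right side"
    if a < 225:
        return "back view"
    return "viewed from the left side"

# Each cube face gets its own suffix appended to the base prompt.
prompt = "a DSLR photo of a corgi, " + view_suffix(90.0)
```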
Along the same lines of improving the consistency of the SD input, it might be worth having each "training step" cubic batch of images generate from the same seed in SD, to improve cross-view consistency.
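A sketch of the shared-seed idea using the standard-library RNG as a stand-in (in practice this would be a `torch.Generator` with `manual_seed`, which diffusers pipelines accept via the `generator` argument); `batch_noise` and the seed scheme are assumptions for illustration:

```python
import random

def batch_noise(step: int, n_views: int = 6, latent_dim: int = 4):
    """Hypothetical sketch: derive one seed per training step so every
    view in that step's cubic batch starts from the same initial noise."""
    seed = 1234 + step  # one seed per step, shared across all views
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    # replicate the same latent noise for each of the cube's views
    return [list(shared) for _ in range(n_views)]

noises = batch_noise(step=0)
```

Seeding per step (rather than per view) keeps the denoising trajectories correlated across the cube's faces while still varying the noise between training steps.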