GGGHSL / GraphDreamer

[CVPR'24] GraphDreamer: a novel framework of generating compositional 3D scenes from scene graphs.
https://graphdreamer.github.io/
MIT License

reproduction questions #9

Open haodong2000 opened 4 months ago

haodong2000 commented 4 months ago

Hi authors, pioneering work!

I tried to reproduce the wizard_study case by simply running bash scripts/wizard_study.sh, with only TG modified. Please see the result below:

https://github.com/GGGHSL/GraphDreamer/assets/67775090/6ddbcc4d-24ba-4aad-866b-4c02c86ff743

I also asked GPT again to generate the per-object prompts, the per-relation prompts, and the negative prompts (each one describing all objects except the current one). Here is the bash script:

wizard_modified.sh

export P="A Wizard standing in front of a Wooden Desk, gazing into a Crystal Ball perched atop the Wooden Desk, with a Stack of Ancient Spell Books perched atop the Wooden Desk."
export NP="ugly, bad anatomy, blurry, pixelated obscure, unnatural colors, poor lighting, dull, and unclear, cropped, lowres, low quality, artifacts, duplicate, morbid, mutilated, poorly drawn face, deformed, dehydrated, bad proportions"

export P1="'Wizard: A wizard with a cloak and a wizard hat is standing upright, with his eyes fixed at a certain distance.'"
export P2="'Wooden Desk: A sturdy wooden desk with a rich, dark brown color. It has organizational compartments and a flat top.'"
export P3="'Crystal Ball: A crystal ball rests on the desk. It is clear and shiny, and seems to be radiating a mystical energy.'"
export P4="'Stack of Ancient Spell Books: A tall stack of several ancient spell books stacked neatly atop the wooden desk. The books look old, used, and full of mystery.'"

export P12="The wizard is standing in front of the wooden desk."
export P23="The stack of ancient spell books is perched atop the wooden desk."
export P13="The wizard is gazing into the crystal ball."
export P34="The crystal ball is perched atop the wooden desk."

export N234="A wooden desk is visible with a crystal ball and a stack of ancient spell books on it."
export N134="A standing wizard is gazing into a crystal ball, and there's also a stack of ancient spell books."
export N124="There's a wizard standing before a wooden desk, on which a stack of ancient spell books is also placed."
export N123="A standing wizard is gazing into a crystal ball, both of which are by a wooden desk."

export PG=[["$P12"],["$P23"],["$P13"],["$P34"]]
export E_START_AT_1=[[1,2],[2,3],[1,3],[3,4]]
export E=[[0,1],[1,2],[0,2],[2,3]]

# manually tuned parameters
export C=[[-0.2,0.2,0.0],[0.15,-0.15,-0.3],[0.4,0.2,0.25],[0.15,-0.15,0.16]]
export RO=[[0,0,0],[0,0,0],[0,0,0],[0,0,0]]
export R=[1.0,0.9,0.3,0.3]

# Name save folder:
export TG="wizard_modified"
export CUDA=1

# 1. Coarse stage:
python launch.py --config configs/gd-if.yaml --train --gpu $CUDA \
    exp_root_dir="examples" use_timestamp=false tag=$TG \
    system.loss.lambda_entropy=1. \
    system.geometry.num_objects=4 \
    system.prompt_processor.prompt="$P" \
    system.prompt_processor.negative_prompt="$NP" \
    system.prompt_obj=[["$P1"],["$P2"],["$P3"],["$P4"]] \
    system.prompt_obj_neg=[["$N234"],["$N134"],["$N124"],["$N123"]] \
    system.prompt_global="$PG" \
    system.edge_list=$E \
    system.guidance.guidance_scale=[200.,100.] \
    system.guidance.guidance_scale_milestones=[2000,] \
    system.geometry.center_params=$C \
    system.geometry.radius_params=$R \
    system.optimizer.params.geometry.lr=0.01 \
    data.resolution_milestones=[2000,] \
    trainer.max_steps=4600

# 2. Fine stage:
export RP="a 4K DSLR high-resolution high-quality photo of "$P""
export RP1="'a 4K DSLR high-resolution high-quality photo of a Wizard: A wizard with a cloak and a wizard hat is standing upright, with his eyes fixed at a certain distance.'"
export RP2="'a 4K DSLR high-resolution high-quality photo of a Wooden Desk: A sturdy wooden desk with a rich, dark brown color. It has organizational compartments and a flat top.'"
export RP3="'a 4K DSLR high-resolution high-quality photo of a Crystal Ball: A crystal ball rests on the desk. It is clear and shiny, and seems to be radiating a mystical energy.'"
export RP4="'a 4K DSLR high-resolution high-quality photo of a Stack of Ancient Spell Books: A tall stack of several ancient spell books stacked neatly atop the wooden desk. The books look old, used, and full of mystery.'"

export RP12="a 4K DSLR high-resolution high-quality photo of "$P12""
export RP23="a 4K DSLR high-resolution high-quality photo of "$P23""
export RP13="a 4K DSLR high-resolution high-quality photo of "$P13""
export RP34="a 4K DSLR high-resolution high-quality photo of "$P34""
export RPG=[["$RP12"],["$RP23"],["$RP13"],["$RP34"]]

# Avoid OOM: data.batch_size=1 data.width=128 data.height=128
python launch.py --config configs/gd-sd-refine.yaml --train --gpu $CUDA \
    exp_root_dir="examples" use_timestamp=false tag=$TG \
    system.loss.lambda_entropy=1. \
    system.geometry.num_objects=4 \
    system.prompt_processor.prompt="$RP" \
    system.prompt_processor.negative_prompt="$NP" \
    system.prompt_obj=[["$RP1"],["$RP2"],["$RP3"],["$RP4"]] \
    system.prompt_obj_neg=[["$N234"],["$N134"],["$N124"],["$N123"]] \
    system.prompt_global="$RPG" \
    system.edge_list=$E \
    system.geometry.center_params=$C \
    system.geometry.radius_params=$R \
    resume=examples/gd-if/$TG/ckpts/last.ckpt \
    data.batch_size=1 data.width=128 data.height=128 \
    trainer.max_steps=10000 \
    trainer.val_check_interval=200

# Increase training resolution: data.width=256 data.height=256 (Optional: 1xA100 required)
python launch.py --config configs/gd-sd-refine.yaml --train --gpu $CUDA \
    exp_root_dir="examples" use_timestamp=false tag=$TG \
    system.loss.lambda_entropy=1. \
    system.geometry.num_objects=4 \
    system.prompt_processor.prompt="$RP" \
    system.prompt_processor.negative_prompt="$NP" \
    system.prompt_obj=[["$RP1"],["$RP2"],["$RP3"],["$RP4"]] \
    system.prompt_obj_neg=[["$N234"],["$N134"],["$N124"],["$N123"]] \
    system.prompt_global="$RPG" \
    system.edge_list=$E \
    system.geometry.center_params=$C \
    system.geometry.radius_params=$R \
    resume=examples/gd-sd-refine/$TG/ckpts/epoch=0-step=10000.ckpt \
    data.batch_size=1 data.width=128 data.height=128 \
    trainer.max_steps=20000 \
    trainer.val_check_interval=200
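Since the script above keeps both a 1-based and a 0-based edge list next to hand-written relation prompts, a quick consistency check may help narrow things down. The following is only a sketch under stated assumptions: it assumes edge indices follow the P1..P4 object order, and `KEYWORDS`, `to_zero_based`, and `unmentioned` are hypothetical helpers that are not part of GraphDreamer. It uses naive substring matching, so treat anything it reports as a hint to look again, not as a verdict.

```python
# Values copied from wizard_modified.sh.
E_START_AT_1 = [[1, 2], [2, 3], [1, 3], [3, 4]]
E = [[0, 1], [1, 2], [0, 2], [2, 3]]

# Hypothetical keyword per object, in P1..P4 order (an assumption for this check).
KEYWORDS = ["wizard", "wooden desk", "crystal ball", "spell books"]

# Relation prompts in the same order as PG in the script.
RELATIONS = {
    "P12": "The wizard is standing in front of the wooden desk.",
    "P23": "The stack of ancient spell books is perched atop the wooden desk.",
    "P13": "The wizard is gazing into the crystal ball.",
    "P34": "The crystal ball is perched atop the wooden desk.",
}


def to_zero_based(edges):
    """Convert a 1-based edge list to a 0-based one."""
    return [[a - 1, b - 1] for a, b in edges]


def unmentioned(edges, relations, keywords):
    """Return {prompt_name: [keywords of edge endpoints missing from the prompt]}."""
    report = {}
    for (a, b), (name, text) in zip(edges, relations.items()):
        missing = [keywords[i] for i in (a, b) if keywords[i] not in text.lower()]
        if missing:
            report[name] = missing
    return report


# The two edge lists in the script should describe the same graph.
assert to_zero_based(E_START_AT_1) == E

issues = unmentioned(E, RELATIONS, KEYWORDS)
for name, missing in issues.items():
    print(f"{name}: edge endpoint(s) not mentioned in the prompt: {missing}")
```

If a relation prompt is reported here, it may be worth checking whether its edge in `E` points at the objects the sentence actually talks about.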

And here is the result.

https://github.com/GGGHSL/GraphDreamer/assets/67775090/e2ece5f7-e1cc-4fa1-98bb-faed26f6d2de

I've checked the prompts and the 3D layout, and both look fine to me (please see the XY comparison below; the Z-axis is basically aligned).

[Image: XY layout comparison]
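The layout can also be checked numerically rather than by eye. The sketch below assumes, without confirmation from the repository, that each object is initialized roughly as a sphere with center `C[i]` (taken as x, y, z) and radius `R[i]`; `sphere_gaps` and the short names in `NAMES` are hypothetical and only for illustration. It reports, for every pair of objects, the center distance minus the sum of the radii, so a negative gap means the two spheres overlap.

```python
import math
from itertools import combinations

# Short labels for P1..P4 (hypothetical, for printing only).
NAMES = ["wizard", "desk", "crystal ball", "books"]

# Values copied from wizard_modified.sh.
C = [[-0.2, 0.2, 0.0], [0.15, -0.15, -0.3], [0.4, 0.2, 0.25], [0.15, -0.15, 0.16]]
R = [1.0, 0.9, 0.3, 0.3]


def sphere_gaps(centers, radii):
    """Map (i, j) -> center distance minus sum of radii (negative means overlap)."""
    return {
        (i, j): math.dist(centers[i], centers[j]) - (radii[i] + radii[j])
        for i, j in combinations(range(len(centers)), 2)
    }


gaps = sphere_gaps(C, R)
for (i, j), gap in gaps.items():
    state = "overlap" if gap < 0 else "separate"
    print(f"{NAMES[i]} / {NAMES[j]}: gap = {gap:+.3f} ({state})")
```

To compare against another layout, for example the values in scripts/wizard_study.sh, replace `C` and `R` and run the script again.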

What could be the possible reasons for this? Any help would be appreciated. Thanks in advance!

By the way, I am re-running wizard_modified with the opposite Y-axis value for the desk object. Hopefully it will turn out better :)