GGGHSL / GraphDreamer

[CVPR'24] GraphDreamer: a novel framework of generating compositional 3D scenes from scene graphs.
https://graphdreamer.github.io/
MIT License

Edge list issues for the given prompt #4

Closed cs-mshah closed 6 months ago

cs-mshah commented 6 months ago

I am trying to generate a scene based on the following prompt: "a Person sitting on a Chair, holding a Magic Wand in his right hand, positioned in front of a Fireplace, cartoon, blender". I modified the wizard_study.sh script. Here is the script:

export P="a Person sitting on a Chair, holding a Magic Wand in his right hand, positioned in front of a Fireplace, cartoon, blender"
export P1="'a Person: tall, sitting, focused'"
export P2="'a Chair: wooden, sturdy, carved with runes, aged'"
export P3="'a Magic Wand: purple, glowing, mystic, star-tipped'"
export P4="'a Fireplace: chimney, hearth, mantel, aged, burning logs'"
export NP="ugly, bad anatomy, blurry, pixelated obscure, unnatural colors, poor lighting, dull, and unclear, cropped, lowres, low quality, artifacts, duplicate, morbid, mutilated, poorly drawn face, deformed, dehydrated, bad proportions"

export P12="a Person sitting on a Wooden Chair, cartoon, blender"
export P13="a Person holding a Magic Wand in his right hand, cartoon, blender"
export P14="a Person positioned in front of a Fireplace, cartoon, blender"
export N124="a Person sitting on an Wooden Chair, positioned in front of a Fireplace"
export N134="a Person holding a Magic Wand in his right hand, positioned in front of a Fireplace"
export N123="a Person sitting on an Wooden Chair, holding a Magic Wand in his right hand"

export PG=[["$P12"],["$P13"],["$P14"]]
export E=[[0,1],[0,2],[0,3]]
export C=[[-0.25,0.1,0.],[0.24,0.12,0.],[0.25,0.13,0.2],[0.28,-0.16,0.2]]
# export C=[[-0.2487,0.0807,0.],[0.2445,0.1220,0.],[0.2555,0.1239,0.2],[0.2802,-0.1589,0.2]]
export R=[0.5,0.5,0.3,0.3]

# Name save folder:
export TG="person_fireplace"

# 1. Coarse stage:
python launch.py --config configs/gd-if.yaml --train --gpu 0 exp_root_dir="examples" use_timestamp=false tag=$TG system.loss.lambda_entropy=1. system.geometry.num_objects=4 system.prompt_processor.prompt="$P" system.prompt_processor.negative_prompt="$NP" system.prompt_obj=[["$P1"],["$P2"],["$P3"],["$P4"]] system.prompt_obj_neg=[["$N134"],["$N124"],["$N123"]] system.prompt_global="$PG" system.edge_list=$E system.guidance.guidance_scale=[200.,100.] system.guidance.guidance_scale_milestones=[2000,] system.geometry.center_params=$C system.geometry.radius_params=$R system.optimizer.params.geometry.lr=0.01 data.resolution_milestones=[2000,] trainer.max_steps=4600

# 2. Fine stage:
export RP="a 4K DSLR high-resolution high-quality photo of $P"
export RP1="'a 4K DSLR high-resolution high-quality photo of a Person: tall, sitting, focused'"
export RP2="'a 4K DSLR high-resolution high-quality photo of a Chair: wooden, sturdy, carved with runes, aged'"
export RP3="'a 4K DSLR high-resolution high-quality photo of a Magic Wand: purple, glowing, mystic, star-tipped'"
export RP4="'a 4K DSLR high-resolution high-quality photo of a Fireplace: chimney, hearth, mantel, aged, burning logs'"
export RP12="a 4K DSLR high-resolution high-quality photo of $P12"
export RP13="a 4K DSLR high-resolution high-quality photo of $P13"
export RP14="a 4K DSLR high-resolution high-quality photo of $P14"

export RPG=[["$RP12"],["$RP13"],["$RP14"]]

# Avoid OOM: data.batch_size=1 data.width=128 data.height=128
python launch.py --config configs/gd-sd-refine.yaml --train --gpu 0 exp_root_dir="examples" use_timestamp=false tag=$TG system.loss.lambda_entropy=1. system.geometry.num_objects=4 system.prompt_processor.prompt="$RP" system.prompt_processor.negative_prompt="$NP" system.prompt_obj=[["$RP1"],["$RP2"],["$RP3"],["$RP4"]] system.prompt_obj_neg=[["$N134"],["$N124"],["$N123"]] system.prompt_global="$RPG" system.edge_list=$E system.geometry.center_params=$C system.geometry.radius_params=$R resume=examples/gd-if/$TG/ckpts/last.ckpt data.batch_size=2 data.width=128 data.height=128 trainer.max_steps=10000 trainer.val_check_interval=200

# Increase training resolution: data.width=256 data.height=256 (Optional: 1xA100 required)
python launch.py --config configs/gd-sd-refine.yaml --train --gpu 0 exp_root_dir="examples" use_timestamp=false tag=$TG system.loss.lambda_entropy=1. system.geometry.num_objects=4 system.prompt_processor.prompt="$RP" system.prompt_processor.negative_prompt="$NP" system.prompt_obj=[["$RP1"],["$RP2"],["$RP3"],["$RP4"]] system.prompt_obj_neg=[["$N134"],["$N124"],["$N123"]] system.prompt_global="$RPG" system.edge_list=$E system.geometry.center_params=$C system.geometry.radius_params=$R resume=examples/gd-sd-refine/$TG/ckpts/epoch=0-step=10000.ckpt data.batch_size=1 data.width=256 data.height=256 trainer.max_steps=20000 trainer.val_check_interval=200

However, I am running into some edge list issues. Have I missed something? I believe I've added the relevant edges and negative prompts where needed.

cs-mshah commented 6 months ago

Can you give more information on how to pass the edges? Is it an adjacency list, or simply a list of connections?

Can you share it for this example:

export P="a DSLR photo of a Tiger writing a Letter on a Table"
export P1="a DSLR photo of a Tiger"
export P2="a DSLR photo of a Letter Scroll"
export P3="a DSLR photo of a Table"
export N23="a Letter Scroll on a Table"
export N13="a Tiger writing on a Table"
export N12="a Tiger writing a Letter Scroll"

export CD=0. 

# Use different tags to avoid overwriting:
export TG="tiger_table"

python launch.py --config configs/gd-if.yaml --train --gpu 0 exp_root_dir="examples" use_timestamp=false tag=$TG system.loss.lambda_entropy=0. system.geometry.num_objects=3 system.prompt_processor.prompt="$P" system.prompt_processor.negative_prompt="ugly, bad anatomy, blurry, pixelated obscure, unnatural colors, poor lighting, dull, and unclear, cropped, lowres, low quality, artifacts, duplicate, morbid, mutilated, poorly drawn face, deformed, dehydrated, bad proportions" system.prompt_obj=[["$P1"],["$P2"],["$P3"]] system.prompt_obj_neg=[["$N23"],["$N13"],["$N12"]] system.geometry.sdf_center_dispersion=$CD system.guidance.guidance_scale=[50.,20.] system.guidance.guidance_scale_milestones=[2000,] system.optimizer.params.geometry.lr=0.01

export RP=$P", 4K high-resolution high-quality"
export RP1=$P1", 4K high-resolution high-quality"
export RP2=$P2", 4K high-resolution high-quality"
export RP3=$P3", 4K high-resolution high-quality"

python launch.py --config configs/gd-sd-refine.yaml --train --gpu 0 exp_root_dir="examples" use_timestamp=false tag=$TG system.loss.lambda_entropy=0. system.geometry.num_objects=3 system.prompt_processor.prompt="$RP" system.prompt_obj=[["$RP1"],["$RP2"],["$RP3"]] system.prompt_obj_neg=[["$N23"],["$N13"],["$N12"]] system.geometry.sdf_center_dispersion=$CD data.fovy_range=[70,90] data.eval_fovy_deg=90 resume=examples/gd-if/$TG/ckpts/last.ckpt  
# Adjust data.fovy_range to avoid OOM.

Running this produces an error asking for edges, yet there is also a log message saying that edges don't need to be provided for a scene with two or three objects. This scene has 3 objects. Maybe that log message should be changed as well?

GGGHSL commented 6 months ago

Thanks for the question. We will update this explanation in the usage instructions.

The system.edge_list is an ordered list corresponding to the edge-wise prompt list system.prompt_global. It simply describes which two objects, say $o_i$ and $o_j$ (i.e., edge [(i-1),(j-1)]), should be rendered out together (i.e., edge rendering) when optimizing object $o_i$ as a pairwise-relationship constraint to $o_j$. Therefore, the length of system.edge_list and system.prompt_global should equal the number of objects.
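As a minimal sketch of the index convention described above, the helper below turns 1-indexed object pairs into 0-indexed edge entries. The function name make_edge_list is hypothetical and not part of GraphDreamer; only the [(i-1),(j-1)] mapping is taken from the explanation above.

```shell
# Hypothetical helper (not part of GraphDreamer): build an edge list string
# from 1-indexed object pairs passed as "i-j" arguments.
make_edge_list() (
  out=""
  for pair in "$@"; do
    i=${pair%-*}   # object index before the dash
    j=${pair#*-}   # object index after the dash
    # Append "[i-1,j-1]", adding a comma separator after the first entry.
    out="${out:+$out,}[$((i - 1)),$((j - 1))]"
  done
  printf '[%s]\n' "$out"
)

# Pairs (o_1,o_2), (o_1,o_3), (o_1,o_4) map to edges [0,1], [0,2], [0,3].
E=$(make_edge_list 1-2 1-3 1-4)
echo "system.edge_list=$E"
```

The resulting string is the same edge list used in the person_fireplace script earlier in this thread.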

For example, in the provided scene "tiger_table" (and all other three-object scenes as well), system.edge_list is by default set as a cyclic list [[0,1],[1,2],[0,2]]. If there is no strong relationship between $o_i$ and $o_j$, just use "and" in the corresponding Pij.

The error you get in the three-object scene occurs because system.edge_list is set by default but system.prompt_global is not provided. Just add system.prompt_global=[["$N12"],["$N23"],["$N13"]] (for the coarse stage) and it should be fine.
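To make the ordering explicit, here is a small sketch that writes out the default three-object edge list next to the matching edge prompts for the tiger_table scene. The variable names E and PG follow the person_fireplace script earlier in this thread; passing system.edge_list explicitly here is only for illustration, since it is already set by default for three-object scenes.

```shell
# Pairwise prompts from the tiger_table script
# (objects: 1 = Tiger, 2 = Letter Scroll, 3 = Table).
N12="a Tiger writing a Letter Scroll"
N23="a Letter Scroll on a Table"
N13="a Tiger writing on a Table"

# Default three-object edge list, written out explicitly.
E='[[0,1],[1,2],[0,2]]'
# One prompt per edge, in the same order: [0,1] -> N12, [1,2] -> N23, [0,2] -> N13.
PG="[[$N12],[$N23],[$N13]]"

# Overrides that could be appended to the coarse-stage launch.py command.
echo "system.edge_list=$E"
echo "system.prompt_global=$PG"
```

The k-th entry of system.prompt_global describes the k-th entry of system.edge_list, so reordering one list means reordering the other.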

cs-mshah commented 6 months ago

Here is the tiger example which I gave and the outputs:

export P="a DSLR photo of a Tiger writing a Letter on a Table"
export P1="a DSLR photo of a Tiger"
export P2="a DSLR photo of a Letter Scroll"
export P3="a DSLR photo of a Table"
export N23="a Letter Scroll on a Table"
export N13="a Tiger writing on a Table"
export N12="a Tiger writing a Letter Scroll"

export CD=0. 

# Use different tags to avoid overwriting:
export TG="tiger_table"

# python launch.py --config configs/gd-if.yaml --train --gpu 0 exp_root_dir="examples" use_timestamp=false tag=$TG system.loss.lambda_entropy=0. system.geometry.num_objects=3 system.prompt_processor.prompt="$P" system.prompt_processor.negative_prompt="ugly, bad anatomy, blurry, pixelated obscure, unnatural colors, poor lighting, dull, and unclear, cropped, lowres, low quality, artifacts, duplicate, morbid, mutilated, poorly drawn face, deformed, dehydrated, bad proportions" system.prompt_obj=[["$P1"],["$P2"],["$P3"]] system.prompt_obj_neg=[["$N23"],["$N13"],["$N12"]] system.prompt_global=[["$N12"],["$N23"],["$N13"]] system.geometry.sdf_center_dispersion=$CD system.guidance.guidance_scale=[50.,20.] system.guidance.guidance_scale_milestones=[2000,] system.optimizer.params.geometry.lr=0.01

export RP=$P", 4K high-resolution high-quality"
export RP1=$P1", 4K high-resolution high-quality"
export RP2=$P2", 4K high-resolution high-quality"
export RP3=$P3", 4K high-resolution high-quality"

export RP12=$N12", 4K high-resolution high-quality"
export RP23=$N23", 4K high-resolution high-quality"
export RP13=$N13", 4K high-resolution high-quality"

python launch.py --config configs/gd-sd-refine.yaml --train --gpu 0 exp_root_dir="examples" use_timestamp=false tag=$TG system.loss.lambda_entropy=0. system.geometry.num_objects=3 system.prompt_processor.prompt="$RP" system.prompt_obj=[["$RP1"],["$RP2"],["$RP3"]] system.prompt_obj_neg=[["$N23"],["$N13"],["$N12"]] system.prompt_global=[["$RP12"],["$RP23"],["$RP13"]] system.geometry.sdf_center_dispersion=$CD data.fovy_range=[70,90] data.eval_fovy_deg=90 resume=examples/gd-if/$TG/ckpts/last.ckpt  
# Adjust data.fovy_range to avoid OOM.

For the coarse stage:

https://github.com/GGGHSL/GraphDreamer/assets/56499208/bbe9264e-a239-4862-b57c-b23b38b92ece

For the fine stage:

https://github.com/GGGHSL/GraphDreamer/assets/56499208/aab06f93-0ca9-42af-b5f0-12e551a45088

It seems that the table was not generated, and the letter is gone as well. Is there an issue in the script, such as the negative prompts or the edges? Could you provide examples or documentation on how to use the negative prompts and how to pass edges? Also, since writing these scripts by hand is tedious, is there a GPT-4 prompt that can create them? I believe that is one of the contributions. Is it used only for decomposing the text into a graph, or for generating these scripts as well?

It would be great if you could provide examples for the following:

It would also be great to have more information on the centres: how to select them and how to pass them as input.

Thanks.

GGGHSL commented 6 months ago

Thanks for your feedback. For generating scenes with >= 3 objects, it is better not to set system.geometry.sdf_center_dispersion to 0. (in your script, export CD=0.). At the beginning, objects are initialized as randomly centered SDF spheres, and the dispersion of the centers is adjusted by multiplying by the hyperparameter system.geometry.sdf_center_dispersion. Therefore, setting CD=0. means the objects completely overlap, which is not a good starting point for optimization, especially as the number of objects increases. By default, system.geometry.sdf_center_dispersion=0.2. Here are the results we generated (the coarse stage) for the example:

export P="a Tiger writing a Letter Scroll on a Table"  # Added a word 'Scroll'
export P1="a Tiger"
export P2="a Letter Scroll"
export P3="a Table"
export P12="a Tiger writing a Letter Scroll"
export P13="a Tiger writing on a Table"
export P23="a Letter Scroll on a Table"

Coarse stage:

https://github.com/GGGHSL/GraphDreamer/assets/39009560/4e5caa78-6b0c-487e-8e95-d1f35d69cb17

https://github.com/GGGHSL/GraphDreamer/assets/39009560/b94cb575-f702-4194-8c9e-8b7be655b625

https://github.com/GGGHSL/GraphDreamer/assets/39009560/5b1811ac-a500-440a-b1e0-0fe355f074cd

The second object, "a Letter Scroll", failed to appear here as well, potentially because it is too thin and thus hard to distinguish from the "Table" based on the predicted SDFs. Setting system.loss.lambda_entropy > 0. (default) may help in such cases, as it discourages empty objects. Also, not sure if you have noticed this commit, which fixed a bug in view-dependent prompting for objects.
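As a rough sketch of the two suggestions in this comment, the snippet below assembles the corresponding overrides for the tiger_table launch commands. The dispersion value 0.2 is the default stated above. The entropy weight 1. is an assumption borrowed from the person_fireplace script earlier in this thread, since the advice here only says the value should be greater than 0.

```shell
# 0.2 is the default dispersion stated in this thread.
CD=0.2
# Assumed value: any weight > 0. follows the advice; 1. is what the
# person_fireplace script uses.
LE=1.

# Overrides to use in place of CD=0. and system.loss.lambda_entropy=0.
OVERRIDES="system.geometry.sdf_center_dispersion=$CD system.loss.lambda_entropy=$LE"
echo "$OVERRIDES"
```

These would replace the corresponding arguments in both the coarse-stage and fine-stage launch.py commands.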