allenai / Holodeck

CVPR 2024: Language Guided Generation of 3D Embodied AI Environments.
https://yueyang1996.github.io/holodeck
Apache License 2.0

Inquiry about the structure and field meanings of `scene` dictionary #30

Closed Bool1020 closed 3 months ago

Bool1020 commented 3 months ago

Description

While using the Holodeck project, I ran into a question: I'd like to understand the structure and field meanings of the scene dictionary. In my code, I create a Controller object with the following snippet:

from ai2thor.controller import Controller
from ai2thor.hooks.procedural_asset_hook import ProceduralAssetHookRunner
from ai2thor.platform import CloudRendering

controller = Controller(
    agentMode="default",
    makeAgentsVisible=False,
    visibilityDistance=1.5,
    scene=scene,
    width=width,
    height=height,
    fieldOfView=90,
    action_hook_runner=ProceduralAssetHookRunner(
        asset_directory=objaverse_asset_dir,
        asset_symlink=True,
        verbose=True,
    ),
    platform=CloudRendering,
)

Request

Thank you for your assistance!

YueYANG1996 commented 3 months ago

Here is an example scene JSON.

To understand the meaning of each field, you can refer to the init function of each module class, for example, the door module: https://github.com/allenai/Holodeck/blob/156f8e1077ba5811bbe613f9d65b8f66c48f2346/modules/doors.py#L16

The main fields you will care about are "rooms", "objects", "doors", "walls", and "windows"; I think they are easy to understand by looking at the examples.
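To make those fields concrete, here is a minimal sketch of inspecting a loaded scene dict. The top-level keys are the ones named above; the per-entry contents shown here (room "vertices", object "assetId"/"position"/"rotation") are illustrative stand-ins for what the example scene JSON contains, not the full schema:

```python
# Hypothetical minimal scene dict; a real Holodeck scene JSON has the same
# top-level keys but far more detail per entry.
scene = {
    "rooms": [{"id": "living room",
               "vertices": [[0, 0], [0, 4], [5, 4], [5, 0]]}],
    "objects": [{"assetId": "chair_1",
                 "position": {"x": 1, "y": 0, "z": 2},
                 "rotation": {"x": 0, "y": 90, "z": 0}}],
    "doors": [],
    "walls": [],
    "windows": [],
}

# The main fields to look at when post-processing a generated scene:
for key in ("rooms", "objects", "doors", "walls", "windows"):
    print(key, len(scene[key]))
```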

If you have specific questions about the scene JSON, you could comment below.

Bool1020 commented 3 months ago

I'd like to ask about the selection process for the 50k Objaverse assets used in Holodeck. Could you share the criteria used for choosing them from the total pool of 800k?

YueYANG1996 commented 3 months ago

We have an object type attribute for all 800k assets, and we manually selected a subset of object types that are useful for indoor scenes.

Bool1020 commented 3 months ago

How can I convert the walls and floors to a .glb file?

YueYANG1996 commented 3 months ago

The walls and floors come from AI2-THOR and, unlike the objects, I'm not sure how to convert them to .glb. @sunfanyunn has tried to load Holodeck scenes into Blender (including doors and windows) by exporting .fbx from Unity. @sunfanyunn, do you have any ideas?

sunfanyunn commented 3 months ago

For the walls and floors, you can construct planes programmatically in Blender and assign the corresponding texture files to them.

For example,

import bpy
import bmesh

def create_wall_mesh(name, vertices):
    # Create a new mesh and object, and link the object into the scene
    mesh = bpy.data.meshes.new(name)
    obj = bpy.data.objects.new(name, mesh)
    bpy.context.scene.collection.objects.link(obj)

    # Make the new object the active, selected object
    bpy.context.view_layer.objects.active = obj
    obj.select_set(True)

    # Build the wall geometry with BMesh
    bm = bmesh.new()

    # Create the vertices
    for v in vertices:
        bm.verts.new(v)

    # Ensure the lookup table is updated before indexing
    bm.verts.ensure_lookup_table()

    # Create the edges between consecutive vertices, closing the loop
    for i in range(len(vertices)):
        bm.edges.new([bm.verts[i], bm.verts[(i + 1) % len(vertices)]])

    # Create the face (the vertices form a closed loop)
    bm.faces.new(bm.verts)

    # Write the BMesh data back into the mesh and free it
    bm.to_mesh(mesh)
    bm.free()
    return obj

To assign materials to an object in Blender, refer to this code snippet.

StephenYangjz commented 3 months ago

After a scene is generated, is there a way to parse the results, get the rotation/translation, and export each piece of furniture as a GLB file? Thanks!

YueYANG1996 commented 3 months ago

You can easily obtain the rotation and position of each piece of furniture from the "objects" in the JSON file. To convert assets into other formats, you can refer to the script here.
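A sketch of pulling each object's transform out of the "objects" field. The per-object keys used here ("assetId", "position", "rotation") follow the AI2-THOR house JSON convention and should be checked against your actual scene file:

```python
def object_transforms(scene):
    # Collect (asset id, position, yaw) for every placed object.
    transforms = []
    for obj in scene.get("objects", []):
        pos = obj["position"]
        rot = obj["rotation"]
        transforms.append({
            "assetId": obj["assetId"],
            "position": (pos["x"], pos["y"], pos["z"]),
            # Furniture is typically only rotated about the vertical (y) axis
            "rotation_y_deg": rot["y"],
        })
    return transforms

# Illustrative single-object scene, not a real Holodeck output
example_scene = {"objects": [{"assetId": "sofa_3",
                              "position": {"x": 2.0, "y": 0.0, "z": 1.5},
                              "rotation": {"x": 0, "y": 180, "z": 0}}]}
print(object_transforms(example_scene))
```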

StephenYangjz commented 3 months ago

Thanks @YueYANG1996! Is there a way to extract the canvas/floor size as well? I remember we can't extract them from AI2-THOR, so if I want to recreate them, do you have a recommended way?

YueYANG1996 commented 3 months ago

Yes, that's possible. In the JSON file, there is a field called "rooms", and you can get the information you need from "vertices".
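A sketch of recovering the floor footprint from a room's "vertices". This assumes the vertices are ordered 2D [x, z] corner coordinates of the room polygon, as in the example scene JSON:

```python
def floor_stats(vertices):
    # Axis-aligned bounding box of the room footprint
    xs = [v[0] for v in vertices]
    zs = [v[1] for v in vertices]
    width = max(xs) - min(xs)
    depth = max(zs) - min(zs)

    # Shoelace formula for the exact polygon area
    area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, z1 = vertices[i]
        x2, z2 = vertices[(i + 1) % n]
        area += x1 * z2 - x2 * z1
    return width, depth, abs(area) / 2.0

# A 5 x 4 rectangular room
print(floor_stats([[0, 0], [0, 4], [5, 4], [5, 0]]))  # → (5, 4, 20.0)
```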

Bool1020 commented 2 months ago

Why does the similarity of CLIP need to be multiplied by 100? Aren't the similarities of CLIP and BERT of the same order of magnitude?

YueYANG1996 commented 2 months ago

> Why does the similarity of CLIP need to be multiplied by 100? Aren't the similarities of CLIP and BERT of the same order of magnitude?

100 is a hyperparameter: we want the retrieval system to rely mainly on image similarity, and the textual similarity only protects the system from retrieving objects from the wrong (coarse-grained) category.
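A toy illustration of the effect of that scaling. The additive combination and the weight of 100 are taken from this discussion; treat this as a sketch of the idea, not the exact retrieval code:

```python
def retrieval_score(clip_sim, sbert_sim, clip_weight=100):
    # With a large weight on image similarity, the text (SBERT) score can
    # only break near-ties / filter wrong categories; it cannot override a
    # clearly better visual match.
    return clip_weight * clip_sim + sbert_sim

# Candidate A: better visual match; candidate B: better text match.
a = retrieval_score(0.32, 0.55)
b = retrieval_score(0.30, 0.90)
print(a > b)  # A wins: image similarity dominates
```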

Bool1020 commented 2 months ago

In that case, can't the SBERT similarity be neglected?