allenai / Holodeck

CVPR 2024: Language Guided Generation of 3D Embodied AI Environments.
https://yueyang1996.github.io/holodeck
Apache License 2.0

data/objaverse_holodeck format - triangles, scales, orientation etc #9

Closed: MichaelRabinovich closed this issue 6 months ago

MichaelRabinovich commented 7 months ago

I'm trying to use the code and render the created/given scenes without a dependency on AI2-THOR, for instance by setting up the scene in pyrender. The most immediate reason is that I'm running on Windows and hitting the same issues running AI2-THOR that someone else described in another post.

I had no luck trying to render/set up a scene by taking one of the output JSON files and using the uid to download the original Objaverse models. The scale, rotation, and translation of the models seem to be off, and it does look like you scaled and transformed the models in objaverse_holodeck/09_23_combine_scale.
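For reference, this is roughly the approach that failed for me (a minimal sketch using the objaverse package; the uid is just an example taken from a scene JSON):

import objaverse
import trimesh

# uid copied from an asset entry in one of the output scene JSON files
# (placeholder value for illustration)
uid = "0a0be10ec4974c8f932818d0a7472702"

# Download the original .glb from Objaverse; returns {uid: local_path}
paths = objaverse.load_objects(uids=[uid])

# Load the glb (trimesh returns a Scene for glb files); placing it with the
# position/rotation from the scene JSON is where the scale/rotation/translation
# come out wrong for me
scene = trimesh.load(paths[uid])
print(scene.bounds)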

Therefore, I wanted to load the meshes in objaverse_holodeck/09_23_combine_scale directly; the pkl file contains both 'vertices' and 'triangles'. I was hoping 'triangles' was the mesh's faces array, i.e. triples of indices into the vertices for every triangle, but it appears to be a one-dimensional array of the form [0, 1, ..., len(vertices)-1]. So I have a few questions (a short sketch of how I'm reading the pkl file follows after the questions):

1) How can I access the faces array of the meshes in 09_23_combine_scale? I haven't seen such an array in the pkl file. It also seems that I can't reuse the original faces, or at least I'm not sure how to, since the number of vertices I get by reading the original glb files from Objaverse (with trimesh or open3d) differs from the one I get from reading the pkl file. I also couldn't find the faces information in other files in the data folder, so I was wondering how, or from where, AI2-THOR reads the faces per model?

2) Before loading the 3D models into my scene and using the layout information (in pyrender or any other renderer that is not AI2-THOR), should I scale or rotate the models? I've seen the metadata file "objaverse_holodeck_database.json", and for each mesh there is a "scale" annotation that is not 1; should I apply that scale before loading? Are there any other transformations I should apply (centering translation and/or rotation)? I saw an annotation "pose_z_rot_angle": is that a rotation around the z-axis in radians that I should perform after scaling? Are there any other transformations I should run on the loaded models before placing them at the positions and orientations given by the layout in the scene's JSON file?
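For concreteness, here is how I'm reading the pkl file and what I see (a minimal sketch; the file name is just an example from the 09_23_combine_scale folder):

import gzip
import pickle
import numpy as np

# Example asset from objaverse_holodeck/09_23_combine_scale
with gzip.open("0a0be10ec4974c8f932818d0a7472702.pkl.gz", "rb") as f:
    data = pickle.load(f)

vertices = data["vertices"]   # list of {'x': ..., 'y': ..., 'z': ...} dicts
triangles = np.array(data["triangles"])

print(len(vertices), triangles.shape)
# 'triangles' looks like [0, 1, ..., len(vertices)-1],
# not like an arbitrary index buffer
print(triangles[:9])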

Thanks a lot and great work!

YueYANG1996 commented 7 months ago

We will address running the code on Windows soon after the holiday. As for using your own pipeline, it might be very challenging and time-consuming to make it work. Here are some reasons:

1) The Objaverse assets can be very large, which makes it hard to load many of them at the same time. Therefore, we did some mesh reduction to optimize the assets. That's why the faces, vertices, etc., differ from the original assets.
2) The scale and rotation of the original Objaverse assets are very noisy. We tried human annotations on the assets, but they still had a lot of errors. We finally used GPT-4 to annotate the assets and obtained the processed assets in 09_23_combine_scale.
3) The annotations you saw in the JSON file, like pose_z_rot_angle, are outdated and not useful to you.
4) The asset format in 09_23_combine_scale is for AI2-THOR. I think you can recover the 3D information from the pickle file, but it can be very tricky.

So, I suggest waiting until I hear from the Thor team on how to run on Windows.

MichaelRabinovich commented 7 months ago

Thank you for the swift response. The Windows issue is not my only reason to use the assets as-is: I have no interest in using the original Objaverse assets and would just like to use your processed ones. How do I access the triangles/faces of your processed meshes?

YueYANG1996 commented 6 months ago

Okay, got it. I prompted GPT-4 to generate a script that converts the pkl.gz into an .obj file:

# Full code to convert the pkl.gz data to an .obj file

import gzip
import pickle
import numpy as np

def load_pkl_gz(file_path):
    """Load a .pkl.gz file."""
    with gzip.open(file_path, 'rb') as f:
        return pickle.load(f)

def extract_vertices(vertices_data):
    """Extract vertices into a NumPy array from the list of {'x', 'y', 'z'} dicts."""
    return np.array([[v['x'], v['y'], v['z']] for v in vertices_data])

def create_faces(triangles_data):
    """Create faces by grouping the flat triangle index list into triples."""
    return np.asarray(triangles_data).reshape(-1, 3)

def save_to_obj(vertices, faces, file_path):
    """Save vertices and faces to an OBJ file."""
    with open(file_path, 'w') as file:
        for v in vertices:
            file.write(f"v {v[0]} {v[1]} {v[2]}\n")
        for f in faces:
            # OBJ indices are 1-based
            file.write(f"f {f[0]+1} {f[1]+1} {f[2]+1}\n")

def convert_pkl_gz_to_obj(input_file_path, output_file_path):
    """Convert a .pkl.gz asset file to an .obj file."""
    data = load_pkl_gz(input_file_path)

    # 'vertices' is a list of {'x', 'y', 'z'} dicts;
    # 'triangles' is a flat list of vertex indices, three per triangle
    vertices_array = extract_vertices(data['vertices'])
    faces_array = create_faces(data['triangles'])

    # Save to .obj file
    save_to_obj(vertices_array, faces_array, output_file_path)

    return output_file_path

# File paths
input_file_path = '/mnt/data/0a0be10ec4974c8f932818d0a7472702.pkl.gz'
output_file_path = '/mnt/data/visualized_mesh.obj'

# Convert and save the .obj file
converted_obj_path = convert_pkl_gz_to_obj(input_file_path, output_file_path)
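Since you mentioned pyrender, a quick sanity check of the converted mesh could look like this (a minimal sketch assuming trimesh and pyrender are installed; untested on our side):

import trimesh
import pyrender

# Load the converted .obj produced above
tm = trimesh.load('/mnt/data/visualized_mesh.obj')

# Wrap it for pyrender and open an interactive viewer
mesh = pyrender.Mesh.from_trimesh(tm)
scene = pyrender.Scene()
scene.add(mesh)
pyrender.Viewer(scene, use_raymond_lighting=True)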

You can also visualize the asset in Blender, for example:

[Screenshot: the converted mesh rendered in Blender]

Then, within Blender, you can map the textures (albedo, emission, normal) onto the mesh.
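In Blender's Python console, the hookup is roughly as follows (a sketch assuming Blender 3.x node/input names and hypothetical texture file paths; adjust to wherever the asset's albedo/emission/normal maps live):

import bpy

obj = bpy.context.active_object  # the imported .obj mesh

# Create a node-based material
mat = bpy.data.materials.new(name="HolodeckAsset")
mat.use_nodes = True
nodes, links = mat.node_tree.nodes, mat.node_tree.links
bsdf = nodes["Principled BSDF"]

def image_node(path, non_color=False):
    """Add an Image Texture node for the given file (paths are hypothetical)."""
    node = nodes.new("ShaderNodeTexImage")
    node.image = bpy.data.images.load(path)
    if non_color:
        # Normal maps must not be color-managed
        node.image.colorspace_settings.name = "Non-Color"
    return node

# Albedo -> Base Color
albedo = image_node("albedo.png")
links.new(albedo.outputs["Color"], bsdf.inputs["Base Color"])

# Emission map -> Emission (the input is named "Emission Color" in Blender 4.x)
emission = image_node("emission.png")
links.new(emission.outputs["Color"], bsdf.inputs["Emission"])

# Normal map -> Normal Map node -> Normal
normal_tex = image_node("normal.png", non_color=True)
normal_map = nodes.new("ShaderNodeNormalMap")
links.new(normal_tex.outputs["Color"], normal_map.inputs["Color"])
links.new(normal_map.outputs["Normal"], bsdf.inputs["Normal"])

# Assign the material to the mesh
obj.data.materials.append(mat)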

For running on Windows, since AI2-THOR only supports macOS 10.9+ or Ubuntu 14.04+, you can try installing Ubuntu on Windows as an alternative.