AnjieCheng / SpatialRGPT

[NeurIPS'24] This repository is the implementation of "SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models"
https://www.anjiecheng.me/SpatialRGPT
Apache License 2.0

3D Scene Graph Generation #1

Closed · mhd0528 closed this issue 4 weeks ago

mhd0528 commented 1 month ago

Hi,

Thanks for sharing this amazing work! May I ask which parts I should use to generate a 3D scene graph for a custom image? Thanks!

AnjieCheng commented 1 month ago

You can check the run.py in dataset_pipeline. To test on a specific image, use the --input argument to provide the image path.

mhd0528 commented 1 month ago

Thanks! I have it running now.

mhd0528 commented 1 month ago

Hi, I'm still a bit confused. I got the pipeline running, but the output contains only the log folders, including the Wis3D folder (JSON files for bboxes and point clouds). I assumed the scene graphs would be in the npz or pickle folders, but those two folders are empty. Is this result expected, or is there something I missed? Thanks in advance. [screenshot of output folders]

AnjieCheng commented 1 month ago

I had commented out the function and code that explicitly save the detection list to disk, because Open3D bounding boxes can't be saved directly in npz or pickle format. I used to serialize them into JSON files, which made the code a bit hacky.

But thanks for bringing this up! I've restored the function to save the detection list into JSONs. If you'd like to restore the saved JSONs back into an Open3D object, please check the README for instructions. Feel free to let me know if it works or not!
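For reference, the JSON round-trip can be sketched roughly like this. This is a minimal illustration, not the repo's actual helpers (bbox_to_dict / bbox_from_dict are hypothetical names); it relies only on the fact that an open3d.geometry.OrientedBoundingBox is fully described by its center (3,), rotation matrix R (3x3), and extent (3,):

```python
import json

# Sketch only: an Open3D OrientedBoundingBox is determined by three arrays
# (center, R, extent), so converting them to plain lists makes the box
# JSON-serializable without any hacky pickling.
def bbox_to_dict(bbox):
    return {
        "center": [float(v) for v in bbox.center],
        "rotation": [[float(v) for v in row] for row in bbox.R],
        "extent": [float(v) for v in bbox.extent],
    }

def bbox_from_dict(d):
    # With open3d installed, the object could be rebuilt with e.g.
    #   o3d.geometry.OrientedBoundingBox(center, R, extent)
    # after converting these lists back to numpy arrays. Here we just
    # return the plain-dict form to keep the sketch dependency-free.
    return d
```

The saved JSON is then just `json.dumps(bbox_to_dict(bbox))`, and restoring is the reverse.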

mhd0528 commented 1 month ago

I have them saved into JSONs now, thanks again! Also, from what I understand from the paper, the pipeline also generates the relative and metric relations between the detected objects, right? But in the JSONs I only find a list of masks and the box areas. Are those relations generated by another component? I saw the template-based questions produced by the model in the log file, so I assume those relations should also be generated here, right?

AnjieCheng commented 1 month ago

The relations are dynamically determined in dataset_pipeline/osdsynth/processor/prompt.py using the Open3D bounding box object, so we don’t explicitly save all relations (like left/right, distance, height/width, etc.). You can refer to the evaluate_predicates_on_pairs function in prompt.py, where combinations are sampled and their relations are evaluated. If you want to capture all pairwise relations, you can uncomment the random sampling section in the code to evaluate and save all relations.
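The exhaustive-evaluation idea can be illustrated with a minimal sketch. The function and predicate names below are hypothetical, not the repo's API, and which x-direction counts as "left" depends on the pipeline's camera convention:

```python
import itertools

def all_pairwise_relations(centers):
    """centers: dict mapping object id -> (x, y, z) 3D center.

    Evaluates a couple of simple predicates for every unordered pair of
    detections, instead of sampling a random subset of combinations.
    """
    relations = []
    for a, b in itertools.combinations(centers, 2):
        ax, ay, az = centers[a]
        bx, by, bz = centers[b]
        relations.append({
            "pair": (a, b),
            # relative relation: sign of the x-difference (convention-dependent)
            "a_left_of_b": ax > bx,
            # metric relation: Euclidean distance between the two centers
            "distance": ((ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2) ** 0.5,
        })
    return relations
```

For n detections this evaluates n·(n−1)/2 pairs, so it stays cheap for typical per-image object counts.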

mhd0528 commented 1 month ago

Yes, I was able to change that part and get all the relations. Just to confirm: by evaluating the relations, do you mean letting an LLM answer the questions in qualitative_prompts? Basically, using the answers to determine whether a relation holds for a detection pair, with no confidence score involved.

AnjieCheng commented 1 month ago

The relations are directly evaluated based on object attributes. For example, to check if object A is to the left of object B, we do the following:

A_pos = A_cloud.get_center()  # 3D centroid of object A's point cloud
B_pos = B_cloud.get_center()  # 3D centroid of object B's point cloud

is_left = A_pos[0] > B_pos[0]  # compare x-coordinates under the pipeline's camera convention
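To make that check self-contained, here is the same logic with a stand-in for the Open3D point cloud (FakeCloud is a hypothetical stub; only get_center(), which returns the mean of the points, is assumed from the real PointCloud API):

```python
# Minimal stand-in for an Open3D point cloud; only get_center() is used.
class FakeCloud:
    def __init__(self, points):
        self.points = points

    def get_center(self):
        # Centroid (mean) of the points, mirroring PointCloud.get_center()
        n = len(self.points)
        return [sum(p[i] for p in self.points) / n for i in range(3)]

A_cloud = FakeCloud([(2.0, 0.0, 1.0), (4.0, 0.0, 1.0)])  # center x = 3.0
B_cloud = FakeCloud([(0.0, 0.0, 1.0), (2.0, 0.0, 1.0)])  # center x = 1.0

A_pos = A_cloud.get_center()
B_pos = B_cloud.get_center()
is_left = A_pos[0] > B_pos[0]  # same convention as the check above
```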

And yes, no confidence score is used in this process.