Hi,
Thanks for sharing this amazing work! May I ask which parts I should use to generate a 3D scene graph for a custom image? Thanks!
You can check run.py in dataset_pipeline. To test on a specific image, use the --input argument to provide the image path.
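For example, something like the following (the exact invocation is a guess; depending on your setup you may need additional config flags, so treat this as a sketch):

```
python dataset_pipeline/run.py --input /path/to/image.jpg
```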
Thanks! I have it running now.
Hi, I'm still a bit confused. I got the pipeline running, but the output only contains the log folders, including the Wis3D folder (JSON files for bbox and point_clouds). I assume the scene graphs should be in the npz or pickle folder, but those two folders are empty. May I ask if this result is expected, or is there something I'm missing? Thanks in advance.
I had commented out the function and code that explicitly save the detection list to disk, because Open3D bounding boxes can't be saved directly into npz or pickle. I used to serialize them into JSON files, which made the code a bit hacky.
But thanks for bringing this up! I've restored the function that saves the detection list into JSONs. If you'd like to convert the saved JSONs back into Open3D objects, please check the README for instructions. Feel free to let me know whether it works!
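In case it helps, here is a minimal sketch of the restore step. The filename and the "center"/"rotation"/"extent" field names are assumptions on my part; the README has the actual schema:

```python
import json

import numpy as np
import open3d as o3d

# Hypothetical filename and field names -- see the README for the
# actual schema written by the pipeline.
with open("detection_000.json") as f:
    det = json.load(f)

obb = o3d.geometry.OrientedBoundingBox(
    np.asarray(det["center"], dtype=np.float64),    # (3,) box center
    np.asarray(det["rotation"], dtype=np.float64),  # (3, 3) rotation matrix
    np.asarray(det["extent"], dtype=np.float64),    # (3,) box dimensions
)
```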
I have them saved into JSONs now, thanks again! Also, from what I understand of the paper, the pipeline should also generate the relative and metric relations between the detected objects, right? But in the JSONs I only find a list of masks and the box area. May I ask whether those relations are generated by another component? I saw the template-based questions produced by the model in the log file, so I assume those relations should also be generated here, right?
The relations are dynamically determined in dataset_pipeline/osdsynth/processor/prompt.py using the Open3D bounding box objects, so we don't explicitly save all relations (like left/right, distance, height/width, etc.). You can refer to the evaluate_predicates_on_pairs function in prompt.py, where combinations of objects are sampled and their relations are evaluated. If you want to capture all pairwise relations, you can modify the random sampling section in that code so that every pair is evaluated and saved, as sketched below.
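A minimal sketch of what that change amounts to (evaluate_pair here is a hypothetical stand-in for the per-pair predicate evaluation inside evaluate_predicates_on_pairs, not the repo's actual API):

```python
from itertools import combinations

def evaluate_all_pairs(detections, evaluate_pair):
    """Evaluate every detection pair instead of a random sample."""
    results = {}
    for i, j in combinations(range(len(detections)), 2):
        # evaluate_pair is a hypothetical helper standing in for the
        # predicate checks done inside evaluate_predicates_on_pairs.
        results[(i, j)] = evaluate_pair(detections[i], detections[j])
    return results
```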
Yes, I was able to change that part and get all relations. Just to confirm: by evaluating the relations, you mean letting an LLM answer the questions in qualitative_prompts, right? Basically, the answers are used to determine whether a relation holds for a detection pair, and no confidence score is used here.
The relations are evaluated directly from the object attributes; no LLM is involved. For example, to check whether object A is to the left of object B, we do the following:

```python
# Compare the x-coordinates of the two point-cloud centers.
# (Which direction counts as "left" depends on the coordinate convention.)
A_pos = A_cloud.get_center()
B_pos = B_cloud.get_center()
is_left = A_pos[0] > B_pos[0]
```
And yes, no confidence score is used in this process.
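Just to illustrate (a sketch in the same spirit, not code from the repo): metric relations such as inter-object distance can be read off the same centers, since get_center() returns a (3,) NumPy array.

```python
import numpy as np

# Euclidean distance between the two object centers
# (in the point cloud's units, typically meters).
distance = float(np.linalg.norm(A_pos - B_pos))
```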