NVlabs / FoundationPose

[CVPR 2024 Highlight] FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
https://nvlabs.github.io/FoundationPose/

Output Discrepancy in run_ycb.py Testing #80

Open · Sar-thak-3 opened this issue 3 weeks ago

Sar-thak-3 commented 3 weeks ago

Description: When running run_ycb.py on the BOP test data, a small fraction of the predicted poses have the opposite sign from the ground truth in their rotation matrix or translation vector.

Problem Statement: Approximately 0.03% of the outputs from run_ycb.py show a sign discrepancy in the rotation matrix or translation vector when compared with the ground truth.
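One quick way to triage these cases is to ask whether a flipped prediction would still be a valid rotation. A rotation matrix has determinant +1, so exactly negating one column (a lone "0R", "1R" or "2R") gives determinant -1, which is a reflection rather than a rigid pose. Exactly negating two columns gives another proper rotation: the ground truth composed with a 180° turn about the remaining object axis. For an object that is symmetric about that axis, this may be an equivalent pose rather than a real error; whether that applies to the flagged objects is an assumption to check. The NumPy sketch below illustrates this on a made-up matrix; `flip_columns` and `rotation_angle_deg` are hypothetical helpers, not FoundationPose code.

```python
import numpy as np

def flip_columns(R: np.ndarray, cols) -> np.ndarray:
    """Return a copy of R with the listed columns negated."""
    D = np.ones(3)
    D[list(cols)] = -1.0
    return R * D  # broadcasting multiplies column j by D[j]

def rotation_angle_deg(R_a: np.ndarray, R_b: np.ndarray) -> float:
    """Angle in degrees of the relative rotation R_a^T R_b."""
    cos = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

# Arbitrary example rotation: 30 degrees about z
a = np.radians(30.0)
R_gt = np.array([[np.cos(a), -np.sin(a), 0.0],
                 [np.sin(a),  np.cos(a), 0.0],
                 [0.0,        0.0,       1.0]])

one_flip = flip_columns(R_gt, [0])      # like a "0R" flag on its own
two_flips = flip_columns(R_gt, [0, 1])  # like "0R" and "1R" together

print(round(float(np.linalg.det(one_flip)), 6))       # -1.0: a reflection, not a rotation
print(round(float(np.linalg.det(two_flips)), 6))      # 1.0: still a proper rotation
print(round(rotation_angle_deg(R_gt, two_flips), 3))  # 180.0: half-turn about the object's third axis
```

Because the elementwise sign test only reports which columns flipped, checking the determinant of the predicted 3x3 block separates impossible predictions from merely different ones.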

Request for Clarification:

wenbowen123 commented 3 weeks ago

This does not look correct; if the translation is negated in particular, the pose can be way off. When did those cases happen?
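For scale, if a predicted translation is exactly the negated ground truth, the translation error is ‖t − (−t)‖ = 2‖t‖, twice the object's distance from the camera, and a negative z component places the object behind the camera. Measuring error magnitudes also avoids relying on elementwise signs of near-zero entries. The helper below (`pose_errors`, a hypothetical name) is a minimal sketch. It uses BOP's convention of `cam_t_m2c` in millimeters and *assumes* the run_ycb.py poses are in meters, which should be verified.

```python
import numpy as np

def pose_errors(gt_entry: dict, pred_pose, gt_t_scale: float = 1e-3):
    """Hypothetical helper: rotation error (degrees) and translation error.

    gt_entry:   one element of an image's list in a BOP scene_gt.json
                (keys "cam_R_m2c" and "cam_t_m2c").
    pred_pose:  4x4 predicted object-to-camera pose (nested list or array).
    gt_t_scale: converts the GT translation into the prediction's unit;
                1e-3 ASSUMES GT in millimeters and predictions in meters.
    """
    R_gt = np.asarray(gt_entry["cam_R_m2c"], dtype=float).reshape(3, 3)
    t_gt = np.asarray(gt_entry["cam_t_m2c"], dtype=float).reshape(3) * gt_t_scale
    T = np.asarray(pred_pose, dtype=float)
    R_pred, t_pred = T[:3, :3], T[:3, 3]

    # Angle of the relative rotation R_gt^T R_pred (geodesic distance on SO(3))
    cos = (np.trace(R_gt.T @ R_pred) - 1.0) / 2.0
    rot_err_deg = float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    trans_err = float(np.linalg.norm(t_pred - t_gt))
    return rot_err_deg, trans_err

# Made-up example: correct rotation, translation negated (a "T" flag)
gt = {"cam_R_m2c": [1, 0, 0, 0, 1, 0, 0, 0, 1], "cam_t_m2c": [0.0, 0.0, 800.0]}
pred = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, -0.8], [0, 0, 0, 1]]
rot_err, trans_err = pose_errors(gt, pred)
print(f"rotation error {rot_err:.1f} deg, translation error {trans_err:.2f} m")
# rotation error 0.0 deg, translation error 1.60 m (2 x 0.8 m)
```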

Sar-thak-3 commented 3 weeks ago

Hi @wenbowen123, thank you for your support. I am attaching the YAML output file generated by run_ycb.py. The very first predicted pose already shows the discrepancy between the ground truth and the prediction. https://drive.google.com/file/d/1AOPjhIhh4AcyOldk8AnumJc8_PxlVBtv/view?usp=sharing

I used the following Python script to build a map showing where the discrepancies occur across all predictions.

import yaml
import json
import numpy as np

def check_opposite_signs(array1, array2):
    # True only if every pair of corresponding elements has opposite signs,
    # i.e. the whole vector is negated (a zero vs. nonzero entry also counts as a mismatch)
    return np.all(np.sign(array1) != np.sign(array2))

def extract_4x4_arrays_from_yaml(yaml_file):
    with open(yaml_file, 'r') as file:
        yml_data = yaml.safe_load(file)

    opposite_sign_map = {}

    total_count = 0
    opp_sign_count = 0

    for key, value in yml_data.items():
        # BOP scene ids are zero-padded to six digits, e.g. 48 -> "000048"
        video_dir = f"{int(key):06d}"
        with open(f"scene_gt_{video_dir}.json", 'r') as file:
            json_data = json.load(file)

        opposite_sign_map[video_dir] = {}

        for key1, value1 in value.items():
            # scene_gt.json keys are unpadded image ids, e.g. "000001" -> "1"
            img_id = str(int(key1))

            opposite_sign_map[video_dir][img_id] = {}

            # Look up ground truth by object id rather than by position, so each
            # prediction is compared with the GT pose of the same object
            gt_by_obj = {gt["obj_id"]: gt for gt in json_data[img_id]}

            for key2, value2 in value1.items():
                obj_id = int(key2)
                if obj_id not in gt_by_obj:
                    continue  # no ground truth for this object in this image

                total_count += 1

                cam_r = np.array(gt_by_obj[obj_id]["cam_R_m2c"]).reshape((3, 3))
                cam_t = np.array(gt_by_obj[obj_id]["cam_t_m2c"])  # mm; the sign test ignores units

                four_cross_array = np.array(value2)  # predicted 4x4 pose

                flags = []
                for col in range(3):
                    if check_opposite_signs(cam_r[:, col], four_cross_array[:3, col]):
                        flags.append(f"{col}R")
                if check_opposite_signs(cam_t, four_cross_array[:3, 3]):
                    flags.append("T")

                opposite_sign_map[video_dir][img_id][obj_id] = flags
                opp_sign_count += bool(flags)

    return opposite_sign_map, opp_sign_count / total_count

# Example usage:
yaml_file = "ycbv_res.yml"
opposite_sign_map, wrong_sign_prob = extract_4x4_arrays_from_yaml(yaml_file)
print(wrong_sign_prob)
# print(opposite_sign_map)
json_file_path = "opposite_map_all.json"

# Export the map to JSON file
with open(json_file_path, 'w') as json_file:
    json.dump(opposite_sign_map, json_file, indent=4)

This generates the following output: opposite_map_all.json

Format of the JSON file:

{
    "video_dir": {
        "image_id": {
            "object_id": [
                "0R: opposite signs in the 1st column of the rotation matrix",
                "1R: opposite signs in the 2nd column of the rotation matrix",
                "2R: opposite signs in the 3rd column of the rotation matrix",
                "T: opposite signs in the translation vector"
            ]
        }
    }
}

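With that format, the flags can be tallied per combination to see whether the discrepancies are mostly two-column rotation flips, lone column flips, or translation flips. A minimal sketch; `summarize_flags` and `summarize_file` are hypothetical names, and the default path assumes the `opposite_map_all.json` file exported by the script.

```python
import json
from collections import Counter

def summarize_flags(opposite_map: dict) -> Counter:
    """Count each distinct flag combination, e.g. ("0R", "1R") or ("T",)."""
    counts = Counter()
    for images in opposite_map.values():        # video_dir -> image_id -> ...
        for objects in images.values():         # image_id -> object_id -> flags
            for flags in objects.values():      # a list such as ["0R", "T"]
                if flags:                       # an empty list means no discrepancy
                    counts[tuple(sorted(flags))] += 1
    return counts

def summarize_file(path: str = "opposite_map_all.json") -> Counter:
    """Load the exported map and tally its flag combinations."""
    with open(path, "r") as f:
        return summarize_flags(json.load(f))
```

For example, iterating over `summarize_file().most_common()` lists the most frequent combinations first.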
wenbowen123 commented 3 weeks ago

Thanks, I will check after I finish a deadline. For those abnormal scenes, can you select one, run ONLY on it, and check the visualization? You can set the debug level to >=3 for more verbose logging.
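To choose which abnormal scene to rerun on its own, one option is to rank scenes by how many of their predictions carry at least one flag in `opposite_map_all.json`. A minimal sketch; `rank_scenes` is a hypothetical helper and does not touch run_ycb.py or its debug settings.

```python
def rank_scenes(opposite_map: dict):
    """Return (scene_id, flagged_count, total_count) tuples, most flagged first."""
    ranking = []
    for scene_id, images in opposite_map.items():
        total = flagged = 0
        for objects in images.values():
            for flags in objects.values():
                total += 1
                flagged += bool(flags)  # count a prediction once, however many flags it has
        ranking.append((scene_id, flagged, total))
    return sorted(ranking, key=lambda r: r[1], reverse=True)
```

Passing it the dictionary loaded from `opposite_map_all.json` and taking the first entry gives the scene with the most flagged predictions.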