NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

Regarding custom data generation #325

Open ArghyaChatterjee opened 11 months ago

ArghyaChatterjee commented 11 months ago

Hello,

I am trying to generate a dataset for CenterPose using your DOPE pipeline, and I am facing four problems.

  1. I took several images at different exposures and merged them into HDR images that represent my lab's background. My HDR images are 1920 x 1080, while the images I generate with your pipeline are 1280 x 720. When I generate a training dataset with the main object and the distractors, the images and the corresponding annotations are produced, but the background is zoomed in, so it is not representative of the original background. I have tried changing the camera position and the fov, but that distorts the image itself. How can I fix this?

Normal (nothing changed in your script; the background is automatically zoomed in, which is the problem): 00000

Camera Eye changed (from 'eye':visii.vec3(0,0,0) to 'eye':visii.vec3(0,0,-2), looks distorted):

random_camera_movement = {
    'at':visii.vec3(1,0,0),
    'up':visii.vec3(0,0,1),
    'eye':visii.vec3(0,0,-2)
}

Screenshot from 2023-10-23 15-48-04

Camera fov changed to 2 (from default 0.78 to 2, looks distorted):

    # Camera entity as created in the data generation script, with field_of_view changed
    camera = visii.entity.create(
        name = "camera",
        transform = visii.transform.create("camera"),
        camera = visii.camera.create_perspective_from_fov(
            name = "camera",
            field_of_view = 1.5,
            aspect = float(opt.width)/float(opt.height)
        )
    )

00000

  2. The original Objectron dataset that CenterPose is trained on contains keypoints_3d and the object scale in the corresponding annotation JSON file. Since DOPE doesn't need that information, it isn't included in the nvisii interface. Can you tell me how to generate this information for a CenterPose dataset? Here is what the JSON file looks like for DOPE:

    {
    "camera_data": {
        "camera_look_at": {
            "at": [
                1.0,
                0.0,
                0.0
            ],
            "eye": [
                0.0,
                0.0,
                0.0
            ],
            "up": [
                0.0,
                0.0,
                1.0
            ]
        },
        "camera_view_matrix": [
            [
                0.0,
                0.0,
                1.0,
                0.0
            ],
            [
                -1.0,
                0.0,
                0.0,
                0.0
            ],
            [
                0.0,
                -1.0,
                0.0,
                0.0
            ],
            [
                0.0,
                0.0,
                0.0,
                1.0
            ]
        ],
        "height": 1920,
        "intrinsics": {
            "cx": 640.0,
            "cy": 960.0,
            "fx": 2317.6455078125,
            "fy": 2317.6455078125
        },
        "location_worldframe": [
            -0.0,
            0.0,
            -0.0
        ],
        "quaternion_xyzw_worldframe": [
            -0.5,
            0.5,
            -0.5,
            0.5
        ],
        "width": 1280
    },
    "objects": [
        {
            "bounding_box_minx_maxx_miny_maxy": [
                764,
                1029,
                493,
                725
            ],
            "class": "Sony_Acid_Music_Studio",
            "local_cuboid": null,
            "local_to_world_matrix": [
                [
                    0.3346782624721527,
                    -0.293707937002182,
                    -0.8953915238380432,
                    -0.0
                ],
                [
                    0.9418398141860962,
                    0.13497743010520935,
                    0.30776405334472656,
                    -0.0
                ],
                [
                    0.030464906245470047,
                    -0.9463173747062683,
                    0.3217999041080475,
                    -0.0
                ],
                [
                    1.9207875728607178,
                    -0.12306323647499084,
                    0.2600710093975067,
                    1.0
                ]
            ],
            "location": [
                0.12306323647499084,
                -0.2600710093975067,
                1.9207875728607178
            ],
            "location_worldframe": [
                1.9207875728607178,
                -0.12306323647499084,
                0.2600710093975067
            ],
            "name": "google_Sony_Acid_Music_Studio_0",
            "projected_cuboid": [
                [
                    1038.0235290527344,
                    655.3387069702148
                ],
                [
                    996.9047546386719,
                    491.82838439941406
                ],
                [
                    984.8480224609375,
                    486.7687225341797
                ],
                [
                    1025.5551147460938,
                    647.6087951660156
                ],
                [
                    815.9352874755859,
                    730.3356170654297
                ],
                [
                    769.0859222412109,
                    568.1963539123535
                ],
                [
                    760.9901428222656,
                    561.7803955078125
                ],
                [
                    807.2328186035156,
                    721.2976455688477
                ],
                [
                    900.2195739746094,
                    608.800220489502
                ]
            ],
            "provenance": "nvisii",
            "px_count_all": 0,
            "px_count_visib": 0,
            "quaternion_xyzw": [
                0.6266990900039673,
                0.30334094166755676,
                0.5110090374946594,
                0.5040854811668396
            ],
            "quaternion_xyzw_worldframe": [
                0.46848180890083313,
                0.34586817026138306,
                -0.4615582525730133,
                0.669226348400116
            ],
            "segmentation_id": 1,
            "visibility": 1
        },
        {
            "bounding_box_minx_maxx_miny_maxy": [
                257,
                461,
                837,
                1052
            ],
            "class": "Epson_DURABrite_Ultra_786_Black_Ink_Cartridge_T786120S",
            "local_cuboid": null,
            "local_to_world_matrix": [
                [
                    -0.36425772309303284,
                    0.1863223910331726,
                    -0.9124693870544434,
                    0.0
                ],
                [
                    -0.6184156537055969,
                    -0.7809761166572571,
                    0.08739950507879257,
                    0.0
                ],
                [
                    -0.6963321566581726,
                    0.5961212515830994,
                    0.39970117807388306,
                    -0.0
                ],
                [
                    1.6742818355560303,
                    0.14743097126483917,
                    -0.02730831876397133,
                    1.0
                ]
            ],
            "location": [
                -0.14743097126483917,
                0.02730831876397133,
                1.6742818355560303
            ],
            "location_worldframe": [
                1.6742818355560303,
                0.14743097126483917,
                -0.02730831876397133
            ],
            "name": "google_Epson_DURABrite_Ultra_786_Black_Ink_Cartridge_T786120S_2",
            "projected_cuboid": [
                [
                    222.03956604003906,
                    956.2545776367188
                ],
                [
                    255.54000854492188,
                    835.0365829467773
                ],
                [
                    291.5988540649414,
                    828.6186218261719
                ],
                [
                    258.3934783935547,
                    951.4541816711426
                ],
                [
                    404.1435241699219,
                    1057.3942565917969
                ],
                [
                    431.6315460205078,
                    943.2322311401367
                ],
                [
                    467.2838592529297,
                    938.7166213989258
                ],
                [
                    440.1404571533203,
                    1054.3155670166016
                ],
                [
                    349.8906707763672,
                    947.1115493774414
                ]
            ],
            "provenance": "nvisii",
            "px_count_all": 0,
            "px_count_visib": 0,
            "quaternion_xyzw": [
                -0.6319432854652405,
                -0.6699357032775879,
                0.37993085384368896,
                0.08652451634407043
            ],
            "quaternion_xyzw_worldframe": [
                -0.5042363405227661,
                0.21423150599002838,
                0.7976426482200623,
                0.2522238790988922
            ],
            "segmentation_id": 3,
            "visibility": 1
        },
        {
            "bounding_box_minx_maxx_miny_maxy": [
                754,
                1237,
                1787,
                1920
            ],
            "class": "STACKING_BEAR_V04KKgGBn2A",
            "local_cuboid": null,
            "local_to_world_matrix": [
                [
                    -0.1307300329208374,
                    0.9679588079452515,
                    0.21439549326896667,
                    -0.0
                ],
                [
                    0.49649521708488464,
                    -0.12326063215732574,
                    0.8592435121536255,
                    -0.0
                ],
                [
                    0.8581387996673584,
                    0.21877521276474,
                    -0.46447306871414185,
                    -0.0
                ],
                [
                    1.031076431274414,
                    -0.173092320561409,
                    -0.47199130058288574,
                    1.0
                ]
            ],
            "location": [
                0.173092320561409,
                0.47199130058288574,
                1.031076431274414
            ],
            "location_worldframe": [
                1.031076431274414,
                -0.173092320561409,
                -0.47199130058288574
            ],
            "name": "google_STACKING_BEAR_V04KKgGBn2A_3",
            "projected_cuboid": [
                [
                    694.1983795166016,
                    2238.947296142578
                ],
                [
                    1155.0393676757812,
                    2306.8888092041016
                ],
                [
                    1160.1544189453125,
                    1852.720069885254
                ],
                [
                    739.4944000244141,
                    1779.2024230957031
                ],
                [
                    747.8303527832031,
                    2240.9852600097656
                ],
                [
                    1244.3098449707031,
                    2314.351272583008
                ],
                [
                    1241.6778564453125,
                    1826.1781311035156
                ],
                [
                    791.5280151367188,
                    1746.397590637207
                ],
                [
                    974.1064453125,
                    2027.9118347167969
                ]
            ],
            "provenance": "nvisii",
            "px_count_all": 0,
            "px_count_visib": 0,
            "quaternion_xyzw": [
                -0.09103001654148102,
                0.25028812885284424,
                0.9598621129989624,
                -0.0879439264535904
            ],
            "quaternion_xyzw_worldframe": [
                0.603532075881958,
                0.6066181659698486,
                0.444273978471756,
                0.26530003547668457
            ],
            "segmentation_id": 4,
            "visibility": 1
        },
        {
            "bounding_box_minx_maxx_miny_maxy": [
                889,
                1106,
                132,
                444
            ],
            "class": "Nestle_Carnation_Cinnamon_Coffeecake_Kit_1913OZ",
            "local_cuboid": null,
            "local_to_world_matrix": [
                [
                    -0.30763235688209534,
                    -0.11167889088392258,
                    -0.9449289441108704,
                    0.0
                ],
                [
                    -0.3483346402645111,
                    0.9373666048049927,
                    0.0026190669741481543,
                    0.0
                ],
                [
                    0.8854524493217468,
                    0.32995718717575073,
                    -0.3272658884525299,
                    -0.0
                ],
                [
                    1.718860387802124,
                    -0.30652472376823425,
                    0.5503286123275757,
                    1.0
                ]
            ],
            "location": [
                0.30652472376823425,
                -0.5503286123275757,
                1.718860387802124
            ],
            "location_worldframe": [
                1.718860387802124,
                -0.30652472376823425,
                0.5503286123275757
            ],
            "name": "google_Nestle_Carnation_Cinnamon_Coffeecake_Kit_1913OZ_4",
            "projected_cuboid": [
                [
                    989.7109985351562,
                    447.8106880187988
                ],
                [
                    961.2370300292969,
                    289.87009048461914
                ],
                [
                    884.0269470214844,
                    280.8959770202637
                ],
                [
                    910.965576171875,
                    440.72656631469727
                ],
                [
                    1112.410888671875,
                    309.3926811218262
                ],
                [
                    1077.9110717773438,
                    139.5348358154297
                ],
                [
                    994.6639251708984,
                    127.50640869140625
                ],
                [
                    1027.3947143554688,
                    299.54017639160156
                ],
                [
                    992.0835876464844,
                    294.37883377075195
                ]
            ],
            "provenance": "nvisii",
            "px_count_all": 0,
            "px_count_visib": 0,
            "quaternion_xyzw": [
                -0.23918910324573517,
                -0.007904015481472015,
                0.6664068698883057,
                0.7061359882354736
            ],
            "quaternion_xyzw_worldframe": [
                -0.14341111481189728,
                0.8019140362739563,
                0.1036820113658905,
                0.5706289410591125
            ],
            "segmentation_id": 5,
            "visibility": 1
        },
        {
            "bounding_box_minx_maxx_miny_maxy": [
                804,
                1047,
                605,
                845
            ],
            "class": "mug",
            "local_cuboid": null,
            "local_to_world_matrix": [
                [
                    0.6584553718566895,
                    -0.3450233042240143,
                    -0.6688762903213501,
                    -0.0
                ],
                [
                    0.7392677068710327,
                    0.12983770668506622,
                    0.6607765555381775,
                    0.0
                ],
                [
                    -0.14113791286945343,
                    -0.9295704364776611,
                    0.34055688977241516,
                    -0.0
                ],
                [
                    1.2288596630096436,
                    -0.1444949060678482,
                    0.12551634013652802,
                    1.0
                ]
            ],
            "location": [
                0.1444949060678482,
                -0.12551634013652802,
                1.2288596630096436
            ],
            "location_worldframe": [
                1.2288596630096436,
                -0.1444949060678482,
                0.12551634013652802
            ],
            "name": "mug_0",
            "projected_cuboid": [
                [
                    1093.7495422363281,
                    788.3962440490723
                ],
                [
                    1054.3683624267578,
                    657.6246643066406
                ],
                [
                    1005.3102111816406,
                    551.9614219665527
                ],
                [
                    1044.7257995605469,
                    680.1902961730957
                ],
                [
                    867.2451019287109,
                    871.4556884765625
                ],
                [
                    816.9114685058594,
                    746.8995094299316
                ],
                [
                    782.5321960449219,
                    637.4864959716797
                ],
                [
                    831.6124725341797,
                    760.0553512573242
                ],
                [
                    936.1599731445312,
                    711.9807815551758
                ]
            ],
            "provenance": "nvisii",
            "px_count_all": 0,
            "px_count_visib": 0,
            "quaternion_xyzw": [
                0.7326215505599976,
                0.18394167721271515,
                0.5418983697891235,
                0.3684796094894409
            ],
            "quaternion_xyzw_worldframe": [
                0.5449910163879395,
                0.18084901571273804,
                -0.3715722858905792,
                0.7295289635658264
            ],
            "segmentation_id": 10,
            "visibility": 1
        }
    ]
    }

    Here is what the JSON file looks like for CenterPose:

    {
    "AR_data": {
        "plane_center": [
            0.026276886463165283,
            0.03733876347541809,
            -0.42468586564064026
        ],
        "plane_normal": [
            -0.7663699388504028,
            0.09618763625621796,
            0.6351574063301086
        ]
    },
    "camera_data": {
        "camera_projection_matrix": [
            [
                1.6554118394851685,
                0.0,
                0.019000232219696045,
                0.0
            ],
            [
                0.0,
                2.2072157859802246,
                -0.004737734794616699,
                0.0
            ],
            [
                0.0,
                0.0,
                -0.9999997615814209,
                -0.0009999998146668077
            ],
            [
                0.0,
                0.0,
                -1.0,
                0.0
            ]
        ],
        "camera_view_matrix": [
            [
                -0.26714298129081726,
                -0.7525513172149658,
                -0.6019145250320435,
                -0.055233731865882874
            ],
            [
                -0.9542973041534424,
                0.11975152790546417,
                0.27381736040115356,
                0.18261873722076416
            ],
            [
                -0.13398146629333496,
                0.6475539207458496,
                -0.7501484751701355,
                -0.00018225116946268827
            ],
            [
                0.0,
                0.0,
                0.0,
                1.0
            ]
        ],
        "height": 800,
        "intrinsics": {
            "cx": 298.370361328125,
            "cy": 392.1915690104167,
            "fx": 662.1647135416667,
            "fy": 662.1647135416667
        },
        "location_world": [
            0.15949289500713348,
            -0.06331707537174225,
            -0.08338691294193268
        ],
        "quaternion_world_xyzw": [
            -0.583792361067781,
            0.7309314302392926,
            0.3151360368846342,
            0.1600468734586658
        ],
        "width": 600
    },
    "objects": [
        {
            "class": "cup",
            "keypoints_3d": [
                [
                    -0.027369018644094467,
                    0.04407183825969696,
                    -0.38022491335868835
                ],
                [
                    -0.0025680139660835266,
                    -0.032148003578186035,
                    -0.44896677136421204
                ],
                [
                    0.05524785816669464,
                    -0.021686285734176636,
                    -0.38079142570495605
                ],
                [
                    -0.10985984653234482,
                    -0.01868179440498352,
                    -0.3600447177886963
                ],
                [
                    -0.05204399302601814,
                    -0.008220136165618896,
                    -0.2918694317340851
                ],
                [
                    -0.002694040536880493,
                    0.09636379778385162,
                    -0.4685803949832916
                ],
                [
                    0.05512183904647827,
                    0.10682550072669983,
                    -0.40040507912635803
                ],
                [
                    -0.10998588055372238,
                    0.10982996970415115,
                    -0.37965837121009827
                ],
                [
                    -0.05217001214623451,
                    0.12029168009757996,
                    -0.3114830255508423
                ]
            ],
            "location": [
                -0.02736901868021091,
                0.044071842568774056,
                -0.38022490531119946
            ],
            "mug": true,
            "mug_handle_visible": true,
            "name": "cup_0",
            "projected_cuboid": [
                [
                    378,
                    344
                ],
                [
                    253,
                    388
                ],
                [
                    263,
                    488
                ],
                [
                    266,
                    189
                ],
                [
                    282,
                    273
                ],
                [
                    437,
                    388
                ],
                [
                    478,
                    483
                ],
                [
                    493,
                    200
                ],
                [
                    557,
                    281
                ]
            ],
            "provenance": "objectron",
            "quaternion_xyzw": [
                0.19061728062246633,
                0.29139854153287625,
                0.6446484091470697,
                0.6805735602451516
            ],
            "scale": [
                0.12999999523162842,
                0.14000000059604645,
                0.09000000357627869
            ]
        }
    ]
    }

  3. When the annotated dataset is generated, the segmentation .exr images are blank white. Why is that, and how can I get a usable segmentation image? I can only see a depth.exr, which is a 32-bit binary image (it looks more like a segmentation image than a depth image). Here is what the pipeline generates, with depth.exr as a binary image and seg.exr as a blank white image.

Screenshot from 2023-10-23 16-19-17

The depth image (.exr) looks like this:

Screenshot from 2023-10-23 16-21-59

The segmentation image (.exr) looks like this:

Screenshot from 2023-10-23 16-22-14

  4. Is there a way to change the parameters inside nvisii interactively, so that I can visually adjust where to put the objects, the distractors, and the camera itself?
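On question 1, one thing that can be checked from the DOPE JSON in this post is how the fov relates to the intrinsics. The exported values are consistent with the usual pinhole relation f = (extent / 2) / tan(fov / 2) applied along the image height. That is an inference from this one file, not something confirmed in the thread, and the function names below are made up for illustration.

```python
import math

def focal_length_px(fov_rad: float, image_extent_px: int) -> float:
    """Pinhole relation: focal length in pixels for a fov spanning image_extent_px."""
    return (image_extent_px / 2.0) / math.tan(fov_rad / 2.0)

def fov_from_focal(focal_px: float, image_extent_px: int) -> float:
    """Inverse of focal_length_px: field of view in radians."""
    return 2.0 * math.atan((image_extent_px / 2.0) / focal_px)

# Values copied from the DOPE JSON above: height 1920, fx = fy = 2317.6455078125.
# Assumption: the fov is measured along the image height.
fov = fov_from_focal(2317.6455078125, 1920)
print(round(fov, 4))  # 0.7854, i.e. pi/4, close to the default of 0.78
```

Under the same assumption, a fov of 2 rad on a 720-pixel-high image corresponds to a focal length of roughly 231 px, which is a very wide-angle lens and may be why the render looks stretched.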
ArghyaChatterjee commented 11 months ago

@TontonTremblay, can you respond, please?

TontonTremblay commented 11 months ago

I am on vacation this week; I will answer next week.

TontonTremblay commented 10 months ago

Wow, this is unknown territory. I am impressed by what you are trying to do, and I have always wanted to try something along these lines. Is this the tutorial you followed: https://blog.polyhaven.com/how-to-create-high-quality-hdri/ ? nvisii uses this model of HDR map. I am afraid I cannot help much more than that, to be honest, but please share your results here.

TontonTremblay commented 10 months ago

I see there are questions in there :P

  1. See above: I do not think you created your HDR map correctly.
  2. Yes, you are correct, you need the cuboid sizes. https://github.com/NVlabs/Deep_Object_Pose/blob/master/scripts/nvisii_data_gen/utils.py#L950-L960 has the information you need; you then need to normalize it by one axis.
  3. EXR is a format whose values are not bounded, so you can store whatever you want in it, and most viewers won't know how to display the segmentation. Try something like this: https://chat.openai.com/share/dbfb8ef1-6504-48e6-bc2b-9972f18283f0
  4. Try to make the script work with https://github.com/owl-project/NVISII/blob/master/examples/17.materials_visii_interactive.py and then you could control your params live.
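A rough sketch of what point 2 could look like in practice, assuming you already have the cuboid dimensions for each object (for instance from the utils.py lines linked above) plus the `location` and `quaternion_xyzw` that the DOPE JSON already exports. The function names, the corner ordering, and the choice of normalizing by the y axis are all assumptions for illustration; check them against the keypoint order and scale convention your CenterPose data loader expects. Note that the `scale` in the Objectron sample in this issue is not equal to 1 on any axis.

```python
from itertools import product

def quat_xyzw_to_matrix(q):
    """Convert a quaternion given as (x, y, z, w) into a 3x3 rotation matrix."""
    x, y, z, w = q
    n = (x * x + y * y + z * z + w * w) ** 0.5
    x, y, z, w = x / n, y / n, z / n, w / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]

def cuboid_keypoints_3d(dimensions, location, quaternion_xyzw):
    """Return 9 points: the centroid, then the 8 cuboid corners.

    Corner order is an assumption: x varies slowest, z fastest, each from - to +.
    """
    rot = quat_xyzw_to_matrix(quaternion_xyzw)
    half = [d / 2.0 for d in dimensions]
    keypoints = [list(location)]
    for signs in product((-1.0, 1.0), repeat=3):
        local = [s * h for s, h in zip(signs, half)]
        # Rotate the corner from the object frame, then translate by the location
        keypoints.append([
            sum(rot[r][c] * local[c] for c in range(3)) + location[r]
            for r in range(3)
        ])
    return keypoints

def relative_scale(dimensions, axis=1):
    """Normalize the cuboid dimensions by one axis (y by default, an assumption)."""
    return [d / dimensions[axis] for d in dimensions]
```

For example, `cuboid_keypoints_3d(dims, obj["location"], obj["quaternion_xyzw"])` would give the list to store under `keypoints_3d`, where `dims` is the per-object cuboid size you computed and `obj` is one entry of `objects` in the DOPE JSON.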

Good luck
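For point 3, a minimal sketch of turning raw segmentation ids into something a normal image viewer can show. It assumes you have already loaded one channel of the .exr into a 2D grid of floats with whatever EXR reader you use (that step is left out), and that each pixel stores an object id such as the `segmentation_id` values in the JSON above. Many viewers display anything at or above 1.0 as white, which would be consistent with the blank white image, though that is a guess. `colorize_ids` and `write_ppm` are hypothetical helper names.

```python
import colorsys
import math

def colorize_ids(id_rows):
    """Map each distinct id in a 2D grid of floats to a distinct 8-bit RGB colour."""
    def key(value):
        # Non-finite pixels (inf, nan) are treated as background
        return round(value) if math.isfinite(value) else None

    ids = sorted({key(v) for row in id_rows for v in row if key(v) is not None})
    palette = {None: (0, 0, 0)}
    for k, obj_id in enumerate(ids):
        # Spread the ids evenly around the hue circle
        r, g, b = colorsys.hsv_to_rgb(k / len(ids), 1.0, 1.0)
        palette[obj_id] = (round(r * 255), round(g * 255), round(b * 255))
    rgb_rows = [[palette[key(v)] for v in row] for row in id_rows]
    return rgb_rows, palette

def write_ppm(path, rgb_rows):
    """Write the coloured grid as a binary PPM, which most image viewers open."""
    height, width = len(rgb_rows), len(rgb_rows[0])
    with open(path, "wb") as f:
        f.write(f"P6\n{width} {height}\n255\n".encode("ascii"))
        for row in rgb_rows:
            for pixel in row:
                f.write(bytes(pixel))

# Tiny stand-in for a loaded segmentation channel
grid = [[0.0, 1.0, 1.0], [3.0, 10.0, float("inf")]]
rgb, palette = colorize_ids(grid)
# write_ppm("seg_preview.ppm", rgb)
```

The same idea applies to depth.exr: rescale the finite values into 0..255 before viewing instead of assigning one colour per id.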