NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

Training of a Custom Object Cont. #137

Open Ademord opened 4 years ago

Ademord commented 4 years ago

I am trying to train on some chairs, for which I set up 4 different photorealistic scenes in UE4.

My dataset is made of 20k images per location in each scene, for a total of 20k x 5 x 4 = 400k images.

A sample of what my dataset looks like:

Some questions that are cluttering my mind:

# rostopic echo /camera/color/camera_info
---
header: 
  seq: 11576
  stamp: 
    secs: 1597168869
    nsecs: 431096885
  frame_id: "camera_color_optical_frame"
height: 480
width: 640
distortion_model: "plumb_bob"
D: [0.0, 0.0, 0.0, 0.0, 0.0]
K: [617.3424072265625, 0.0, 325.4233093261719, 0.0, 617.2723999023438, 230.31985473632812, 0.0, 0.0, 1.0]
R: [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
P: [617.3424072265625, 0.0, 325.4233093261719, 0.0, 0.0, 617.2723999023438, 230.31985473632812, 0.0, 0.0, 0.0, 1.0, 0.0]
binning_x: 0
binning_y: 0
roi: 
  x_offset: 0
  y_offset: 0
  height: 0
  width: 0
  do_rectify: False
---
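
As a sanity check on these values (just an illustration using the numbers above, nothing DOPE-specific): for a rectified camera with zero distortion, the published P is simply K with a zero fourth column.

import numpy as np

# K and P as published by the RealSense driver above.
K = np.array([617.3424072265625, 0.0, 325.4233093261719,
              0.0, 617.2723999023438, 230.31985473632812,
              0.0, 0.0, 1.0]).reshape(3, 3)
P = np.hstack([K, np.zeros((3, 1))])
print(P.ravel())  # matches the P row in the camera_info message
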
# camera_info.yaml
image_width: 640
image_height: 480
camera_name: dope_webcam_0
camera_matrix:
  rows: 3
  cols: 3
  data: [641.5, 0, 320.0, 0, 641.5, 240.0, 0, 0, 1]
distortion_model: plumb_bob
distortion_coefficients:
  rows: 1
  cols: 5
  data: [0, 0, 0, 0, 0]
rectification_matrix:
  rows: 3
  cols: 3
  data: [1, 0, 0, 0, 1, 0, 0, 0, 1]
projection_matrix:
  rows: 3
  cols: 4
  data: [641.5, 0, 320.0, 0, 0, 641.5, 240.0, 0, 0, 0, 1, 0]

Comment any of these lines to prevent detection / pose estimation of that object

weights: {
    "cracker": "package://dope/weights/cracker_60.pth",
    # "gelatin": "package://dope/weights/gelatin_60.pth",
    # "meat": "package://dope/weights/meat_20.pth",
    # "mustard": "package://dope/weights/mustard_60.pth",
    # "soup": "package://dope/weights/soup_60.pth",
    # "sugar": "package://dope/weights/sugar_60.pth",
    # "bleach": "package://dope/weights/bleach_28_dr.pth",
    # "banana": "package://dope/weights/banana_39.pth",
    "chair": "package://dope/weights/chair.pth"
}

Cuboid dimension in cm x,y,z

dimensions: {
    "cracker": [16.403600692749023, 21.343700408935547, 7.179999828338623],
    "gelatin": [8.918299674987793, 7.311500072479248, 2.9983000755310059],
    "meat": [10.164673805236816, 8.3542995452880859, 5.7600898742675781],
    "mustard": [9.6024150848388672, 19.130100250244141, 5.824894905090332],
    "soup": [6.7659378051757813, 10.185500144958496, 6.771425724029541],
    "sugar": [9.267730712890625, 17.625339508056641, 4.5134143829345703],
    "bleach": [10.267730712890625, 26.625339508056641, 7.5134143829345703],
    "banana": [19.717399597167969, 3.8649001121520996, 7.4065999984741211],
    "chair": [64.41, 97.75, 65.13],
    "teat": [605.098388671875, 362.48529052734375, 337.88888549804688],
}

class_ids: {
    "cracker": 1,
    "gelatin": 2,
    "meat": 3,
    "mustard": 4,
    "soup": 5,
    "sugar": 6,
    "banana": 7,
    "chair": 8,
    "teat": 9,
}

draw_colors: {
    "cracker": [13, 255, 128],  # green
    "gelatin": [255, 255, 255], # white
    "meat": [0, 104, 255],      # blue
    "mustard": [217, 12, 232],  # magenta
    "soup": [255, 101, 0],      # orange
    "sugar": [232, 222, 12],    # yellow
    "banana": [232, 222, 12],   # yellow
    "chair": [232, 222, 12],    # yellow
    "teat": [232, 222, 12],     # yellow
}

optional: provide a transform that is applied to the pose returned by DOPE

model_transforms: {
    "cracker": [[ 0,  0,  1,  0],
                [ 0, -1,  0,  0],
                [ 1,  0,  0,  0],
                [ 0,  0,  0,  1]]
}

optional: if you provide a mesh of the object here, a mesh marker will be published for visualization in RViz

You can use the nvdu_ycb tool to download the meshes: https://github.com/NVIDIA/Dataset_Utilities#nvdu_ycb

meshes: {
    "cracker": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/003_cracker_box/google_16k/textured.obj",
    "gelatin": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/009_gelatin_box/google_16k/textured.obj",
    "meat": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/010_potted_meat_can/google_16k/textured.obj",
    "mustard": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/006_mustard_bottle/google_16k/textured.obj",
    "soup": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/005_tomato_soup_can/google_16k/textured.obj",
    "sugar": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/004_sugar_box/google_16k/textured.obj",
    "bleach": "file://path/to/Dataset_Utilities/nvdu/data/ycb/aligned_cm/021_bleach_cleanser/google_16k/textured.obj",
}

optional: If the specified meshes are not in meters, provide a scale here (e.g. if the mesh is in centimeters, scale should be 0.01). default scale: 1.0.

mesh_scales: {
    "cracker": 0.01,
    "gelatin": 0.01,
    "meat": 0.01,
    "mustard": 0.01,
    "soup": 0.01,
    "sugar": 0.01,
    "banana": 0.01,
    "chair": 0.01,
    "teat": 0.01,
}

Config params for DOPE

thresh_angle: 0.5
thresh_map: 0.01
sigma: 3
thresh_points: 0.1


<img src="https://user-images.githubusercontent.com/4501464/89893556-db7c5980-dbd8-11ea-8353-4aa6bc2c5085.png" height="240">

SceneCapturer info:
<img src="https://user-images.githubusercontent.com/4501464/89894082-cbb14500-dbd9-11ea-936f-4b931c97bf0e.png" height="420">

Thanks for taking the time to help me 
Friendly regards,
mintar commented 4 years ago

The camera info published by the realsense camera looks correct.

However, the dimensions are wrong. The config_pose.yaml file expects centimeters, and your chair is definitely not 2147 x 3258 x 2171 centimeters in size. Maybe multiply it by the 0.03 value from the fixed_model_transform?
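
For example (just an illustration with the numbers quoted above; the 0.03 is the scale baked into the fixed_model_transform):

# Scale the cuboid_dimensions NDDS reported down to centimeters.
ndds_dims = [2147.0, 3258.0, 2171.0]   # the values currently in the config
scale = 0.03                           # from the fixed_model_transform
dims_cm = [round(d * scale, 2) for d in ndds_dims]
print(dims_cm)  # -> [64.41, 97.74, 65.13], a plausible chair size in cm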

By the way: if possible, don't post photos taken with your mobile phone; paste the text directly.

Ademord commented 4 years ago

I have updated my previous post with the images exported as you suggested, converted the phone pictures into direct text, and multiplied the cuboid_dimensions by .03, but it still shows detected_objects=[]. What would be the next debugging step?

The images ROS receives are 640x480, as exported by rosrun image_view.

TontonTremblay commented 4 years ago

Can you look at the belief maps?


Ademord commented 4 years ago

I am not sure how to do this. After researching for a bit, it seems that I should add this function to detection.py, correct?

import cv2
import torch
import torch.nn as nn

def ShowBelief(tensor, name='tmp'):
    # Upsample each belief map so it is easier to inspect visually.
    m = nn.Upsample(scale_factor=8, mode='nearest')
    for j in range(tensor.size()[0]):
        belief = tensor[j].clone().detach().cpu()

        # Normalize the map to [0, 1]
        belief -= float(torch.min(belief).item())
        belief /= float(torch.max(belief).item())

        belief = m(belief.unsqueeze(0).unsqueeze(0)).squeeze().squeeze().data
        belief = torch.clamp(belief, 0, 1)

        # Stack the single channel into a 3-channel image for saving
        belief = torch.cat([belief.unsqueeze(0)] * 3).unsqueeze(0)

        cv2.imwrite('belief_{}_v{}.png'.format(str(name), str(j)),
                    belief.squeeze().numpy().transpose(1, 2, 0) * 255)
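
Presumably it would be called on the final-stage belief output right after the network's forward pass, something like this (belief_tensor is a placeholder name, not the actual variable in the detector code):

# belief_tensor: a [9, H, W] tensor (8 cuboid corners + centroid) taken from the
# last stage of the network output inside detect_object_in_image().
ShowBelief(belief_tensor, name='chair')
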
TontonTremblay commented 4 years ago

I am working on a new version of DOPE and it includes this debug.

Ademord commented 4 years ago

I upgraded my setup to UE 4.24 and that's what I've been working with. I haven't finalized upgrading to 4.25...

Related to the thread: first thing tomorrow morning I will export my dataset again with multi-colored materials and better camera angles and start training. While that runs, I will add this function to the code so I can see the maps you are looking for... I am scared this still won't work and I will be lost in the forest.

mintar commented 4 years ago

It's probably best to first show the belief maps before retraining. The belief maps are useful because there's a two-step process during inference:

  1. the network generates belief maps
  2. solvePnP calculates the object pose from the belief maps

Visualizing the belief maps helps narrow down the problem. If the belief maps look crappy, you know that you need to retrain the network. If the belief maps look ok, you know that the problem is elsewhere (camera_info, wrong bounding box dimensions, ...). I suspect your "dimensions" are still not correct.
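
As a rough illustration of step 2 (a sketch, not the actual CuboidPNPSolver code; the 2D image points below are made-up placeholders):

import cv2
import numpy as np

# 3D cuboid corners in the object frame: a 64.41 x 97.75 x 65.13 cm box centered
# at the origin (the chair dimensions from the config above).
dx, dy, dz = 64.41 / 2, 97.75 / 2, 65.13 / 2
object_points = np.array(
    [[sx * dx, sy * dy, sz * dz] for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
    dtype=np.float64)

# Intrinsics from the camera_info at the top of the thread; zero distortion.
camera_matrix = np.array([[617.34, 0.0, 325.42],
                          [0.0, 617.27, 230.32],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)

# Placeholder 2D peaks that would come from the eight corner belief maps.
image_points = np.array([[300, 100], [340, 110], [290, 210], [350, 220],
                         [310, 130], [330, 140], [300, 230], [345, 235]],
                        dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix,
                              dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
print(ok, tvec.ravel())  # translation in the same units as object_points (cm)

If the belief maps are good but this step returns nonsense or NaNs, the usual suspects are the camera matrix and the cuboid dimensions.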

If you're going to export your dataset again, please make double sure that the scaling and bounding box center of your model is correct. I've always found it easiest to use meters (or centimeters) for the mesh and to move the mesh origin to the center of the bounding box.

In MeshLab, you can visualize this via Render -> Show Box Corners. For example: meshlab_screenshot

Here's the corresponding _object_settings.json:

{
    "exported_object_classes": [
        "getriebelager"
    ],
    "exported_objects": [
        {
            "class": "getriebelager",
            "segmentation_class_id": 155,
            "segmentation_instance_id": 13841190,
            "fixed_model_transform": [
                [ 0, 0, 100, 0 ],
                [ 0, -100, 0, 0 ],
                [ 100, 0, 0, 0 ],
                [ 0, 0, 0, 1 ]
            ],
            "cuboid_dimensions": [ 10.569199562072754, 26.933900833129883, 14.880499839782715 ]
        }
    ]
}
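
As an aside on the numbers above (my reading, so treat it as an assumption): the 100 entries mean the mesh itself is modeled in meters and the transform scales it to the centimeters that the rest of the export uses. You can recover that factor from the matrix:

import numpy as np

# fixed_model_transform from the _object_settings.json above
# (row-major, translation in the last row).
fixed_model_transform = np.array([
    [0,    0, 100, 0],
    [0, -100,   0, 0],
    [100,  0,   0, 0],
    [0,    0,   0, 1],
], dtype=float)

rot_scale = fixed_model_transform[:3, :3]
scale = abs(np.linalg.det(rot_scale)) ** (1.0 / 3.0)
print(round(scale, 6))  # 100.0 -> a meters-to-centimeters scale folded into the rotation block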

And here's my config_pose.yaml (abbreviated):

topic_camera: "/ensenso/rgb/image_rect_color"
topic_camera_info: "/ensenso/rgb/camera_info"
topic_publishing: "dope"
input_is_rectified: True
downscale_height: 400

weights: {
    "getriebelager":"package://hybrit_dope/weights/getriebelager_68.pth"
}

# Cuboid dimension in cm x,y,z
dimensions: {
    "getriebelager": [10.569199562072754, 26.933900833129883, 14.880499839782715]
}

class_ids: {
    "getriebelager": 1
}

draw_colors: {
    "getriebelager": [228, 26 , 28]
}

# optional: provide a transform that is applied to the pose returned by DOPE
model_transforms: {
    "getriebelager":            [[ 0,  0,  1,  0],
                                 [ 0, -1,  0,  0],
                                 [ 1,  0,  0,  0],
                                 [ 0,  0,  0,  1]],
}

# optional: if you provide a mesh of the object here, a mesh marker will be published for visualization in RViz
meshes: {
  "getriebelager":            "package://hybrit_objects/meshes/getriebelager/getriebelager.stl",
}

# Config params for DOPE
thresh_angle: 0.5
thresh_map: 0.01
sigma: 3
thresh_points: 0.1

# Depth image based post processing
# uncomment topic_depth to enable post processing
topic_depth: "/ensenso/depth/image_rect"
# maximum depth distance (in meters) between recognized pose and true pose in depth image
depth_tolerance: 0.30
# Minimum ratio between valid points (= not NaN, within depth_tolerance) in actual depth image and expected points in
# in rendered depth image. Detections below min_valid_points_ratio will be discarded. In practice, false positives
# almost always have 0 valid points, whereas true positives range between 0.02 and 0.70 (depending on object and
# camera).
min_valid_points_ratio: 0.01

You can see that:

Ademord commented 4 years ago

Thanks a lot for this reply! I clearly had some export discrepancies. I installed MeshLab and looked at my old export; it was way off, since it included a plane, which messed up everything. I modified the mesh in Blender and exported it properly.

The way it was: image

Interestingly, I had to set the origin in Blender to something other than 0 0 0 to get 0 0 0 in MeshLab (I struggled for 40 min), even though the object's origin is at the world origin. image

Now it looks like this: image

mintar commented 4 years ago

Now that looks much better! Did you re-measure if the dimensions fit your actual chair? Just to be sure.

BTW, you can also center the object directly in Meshlab by using Filters -> Normals, Curvature and Orientation -> Transform: Translate, Center, Set Origin -> Transformation: Center on BBox.

Ademord commented 4 years ago

Thanks! I actually measured the chairs in the office this time to make a realistic object, not just "some 3D model", hehe! Yes, I also found out how to recenter in MeshLab, but for some reason those exports don't go easily into Unreal: MeshLab doesn't support FBX, and Unreal didn't want to take an STL or a PLY, and when it did, it didn't include the UVs (which, by the way, gave me a lot of problems back then). So I just fixed it from Blender 👍

I am replacing the actors' meshes with the new chairs and then will reexport + retrain

Ademord commented 4 years ago

I have been training for a day now; it's at Train Epoch: 13 [138000/160000 (86%)] Loss: 0.000000180193666. I tried to test it on the chairs in the office with no results. I will make a rosbag out of the NDDS capture (a 360° video) and feed it to ROS to see how DOPE reacts. I will do the feature map visualization tomorrow too, to see how it goes...

Crossing fingers so that by epoch 60 we see better results...

My current questions: are 13 epochs and such a low loss a good or bad sign, and are 13 epochs too soon to test on ROS? Anyway, I will come back tomorrow with the feature maps, and I will also attach some samples of the dataset as it looks now.

mintar commented 4 years ago

My current question: is 13 epochs and such a low loss a good/bad sign

Bad sign.

, and 13 epochs too soon to test on ROS?

Not too soon.

Looking forward to the belief maps!

Ademord commented 4 years ago

Hello guys, I have finally gotten the belief maps! They look terrible and I don't know how to debug this further...

What the camera sees: left0000
The belief maps generated: image

I don't know how to proceed from now on... I thought my dataset was well made.

mintar commented 4 years ago

Could you please upload a small sample of your dataset (like 100 random images, your mesh + all the relevant yaml files) somewhere?

Ademord commented 4 years ago

Hope this works: https://www.dropbox.com/sh/xmv2isxjp2kfn46/AAAZv_N5N_xnnZuf3uaEaITDa?dl=0

mintar commented 4 years ago

Thanks! Could you also upload the chair STL model?

Ademord commented 4 years ago

Done! I will upload as well my temple scene.

mintar commented 4 years ago

just the stl model of the chair would be enough

Ademord commented 4 years ago

It's all there 👍

mintar commented 4 years ago

Something is still seriously wrong with your fixed_model_transform. Run nvdu_viz from here and press F5; you should see the yellow mesh toggle on and off. (You'll have to convert your mesh into .obj (e.g., using MeshLab) and put it into the proper path; if it's not there, you'll get an error message in the terminal that shows you where to put it.)

With your current fixed_model_transform, nothing happens:

            "fixed_model_transform": [
                [ -0.65170001983642578, 0, -0.75840002298355103, 0 ],
                [ 0.75840002298355103, 0, -0.65170001983642578, 0 ],
                [ 0, -1, 0, 0 ],
                [ 985.652587890625, -71.237998962402344, 2792.017578125, 1 ]
            ],

I have fixed it:

            "fixed_model_transform": [
                [ 0, 0, 1, 0 ],
                [ -1, 0, 0, 0],
                [ 0, -1, 0, 0 ],
                [0, 0, 0, 1]
            ],

Now you can switch on and off the mesh:

no_mesh with_mesh

However, I'm not sure if that's a problem. After all, your other json files look ok, particularly the pose_transform and projected_cuboid values.

It might be better to delete the *.cs.png files from your dataset; I'm not sure if they confuse DOPE.

Also maybe try again without the color randomization.

Other than that, I'm a bit lost. Maybe the problem is that most projected corners do not lie on (or near) the object (see my screenshot above). @TontonTremblay : Do you think that's the problem?

Ademord commented 4 years ago

Ok, so I have learned that the way to "fix" the fixed_model_transform is to check the Auto-start capturing option on the NDDS project in UE4. I used to hit Play and then manually Start Capture. That modifies the initial transform, because the object is already rotated at the moment of the first capture; with "Auto Start Capturing", the first frame is the standing, "correct" one... (if I am correct).

With this setup the fixed model transform looks like this

"fixed_model_transform": [
                [ 0, 0, 1, 0 ],
                [ -1, 0, 0, 0 ],
                [ 0, -1, 0, 0 ],
                [ -2230, -50, -1950, 1 ]
            ],

The last row holds the coordinates of where the object stands in the world; I can manually modify that one, I guess? Since the network doesn't need to know that the object is not at the "center" of the world/scene.
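
Something like this, I suppose (the path is just a placeholder):

import json

# Zero out the world-position row of fixed_model_transform, keeping only the rotation.
path = "output/_object_settings.json"   # placeholder path
with open(path) as f:
    settings = json.load(f)

for obj in settings["exported_objects"]:
    # NDDS stores the matrix row-major, with the translation in the last row.
    obj["fixed_model_transform"][3] = [0, 0, 0, 1]

with open(path, "w") as f:
    json.dump(settings, f, indent=4)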

I will retrain with the new setup you indicated and without color randomization (maybe subtle randomization between black and dark gray colors).

Edit: The fixed_model_transform was still being affected because the exporter didn't always grab the first frame (slow CPU?) in which the chair was still, and the RandomRotation component kicked in and "messed everything up". I just removed the RandomRotation component to get the correct 1, -1, -1 transform. I will manually adjust the last row of the matrix.

Do you think the Segmentation Class Id should be the same across all json files?

Ademord commented 4 years ago

By epoch 3, the loss is already very low... I will wait a bit more and then generate the feature maps again: Train Epoch: 3 [0/160000 (0%)] Loss: 0.000000076063948

Maybe given the point you made about

Other than that, I'm a bit lost. Maybe the problem is that most projected corners do not lie on (or near) the object (see my screenshot above)

I can try fixing it by training on the chair model without the recliner, so that it is at the level of the arm-rests; I will play around with this. I will also share a Dropbox link with the new/current dataset sample and config again at the end of the day when I test.

mintar commented 4 years ago

I can try fixing it by training on the chair model without the recliner, so that it is at the level of the arm-rests, I will play around with this.

Maybe it's better if you only train on the black parts of the chair. The metal parts are very thin, so that's probably the bigger problem.

Ademord commented 4 years ago

I have been training, and I think I found the error, since the loss is looking a lot more normal (is 0.0025 good?).

I will test today after 20 epochs and show you the results + belief maps.

By the way, do we have any records of the rate (Hz) at which DOPE runs?

Ademord commented 4 years ago

So we have some progress this time, after 30 epochs! I got a detection, apparently, but now this error comes out...

Screenshot from 2020-08-24 23-19-58

These are my belief maps; they look good! (I think?)

image

Ademord commented 4 years ago

Okay, so I tracked down the error: my OpenCV version is 4.2, and the error is in the line elif CuboidPNPSolver.cv2majorversion == 3:, since there is no support for version 4.2 in the cuboid PnP solver. What would be an algorithm to solve PnP for this version?

I downgraded OpenCV to 3.4 and I got this: Screenshot from 2020-08-24 23-58-03

Screenshot from 2020-08-24 23-57-58

Do you think it's because I'm at home and I am playing from a rosbag? I missed a couple of values (D, K, R, P) from the camera_info :/ I will try this in a couple of days. I need to finish another task before Thursday.

Either way, this is good news. I'm grateful for all the patience and support; we will see how this ends up this week :)

mintar commented 4 years ago

I have been training and I think I found the error since the loss is looking a lot more normal (0.0025 is good?).

Yes, that looks good. Also your belief maps look perfect! How did you fix the problem?

Okay so i tracked down the error my openCV version is 4.2 and the error is in line elif CuboidPNPSolver.cv2majorversion == 3:

That should still work in OpenCV 4, so you could try changing that line to

elif CuboidPNPSolver.cv2majorversion >= 3:
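
(Assuming the solver derives that flag from cv2.__version__ roughly as below, a >= check covers OpenCV 4.x as well:)

import cv2

# Major component of the OpenCV version string, e.g. 4 for "4.2.0".
cv2majorversion = int(cv2.__version__.split('.')[0])
print(cv2majorversion)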

I downgraded opencv to 3.4 and i got this: 'location': [nan, nan, nan]

That still looks like the solvePnP step failed for some reason. DOPE is tested with OpenCV 3.3.1. It's probably best if you just use the Docker image, at least for now.

Ademord commented 4 years ago

I did that, but it still gave me an error. I do think it's because my camera info is missing the D, K, R, P values. I will re-export my rosbag on Thursday.

Related to how I made it work... I found out that I wasn't training on the correct class. That's why the loss was always giving me 0.000003 or something low: it was training on no object. Face-palm.

I falsely assumed from the documentation that the class name given in TagActor is something independent. I assumed that the folder name would be the class name given in the --object parameter. I don't know if it was obvious, but I got misled, and I feel like an idiot.

Before (assumed it was folder name) train.py --object 044_chair_80k

After (or just skip the whole parameter since I'm only feeding one object in the folders) train.py --object chair

I have been using the docker image to train (cluster), and testing the network directly on a separate laptop (host, not container) which has ROS, etc.

Ademord commented 4 years ago

So I have been able to log in to my machine (not in the office yet), and the network has trained nicely and always returns something :) Here are the belief maps. I don't know why v4 and v5 always look a bit funny, but the rest look good, right? This is output from running on the rosbag; I will run with the camera in the office later today, I hope. image

mintar commented 4 years ago

Yes, this looks good. Probably v4 and v5 are the two bbox corners that are "in the air" furthest away from the chair, so they are the hardest to predict. But I believe these results should be more than good enough for object detection.

Ademord commented 4 years ago

Hello! Good news! This is my current result: Screenshot from 2020-09-05 15-20-44. I don't know how to make the overlaid mesh smaller, nor how to zoom out... I couldn't take a screenshot with the chair mesh showing on the left, but it covers the entire camera window.

mintar commented 4 years ago

I dont know how to make the mesh that overlays smaller

That's easy: In your config_pose, provide the scaling factor to meters:

# optional: If the specified meshes are not in meters, provide a scale here (e.g. if the mesh is in centimeters, scale should be 0.01). default scale: 1.0.
mesh_scales: {
    "cracker": 0.01,
    "gelatin": 0.01,
    "meat":    0.01,
    "mustard": 0.01,
    "soup":    0.01,
    "sugar":   0.01,
    "bleach":  0.01,
}
Ademord commented 4 years ago

Good news! The only thing now is that it is rotated; I will find the axis...

Screenshot from 2020-09-08 12-59-17

mintar commented 4 years ago

Congratulations! When you have found the transformation, you should put it into your model_transforms in the config_pose.yaml.

Do you mind uploading the trained weights and your mesh file somewhere? We have the same chairs in our office and I would love to try it out myself.

Ademord commented 4 years ago

Hello! Of course! I made a small video of the results; you can see how it performs, also with two chairs.

Here is the dropbox with the videos of my performance and the weights + config: https://www.dropbox.com/sh/dz5wybaqf2evzco/AABBMsfJXeLK46FUBzra5KzUa?dl=0

Let me know if you have any feedback. Is the visualization where it tries to fit the mesh with the real model always blinking like this?

It would be cool if there were some post-processing step. Do you know if I can enable the # Depth image based post processing?

Ademord commented 4 years ago

Also, with two chairs (multiple objects), is it supposed to show multiple poses at the same time? It only shows one at a time (second video).

Ademord commented 4 years ago

Additionally, sometimes the mask export doesn't work for me: some 3D models are properly exported with a white mask, while others, for example the base models from UE4 (this is a cylinder), do not export a mask (as shown below, they export black and merge with the background == no mask?).

Edit: The only way I got this to work 👍 and to export a mask is to go back to Blender and merge the "new" object with an "old" object, then export the .fbx and import it into UE4; then the model has "some property I don't know about, which old objects have and new ones don't", and the mask shows when exporting (but then the cylinder model inside UE4 should have it...). I made an issue to ask about it in the meantime at https://github.com/NVIDIA/Dataset_Synthesizer/issues/86, but I am not very hopeful they will answer. Also, I am working with UE4 4.24.

TontonTremblay commented 4 years ago

@thangt Hey Thang, have you seen this problem before? Very cool results. It's great to see things come together like this :P

mintar commented 4 years ago

Very nice! Could you please also upload your chair.stl (or even better, a mesh with colors)?

Let me know if you have any feedback. Is the visualization where it tries to fit the mesh with the real model always blinking like this?

I fixed that in #120, but @TontonTremblay still has to merge that PR. (@TontonTremblay: could you please also merge #121 while you're at it?).

It would be cool if there was some post-processing step. Do you know if can I enable the # Depth image based post processing?

That's not merged into master yet, and I haven't opened a pull request since it's a rather large change and I don't know if @TontonTremblay wants it here. So for now, you'll have to use the "hybrit" branch of my fork:

https://github.com/mintar/Deep_Object_Pose/tree/hybrit

This will mainly correct the depth value of the objects and remove some false positives (like the huge chair around 0:30 of 20200908_205148.mp4):

Bildschirmfoto_2020-09-10_09-52-01
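
The idea behind min_valid_points_ratio, sketched roughly (not the actual implementation; the function and argument names here are made up):

import numpy as np

def valid_points_ratio(measured_depth, expected_depth, depth_tolerance=0.30):
    # Fraction of the object's expected pixels whose measured depth (in meters)
    # is present (not NaN) and within depth_tolerance of the rendered depth.
    expected = np.isfinite(expected_depth)
    valid = (expected & np.isfinite(measured_depth)
             & (np.abs(measured_depth - expected_depth) <= depth_tolerance))
    return float(valid.sum()) / max(int(expected.sum()), 1)

# A detection is kept only if this ratio is at least min_valid_points_ratio (0.01 above).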

Also with two chairs (multiple objects), is it supposed to show multiple poses at the same time? since it only shows one at a time (second video).

Yes, it's supposed to detect both. You can see that it briefly detects both at around 0:58:

Bildschirmfoto_2020-09-10_09-51-20

DOPE can definitely work better than in your videos with tabletop objects, but I guess your objects are quite hard to detect by DOPE, so I'm happy it works as well as it does.

Ademord commented 4 years ago

Could you please also upload your chair.stl (or even better, a mesh with colors)?

I have added an STL and an FBX to the Dropbox link so you get both, just in case. They should be colored, but they never arrived colored in UE4 (the materials import was quite an adventure).

you'll have to use the "hybrit" branch of my fork:

Okay, I will check this one out then. I guess it is a 4.22 version? I will have to update it to 4.24.

Also, it would be cool to be able to test the results after training without ROS per se. I saw a related post, but I am trying to implement a solution where a docker-compose file creates one container that replays a rosbag and another that consumes that feed and processes it (say, with DOPE or some other network).

mintar commented 4 years ago

okay i will check this one out then, I guess it is a 4.22 version? I will have to update it to 4.24.

It doesn't matter. The UE4 version is only relevant for generating the training data. Once you have the images, you can use any DOPE version you want with it.

Ademord commented 4 years ago

Ah, woops.

Ademord commented 4 years ago

Hello! So I tested with the hybrit branch and these are my results, which look amazing! https://www.dropbox.com/sh/2jk8ye8pzp6zwiw/AABRY-kfDLMKRIeXAa_KT7vea?dl=0 Did you have a chance to test this on your side?

I measured the prediction time and it averages around 470-560 ms. I measured with a decorator at:

    @staticmethod
    @timing
    def detect_object_in_image(net_model, pnp_solver, in_img, config):

This is a scary question, but is there a way to speed it up? I am wondering if a prediction time of ~100 ms is possible...

Edit: I might be testing on a CPU; I will double-check. Python 2 timing:

import time

def timing(f):
    def wrap(*args):
        time1 = time.time()
        ret = f(*args)
        time2 = time.time()
        print '%s function took %0.3f ms' % (f.func_name, (time2-time1)*1000.0)
        return ret
    return wrap

Python 3 timing:

import time
from functools import wraps

def timing(f):
    @wraps(f)  # preserve the wrapped function's name/docstring on the wrapper
    def wrap(*args, **kwargs):
        time1 = time.time()
        ret = f(*args, **kwargs)
        time2 = time.time()
        print('{:s} function took {:.3f} ms'.format(f.__name__, (time2-time1)*1000.0))
        return ret
    return wrap
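
A quick way to double-check the CPU suspicion (a sketch; net_model is a placeholder for the model object passed to detect_object_in_image):

import torch

# If this prints False, the ~500 ms predictions are almost certainly running on the CPU.
print("CUDA available:", torch.cuda.is_available())

# For a loaded model, the parameters should report a CUDA device, e.g.:
# print(next(net_model.parameters()).device)   # expect something like "cuda:0"
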
Ademord commented 4 years ago

Also, I see that there is an X11 mount in the run-with-docker script, but I can't find how to VNC into the container. Should I install a VNC server myself, etc.?

TontonTremblay commented 4 years ago

re: https://github.com/NVlabs/Deep_Object_Pose/issues/137#issuecomment-690062416

I just merged the MR; I am sorry I did not do it earlier. Your problems with flickering should be solved now.

Ademord commented 4 years ago

I tried to mess around with NDDS and I am a bit confused about the GroupActorManager (GAM), and I can't find docs for it. I made a GAM for the TrainingGroup and one for Distractors.

Questions:
1) What actor class do I use? I am using Test_AnnotatedActor_BP with MeshOverride, for lack of better info.
2) How do I use the RandomMaterial component if I use a GroupActorManager? I was randomizing the leg colors, and this setup fixes the legs of the chairs to green, as shown below. 000000

My settings: image

image

Ademord commented 4 years ago

To provide an update to everyone:

Related to the question above, I ended up finding a way to modify the mesh of the generated objects, but decided not to go in that direction, based on my doubts below.

My dataset generation takeaways so far:

🔵 Setup1: static chairs + OrbitalMovement camera = DOPE worked

🔵 Setup2: RandomRotation, RandomMovement chairs + OrbitalMovement camera = DOPE worked, BUT the results come out as None None None None quaternions...

🔵 Setup3 (current): Mixed the RandomRotation+ static datasets generated and I am training again.

I strongly refuse to believe that we cannot train with RandomRotation.

❓ Question 1: My biggest doubt is about what we mentioned regarding the `fixed_transform`. @mintar, correct me if I am wrong: this fixed transform is the initial, base position of the object (chair), AND if I have a RandomRotation, DOPE should still be able to detect a pose (in contrast to the None None None results I have experienced)? Otherwise it would mean that training with RandomRotation is not possible == the object must remain static in the 3D scene?

❓ Question 2: Also, I have been starting to test my networks at 10-20 epochs; is this too early? The loss looks like:

Train Epoch: 15 [0/696182 (0%)]  Loss: 0.008487645536661
Train Epoch: 15 [20000/696182 (3%)]      Loss: 0.007497824262828
Train Epoch: 15 [40000/696182 (6%)]      Loss: 0.007296111900359
Train Epoch: 15 [60000/696182 (9%)]      Loss: 0.007062429096550
Train Epoch: 15 [80000/696182 (11%)]     Loss: 0.007186851929873
Train Epoch: 15 [100000/696182 (14%)]    Loss: 0.007435663603246
Train Epoch: 15 [120000/696182 (17%)]    Loss: 0.006747611332685

🔵 What losses have you achieved yourself in your previous trainings?
🔵 When the loss "stabilizes" and doesn't go down from epoch 15 to 20 (stays between 0.006 and 0.007), should I stop, or is it still learning? Because I stop and test and get no results, and here is where I panic. In other words: I have no way to tell whether I should stop and test, or whether I should stop and go back to inspect / adjust my dataset, because my DOPE did not detect anything after 15 epochs (debugging guidance required).