NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

Model Symmetry for a Tennis Ball #249

Open rahulram70 opened 2 years ago

rahulram70 commented 2 years ago

Hello.

I'm trying to properly define the symmetry for a tennis ball. Currently, I've specified nothing for the discrete symmetries. However, I defined the continuous symmetries like so: [{"axis": [0, 0, 1], "offset": [0, 0, 0]}, {"axis": [1, 0, 0], "offset": [0, 0, 0]}, {"axis": [0, 1, 0], "offset": [0, 0, 0]}].

I didn't know how to define align_axes, so I left it the same as for the cylinder object: "align_axes": [{"object": [0, 1, 0], "camera": [0, 0, 1]}, {"object": [0, 0, 1], "camera": [0, 1, 0]}].

Another thing that confuses me is that a tennis ball has a logo on only one side. How does that affect the model_info.json?

Any help would be appreciated, thank you in advance.

mintar commented 2 years ago

Damn, when I developed the symmetry handling, I knew somebody would try to model a sphere at some point, but I hoped it wouldn't be so soon. Spheres are not really well supported at the moment. :-(

The computation of symmetries happens in this function:

https://github.com/NVlabs/Deep_Object_Pose/blob/c6d0033ba98df514c022110c5fcae2340d96cff2/scripts/nvisii_data_gen/utils.py#L1490-L1535

What it does is multiply each discrete symmetry with each continuous symmetry; it does not multiply a continuous symmetry with another continuous symmetry. That is how the original code from the BOP toolkit did it, and I didn't fix it. It works for all objects except spheres, because there is no geometric body with exactly 2 continuous rotational symmetries, and the sphere is the only body with 3.
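
Roughly, the combination logic works like this (a simplified sketch of the idea, not the actual utils.py implementation; offsets and de-duplication are ignored):

import numpy as np

def rotation_about_axis(axis, angle):
    # Rodrigues' formula: 3x3 rotation by `angle` radians about unit `axis`.
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def combine_symmetries(symmetries_discrete, symmetries_continuous, steps=64):
    # symmetries_discrete: list of flattened 4x4 matrices (as in model_info.json)
    # symmetries_continuous: list of {"axis": [...], "offset": [...]} dicts
    discrete = [np.asarray(m).reshape(4, 4) for m in symmetries_discrete]
    if not discrete:
        discrete = [np.eye(4)]
    if not symmetries_continuous:
        return discrete
    combined = []
    for base in discrete:
        for sym in symmetries_continuous:
            # discretize the continuous rotation and compose each step
            # with the discrete symmetry
            for angle in np.arange(steps) / steps * 2.0 * np.pi:
                T = np.eye(4)
                T[:3, :3] = rotation_about_axis(sym["axis"], angle)
                combined.append(T @ base)
    return combined

With 3 continuous symmetries and no discrete ones (your tennis ball case), this only gives you rotations about each axis separately, never combinations of them, so it doesn't cover the full rotation group of a sphere.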

You can however fake it: You simply pre-compute two of the continuous symmetries and put them into discrete_symmetries. I've written a script for you that does this:

#!/usr/bin/env python3

from math import sin, cos
import numpy as np
import simplejson as json

DISCRETE_STEPS_COUNT = 64

def rpy_to_transformation_matrix(roll=0.0, pitch=0.0, yaw=0.0):
    """Roll-pitch-yaw (ZYX Euler angles) rotation as a 4x4 homogeneous matrix."""
    R = np.zeros((4, 4))
    R[0, 0] = cos(yaw) * cos(pitch)
    R[0, 1] = -sin(yaw) * cos(roll) + cos(yaw) * sin(pitch) * sin(roll)
    R[0, 2] = sin(yaw) * sin(roll) + cos(yaw) * sin(pitch) * cos(roll)
    R[0, 3] = 0.0
    R[1, 0] = sin(yaw) * cos(pitch)
    R[1, 1] = cos(yaw) * cos(roll) + sin(yaw) * sin(pitch) * sin(roll)
    R[1, 2] = -cos(yaw) * sin(roll) + sin(yaw) * sin(pitch) * cos(roll)
    R[1, 3] = 0.0
    R[2, 0] = -sin(pitch)
    R[2, 1] = cos(pitch) * sin(roll)
    R[2, 2] = cos(pitch) * cos(roll)
    R[2, 3] = 0.0
    R[3, 0] = 0.0
    R[3, 1] = 0.0
    R[3, 2] = 0.0
    R[3, 3] = 1.0
    return R

# Pre-compute two of the three continuous symmetries (rotations about the
# x and y axes, i.e. roll and pitch) as 64 x 64 discrete rotations; the
# third one (yaw, about z) stays continuous below.
symmetries_discrete = []

for roll in np.arange(DISCRETE_STEPS_COUNT) / DISCRETE_STEPS_COUNT * 2.0 * np.pi:
    for pitch in np.arange(DISCRETE_STEPS_COUNT) / DISCRETE_STEPS_COUNT * 2.0 * np.pi:
        R = rpy_to_transformation_matrix(roll=roll, pitch=pitch)
        symmetries_discrete.append(list(R.flatten()))

model_info_dict = {
    "symmetries_discrete": symmetries_discrete,
    "symmetries_continuous": [{"axis": [0, 0, 1], "offset": [0, 0, 0]}],
    "align_axes": [{"object": [0, 1, 0], "camera": [0, 0, 1]}, {"object": [0, 0, 1], "camera": [0, 1, 0]}]
}

with open("model_info.json", "w") as fp:
    json.dump(model_info_dict, fp, indent=4)
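
With DISCRETE_STEPS_COUNT = 64, this writes 64 * 64 = 4096 discrete symmetry matrices plus the one remaining continuous symmetry (about the z axis) into model_info.json. Put the generated file where your current model_info.json for the tennis ball lives, so the data generation picks up the pre-computed symmetries.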

> Another thing that confuses me is that a tennis ball has a logo on only one side. How does that affect the model_info.json?

I think you only have two options:

  1. Don't specify any symmetries at all.
  2. Specify the full 3 continuous symmetries.

If you strictly follow the definition of symmetries, then the logo and the rest of the texture make your object non-symmetrical. I believe, however, that DOPE will do much better if you go with option 2.

rahulram70 commented 2 years ago

Thank you so much, Martin. I will try out your script, and let you know the results.

rahulram70 commented 2 years ago

Hello. I ran the script above and was able to generate a few new images. When I trained the DOPE network on this data, the performance was significantly better, and I was able to use the inference.py script successfully.

However, the image generation is incredibly slow. I'm running 5 GPUs around the clock and can still only generate about 480 images per day. I'm assuming I will need 40-50k images, which makes the current approach infeasible for my timeline.

Are there any alternatives or solutions to this issue? Thank you for your help.

mintar commented 2 years ago

Ah damn, I was afraid that would happen.

The problem is that my proposal is very inefficient: it pre-computes 64 × 64 = 4096 discrete symmetries and then scores each of them (combined with the continuous symmetry) every time the symmetry-corrected transform is updated.

In this special case, what we really want is simply the identity rotation: the "best symmetry-corrected transform" is the one where the rotation of the object is (0, 0, 0) in the camera frame.

Here is a dirty hack that directly sets the identity transform. WARNING: with this patch, the symmetry handling only works for spheres, but it should solve your problem!

diff --git a/scripts/nvisii_data_gen/utils.py b/scripts/nvisii_data_gen/utils.py
index 0e624a1..ff7bf6b 100644
--- a/scripts/nvisii_data_gen/utils.py
+++ b/scripts/nvisii_data_gen/utils.py
@@ -1348,72 +1348,10 @@ def update_symmetry_corrected_transform(obj_id, model_info, symmetry_transforms,
     identical for two poses which are symmetrical.
     """

-    # convert align_axes vectors to visii.vec3
-    align_axes = []
-    try:
-        # e.g., "align_axes": [{"object": [0, 1, 0], "camera": [0, 0, 1]}, {"object": [0, 0, 1], "camera": [0, 1, 0]}]
-        for axis_pair in model_info["align_axes"]:
-            object_axis = axis_pair["object"]
-            object_axis = visii.vec3(object_axis[0], object_axis[1], object_axis[2])
-            camera_axis = axis_pair["camera"]
-            camera_axis = visii.vec3(camera_axis[0], camera_axis[1], camera_axis[2])
-            align_axes.append({"object": object_axis, "camera": camera_axis})
-    except (KeyError, IndexError):
-        align_axes.append({"object": visii.vec3(1, 0, 0), "camera": visii.vec3(0, 0, 1)})
-
-    # find the symmetry transform with the best alignment between the first object_axis and the first camera_axis,
-    # using the second object_axis/camera_axis as a tie breaker
-
-    cam_matrix = visii.entity.get(camera_name).get_transform().get_world_to_local_matrix()
-
-    # calculate rotation error for each symmetry transform
-    def calc_alignment_score(object_axis, camera_axis, symmetry_trans):
-        """
-        Transforms the `object_axis` vector into the camera frame using the transformation `trans` and calculates
-        the "alignment score" between the rotated vector and the `camera_axis` vector. The "alignment score" is
-        just the dot product between the vectors, so `math.acos(alignment_score)` is the angle between the vectors.
-        `alignment_score` is between -1 (180 degree angle, worst alignment) and 1 (0 degree angle, best alignment).
-        """
-        mat_trans = visii.transform.get(f"{obj_id}").get_local_to_world_matrix() * symmetry_trans
-        sym_object_to_camera_rotation = visii.quat(cam_matrix * mat_trans)
-        sym_object_axis = sym_object_to_camera_rotation * object_axis
-        return visii.dot(sym_object_axis, camera_axis)
-
-    # score according to the first element of align_axes
-    scored_symmetry_transforms = []
-    for i, trans in enumerate(symmetry_transforms):
-        alignment_score = calc_alignment_score(align_axes[0]["object"], align_axes[0]["camera"], trans)
-        scored_symmetry_transforms.append((alignment_score, trans))
-
-    # Sort from best to worst. Using only first element (score) for sorting to
-    # avoid undefined "<" operator between second element (transforms).
-    scored_symmetry_transforms.sort(key=lambda x: x[0], reverse=True)
-
-    best_symmetry_transform = scored_symmetry_transforms[0][1]
-
-    if len(align_axes) >= 2:
-        # collect best transforms with identical score
-        if len(scored_symmetry_transforms) == 0:
-            return
-        reference_score = scored_symmetry_transforms[0][0]
-        top_symmetry_transforms = []
-        EPSILON = 1e-3
-        for (score, symmetry_transform) in scored_symmetry_transforms:
-            if math.fabs(score - reference_score) > EPSILON:
-                break
-            top_symmetry_transforms.append(symmetry_transform)
-
-        # use second element of align_axes as a tie breaker
-        max_alignment_score = float("-inf")
-        for trans in top_symmetry_transforms:
-            alignment_score = calc_alignment_score(align_axes[1]["object"], align_axes[1]["camera"], trans)
-            if alignment_score > max_alignment_score:
-                max_alignment_score = alignment_score
-                best_symmetry_transform = trans
-
     # update the symmetry_corrected transform
+    object_rotation_quat = visii.transform.get(f"{obj_id}").get_rotation()
     symmetry_corrected_trans = visii.transform.get(f"{obj_id}_symmetry_corrected")
-    symmetry_corrected_trans.set_transform(best_symmetry_transform)
+    symmetry_corrected_trans.set_rotation(visii.inverse(object_rotation_quat))

 # Modified from https://github.com/thodan/bop_toolkit/
-- 
2.25.1
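
To try it out, either edit scripts/nvisii_data_gen/utils.py by hand or save the diff to a file (the name doesn't matter, e.g. sphere_hack.patch) and apply it from the repository root with `git apply sphere_hack.patch`.
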
rahulram70 commented 2 years ago

Thank you for the script. Should I leave the generated model_info.json inside the tennis ball's directory?

mintar commented 2 years ago

With the hack above, the model_info.json doesn't matter anymore. You can leave it or delete it, it doesn't make a difference.

rahulram70 commented 2 years ago

Sounds good. I'm trying to train on these custom images using multiple GPUs with the command below.

python -m torch.distributed.launch --nproc_per_node=1 train.py --network dope --epochs 10000 --batchsize 150 --outf tmp_tennis_sym_output_1/ --data ../nvisii_data_gen/output_complete/ --dontsave --gpuids 0 1 2 3

However, I'm running into CUDA memory issues, since it seems DOPE is using all the memory on the first GPU and none of the memory on the other GPUs. I've attached the nvidia-smi output below. How would I go about fixing this?

[nvidia-smi screenshot]

mintar commented 2 years ago

I don't know how to solve the problem of it only using 1 GPU. In general, you have to reduce the batch size if you run into memory problems. Also, --epochs 60 should be enough (given that you have enough training images, i.e. something between 20k and 60k). I would also recommend not using the --dontsave option: if you let it save the intermediate weights, you will be able to abort when you're satisfied, or resume training later.
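
For example, something along these lines (based on your command above; the batch size is just a starting point that you will have to tune to your GPU memory):

python -m torch.distributed.launch --nproc_per_node=1 train.py --network dope --epochs 60 --batchsize 32 --outf tmp_tennis_sym_output_1/ --data ../nvisii_data_gen/output_complete/ --gpuids 0 1 2 3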

rahulram70 commented 2 years ago

Thank you for your response. I'm trying to train the model with 2,500 images, and I found that it takes quite a while to train. I trained for over 500 epochs and am getting the following result. I will eventually train on 50k images, but is this normal behavior?

[screenshot of training results]

mintar commented 2 years ago

2.5k images is definitely not enough. I usually use 60k images and 60 epochs. This takes about 48h on a single RTX3090 or RTXA6000. I vaguely remember I had some problems getting it to run on our A100s and with multiple GPUs, so that's why I'm using single RTX GPUs for now.

To debug any problems, it's really valuable to visualize the training progress, including the belief maps, like this:

pip install tensorboard
tensorboard --logdir <your output dir>/runs/

TontonTremblay commented 2 years ago

The old training code should train fine on multiple GPUs; I used 4 P100s for the original paper. Recently I have also used 4 A100s with the new training code, which uses a different torch data distribution.

WilkoLu commented 1 year ago

> The old training code should train fine on multiple GPUs; I used 4 P100s for the original paper. Recently I have also used 4 A100s with the new training code, which uses a different torch data distribution.

@TontonTremblay Can you provide the command you used with the new training script while training on your cluster? I've got a problem where the cluster is only using 1 GPU, as shown above.