NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

Training on Custom Object using NDDS does not detect object after training #141

Open sebastian-ruiz opened 3 years ago

sebastian-ruiz commented 3 years ago

Steps that I took:

In the loss_train.csv file I see that the loss in epoch 1 is 0.28 and the loss in epoch 60 is 2.9E-09.

The training data generated with NDDS looks like this:

(sample training images: 000005, 000000, 000043)

I adapted the script from this issue to apply inference to the training images. This is my script for inference. When using the script on the soup cans and the trained model from the readme, it works perfectly. When applied to my custom object, DOPE does not detect the object class, nor does it detect the pose.

TontonTremblay commented 3 years ago

Hello, so sorry your model does not work well. There was a similar discussion about chair detection: https://github.com/NVlabs/Deep_Object_Pose/issues/137. If you could visualize the belief maps, that might help us understand what the network sees. The loss looks very low; what is the loss at epochs 1, 2, 3? Could you share a json file from the training data?

sebastian-ruiz commented 3 years ago

I try and get the belief maps using the following code snippet:

import cv2
import torch
from torch.autograd import Variable

# vertex2 comes from detect_object_in_image(...) in detector.py
for j in range(vertex2.size()[0]):
    belief = vertex2[j].clone()
    # normalize each map to [0, 1] and replicate it to 3 channels for display
    belief -= float(torch.min(belief).data.cpu().numpy())
    belief /= float(torch.max(belief).data.cpu().numpy())
    belief = torch.clamp(belief, 0, 1)
    belief = torch.cat([belief.unsqueeze(0), belief.unsqueeze(0), belief.unsqueeze(0)]).unsqueeze(0)
    temp = Variable(belief.clone())
    array_belief = temp.data.squeeze().cpu().numpy().transpose(1, 2, 0) * 255
    cv2.imshow('belief_' + str(j), array_belief)
cv2.waitKey(0)

where vertex2 is the variable from the detect_object_in_image(...) function in detector.py (see my inference.py script for more info).

When I try and get the belief maps of the soup cans I get this:

belief_maps_soup_can

but when I try the same with my object I get this (this image is from the training set; using the webcam I also get the same belief maps and no box):

belief_maps_kalo

My _object_settings.json file looks like this:

{
    "exported_object_classes": [
        "KALO_K1_5_edited"
    ],
    "exported_objects": [
        {
            "class": "KALO_K1_5_edited",
            "segmentation_class_id": 144,
            "segmentation_instance_id": 16014600,
            "fixed_model_transform": [
                [ 0, 0, 1, 0 ],
                [ -1, 0, 0, 0 ],
                [ 0, -1, 0, 0 ],
                [ -0.55800002813339233, -0.25029999017715454, 0.22859999537467957, 1 ]
            ],
            "cuboid_dimensions": [ 119.63559722900391, 29.607500076293945, 38.399700164794922 ]
        }
    ]
}

I noticed that my cuboid_dimensions seem to be in meters: they make my object 119 m long even though it should be 11.9 cm long. I have therefore set the config_pose.yaml file to the following, so that it corresponds to the size of the object in the training data:

dimensions: {
    "KALO_K1_5_edited": [ 119.63559722900391, 29.607500076293945, 38.399700164794922 ],
   ...
}
mesh_scales: {
    "KALO_K1_5_edited":  1.0,
   ...
}
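
For completeness, a minimal sketch (file paths are placeholders; assumes PyYAML is installed) that prints the NDDS cuboid_dimensions next to the dimensions and mesh_scales entries in config_pose.yaml, to spot unit or scale mismatches:

import json
import yaml  # assumption: PyYAML is available

with open('/path/to/dataset/_object_settings.json') as f:  # hypothetical path
    exported = json.load(f)['exported_objects'][0]

with open('config_pose.yaml') as f:                        # hypothetical path
    cfg = yaml.safe_load(f)

name = exported['class']
print('NDDS cuboid_dimensions :', exported['cuboid_dimensions'])
print('config dimensions      :', cfg['dimensions'][name])
print('config mesh_scale      :', cfg['mesh_scales'].get(name, 1.0))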

The losses at the end of epochs 1 to 5 (plus epochs 15 and 30) from loss_train.csv are:

1, 624,0.000003241889772
2, 624,0.000002674767757
3, 624,0.000000930202077
4, 624,0.000000722245716
5, 624,0.000001037328047
...
15, 624,0.000000092100692
...
30, 624,0.000000032321594
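
A quick sketch (assuming the three columns of loss_train.csv are epoch, batch id and loss, as in the excerpt above) to plot the loss curve on a log scale:

import csv
import matplotlib.pyplot as plt

epochs, losses = [], []
with open('train_tmp/loss_train.csv') as f:  # outf from header.txt; adjust as needed
    for row in csv.reader(f):
        try:
            epochs.append(int(row[0]))
            losses.append(float(row[2]))
        except (ValueError, IndexError):
            continue  # skip a header line or malformed rows

plt.semilogy(epochs, losses)
plt.xlabel('epoch')
plt.ylabel('loss (log scale)')
plt.show()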

TontonTremblay commented 3 years ago

Your code to display the belief maps is not correct. DOPE has two outputs; you are using the vertex map, the one that points toward the center of the object.

Here is the code I am currently using (sorry, I need to update this repo):


    @staticmethod
    def detect_object_in_image(net_model, pnp_solver, in_img, config, 
            grid_belief_debug = False, norm_belief=True,run_sampling=False,network='dope'):
        ''' Detect objects in a image using a specific trained network model
            Returns the poses of the objects and the belief maps
            '''
        if in_img is None:
            return []

        if network == 'full':
            scale_factor = 1
            OFFSET_DUE_TO_UPSAMPLING = 0 
        else: # 'dope' and 'mobile'
            scale_factor = 8
            # OFFSET_DUE_TO_UPSAMPLING = 0.4395
            OFFSET_DUE_TO_UPSAMPLING = 0

        # print("detect_object_in_image - image shape: {}".format(in_img.shape))

        # Run network inference
        # print(in_img.shape)
        image_tensor = transform(in_img)
        image_torch = Variable(image_tensor).cuda().unsqueeze(0)
        # print(image_torch.shape)
        out, seg = net_model(image_torch)  # run inference using the network (calls 'forward' method)
        vertex2 = out[-1][0]
        aff = seg[-1][0]

        # Find objects from network output
        detected_objects = ObjectDetector.find_object_poses(vertex2, aff, pnp_solver, config,
            run_sampling=run_sampling,
            scale_factor = scale_factor,
            OFFSET_DUE_TO_UPSAMPLING = OFFSET_DUE_TO_UPSAMPLING)

        if not grid_belief_debug:
            return detected_objects, None
        else:
            # Run the belief maps debug display on the beliefmaps

            upsampling = nn.UpsamplingNearest2d(scale_factor=scale_factor)
            tensor = vertex2
            belief_imgs = []
            in_img = (torch.tensor(in_img).float()/255.0)
            in_img *= 0.7            

            for j in range(tensor.size()[0]):
                belief = tensor[j].clone()
                if norm_belief:
                    belief -= float(torch.min(belief).item())
                    belief /= float(torch.max(belief).item())

                # print (image_torch.size())
                # raise()    
                # belief *= 0.5
                # print(in_img.size())
                belief = upsampling(belief.unsqueeze(0).unsqueeze(0)).squeeze().squeeze().data 
                belief = torch.clamp(belief,0,1).cpu()  
                belief = torch.cat([
                            # belief.unsqueeze(0) + in_img[:,:,0],
                            # belief.unsqueeze(0) + in_img[:,:,1],
                            # belief.unsqueeze(0) + in_img[:,:,2]
                            belief.unsqueeze(0),
                            belief.unsqueeze(0),
                            belief.unsqueeze(0)

                            ]).unsqueeze(0)
                belief = torch.clamp(belief,0,1) 

                # belief_imgs.append(belief.data.squeeze().cpu().numpy().transpose(1,2,0))
                belief_imgs.append(belief.data.squeeze().numpy())

            # Create the image grid
            belief_imgs = torch.tensor(np.array(belief_imgs))

            im_belief = ObjectDetector.get_image_grid(belief_imgs, None,
                mean=0, std=1)

            return detected_objects, im_belief

    @staticmethod
    def make_grid(tensor, nrow=8, padding=2,
                  normalize=False, range_=None, scale_each=False, pad_value=0):
        """Make a grid of images.
        Args:
            tensor (Tensor or list): 4D mini-batch Tensor of shape (B x C x H x W)
                or a list of images all of the same size.
            nrow (int, optional): Number of images displayed in each row of the grid.
                The final grid size is (B / nrow, nrow). Default is 8.
            padding (int, optional): amount of padding. Default is 2.
            normalize (bool, optional): If True, shift the image to the range (0, 1),
                by subtracting the minimum and dividing by the maximum pixel value.
            range_ (tuple, optional): tuple (min, max) where min and max are numbers,
                then these numbers are used to normalize the image. By default, min and max
                are computed from the tensor.
            scale_each (bool, optional): If True, scale each image in the batch of
                images separately rather than the (min, max) over all images.
            pad_value (float, optional): Value for the padded pixels.
        Example:
            See this notebook `here <https://gist.github.com/anonymous/bf16430f7750c023141c562f3e9f2a91>`_
        """
        import math

        if not (torch.is_tensor(tensor) or
                (isinstance(tensor, list) and all(torch.is_tensor(t) for t in tensor))):
            raise TypeError('tensor or list of tensors expected, got {}'.format(type(tensor)))

        # if list of tensors, convert to a 4D mini-batch Tensor
        if isinstance(tensor, list):
            tensor = torch.stack(tensor, dim=0)

        if tensor.dim() == 2:  # single image H x W
            tensor = tensor.view(1, tensor.size(0), tensor.size(1))
        if tensor.dim() == 3:  # single image
            if tensor.size(0) == 1:  # if single-channel, convert to 3-channel
                tensor = torch.cat((tensor, tensor, tensor), 0)
            tensor = tensor.view(1, tensor.size(0), tensor.size(1), tensor.size(2))

        if tensor.dim() == 4 and tensor.size(1) == 1:  # single-channel images
            tensor = torch.cat((tensor, tensor, tensor), 1)

        if normalize is True:
            tensor = tensor.clone()  # avoid modifying tensor in-place
            if range_ is not None:
                assert isinstance(range_, tuple), \
                    "range has to be a tuple (min, max) if specified. min and max are numbers"

            def norm_ip(img, min, max):
                img.clamp_(min=min, max=max)
                img.add_(-min).div_(max - min + 1e-5)

            def norm_range(t, range_):
                if range_ is not None:
                    norm_ip(t, range_[0], range_[1])
                else:
                    norm_ip(t, float(t.min()), float(t.max()))

            if scale_each is True:
                for t in tensor:  # loop over mini-batch dimension
                    norm_range(t, range_)
            else:
                norm_range(tensor, range_)

        if tensor.size(0) == 1:
            return tensor.squeeze()

        # make the mini-batch of images into a grid
        nmaps = tensor.size(0)
        xmaps = min(nrow, nmaps)
        ymaps = int(math.ceil(float(nmaps) / xmaps))
        height, width = int(tensor.size(2) + padding), int(tensor.size(3) + padding)
        grid = tensor.new(3, height * ymaps + padding, width * xmaps + padding).fill_(pad_value)
        k = 0
        for y in range(ymaps):
            for x in range(xmaps):
                if k >= nmaps:
                    break
                grid.narrow(1, y * height + padding, height - padding)\
                    .narrow(2, x * width + padding, width - padding)\
                    .copy_(tensor[k])
                k = k + 1
        return grid

    @staticmethod
    def get_image_grid(tensor, filename, nrow=3, padding=2,mean=None, std=None):
        """
        Saves a given Tensor into an image file.
        If given a mini-batch tensor, will save the tensor as a grid of images.
        """
        from PIL import Image

        # tensor = tensor.cpu()
        grid = ObjectDetector.make_grid(tensor, nrow=nrow, padding=10,pad_value=1)
        if mean is not None:
            ndarr = grid.mul(std).add(mean).mul(255).byte().transpose(0,2).transpose(0,1).numpy()
        else:
            ndarr = grid.mul(0.5).add(0.5).mul(255).byte().transpose(0,2).transpose(0,1).numpy()
        im = Image.fromarray(ndarr)
        # im.save(filename)
        return im

Please try running this and report what you get.
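
For reference, a usage sketch of the method above; net_model, pnp_solver and config are placeholders for whatever your inference script already builds:

import cv2

img = cv2.imread('000001.png')              # any training image
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # the network expects RGB

detected_objects, im_belief = ObjectDetector.detect_object_in_image(
    net_model, pnp_solver, img, config, grid_belief_debug=True)

print('detections:', detected_objects)
if im_belief is not None:
    im_belief.save('belief_grid.png')       # im_belief is a PIL Image (see get_image_grid)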

sebastian-ruiz commented 3 years ago

Thank you for the code. Here are the results:

Verifying that it works correctly with the soup can:

belief_maps_soup_can_correct

Trying on image from test set:

belief_maps_kalo_correct

Trying on webcam:

Screenshot_20201013_134524

From this I would say that something has gone wrong with the training.

My header.txt looks like this:

Namespace(batchsize=32, data='/home/sruiz/datasets/kalo1.5_1_object_100000_400x400', datasize=None, datatest='/home/sruiz/datasets/kalo1.5_1_object_2000_400x400', epochs=60, gpuids=[0, 1, 2, 3], imagesize=400, loginterval=100, lr=0.0001, manualseed=57, namefile='epoch', nbupdates=None, net='', noise=2.0, object='KALO_K1_5_edited', option='default', outf='train_tmp', pretrained=True, save=False, sigma=4, workers=8)
seed: 57

TontonTremblay commented 3 years ago

Could you visualize the ground-truth belief maps you generate during training? For example, the one at https://github.com/NVlabs/Deep_Object_Pose/blob/master/scripts/train.py#L1338. I think there is something going on with your training data.
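
Something along these lines could be used to dump them (a sketch only; target_belief here stands for the ground-truth belief tensor of a single sample, shaped [num_keypoints, H, W]):

import torchvision.utils as vutils

def dump_gt_beliefs(target_belief, prefix='gt_belief'):
    # save each ground-truth belief map as a grayscale image for inspection
    for j in range(target_belief.size(0)):
        b = target_belief[j].detach().float().cpu()
        b = (b - b.min()) / (b.max() - b.min() + 1e-8)  # normalize to [0, 1]
        vutils.save_image(b.unsqueeze(0), '{}_{}.png'.format(prefix, j))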

TontonTremblay commented 3 years ago

    "exported_objects": [
        {
            "class": "KALO_K1_5_edited",
            "segmentation_class_id": 144,
            "segmentation_instance_id": 16014600,
            "fixed_model_transform": [
                [ 0, 0, 1, 0 ],
                [ -1, 0, 0, 0 ],
                [ 0, -1, 0, 0 ],
                [ -0.55800002813339233, -0.25029999017715454, 0.22859999537467957, 1 ]
            ],
            "cuboid_dimensions": [ 119.63559722900391, 29.607500076293945, 38.399700164794922 ]
        }

Where are the projected cuboid points?

TontonTremblay commented 3 years ago

I am pretty sure your export is missing projected_cuboid; I don't remember if you have to add it to NDDS to export it. I have not used it in a while. @thangt, can you refresh my memory?

thangt commented 3 years ago

@TontonTremblay That is the object_settings.json file, which describes the object to capture. The projected_cuboid is in each of the captured frame files: 000001.json ...
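
A quick way to verify this is to scan the dataset for frames missing the field; a sketch, assuming the usual NDDS frame layout with an "objects" list per json and the dataset path from header.txt:

import glob
import json
import os

root = '/home/sruiz/datasets/kalo1.5_1_object_100000_400x400'  # dataset path from header.txt
missing = []
for path in sorted(glob.glob(os.path.join(root, '*.json'))):
    if os.path.basename(path).startswith('_'):  # skip _object_settings.json / _camera_settings.json
        continue
    with open(path) as f:
        frame = json.load(f)
    if any('projected_cuboid' not in obj for obj in frame.get('objects', [])):
        missing.append(path)

print('{} frame files are missing projected_cuboid'.format(len(missing)))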

sebastian-ruiz commented 3 years ago

Could it be that there is something wrong with my training data? Here is the data for 000001. In Unreal Engine, the class and instance segmentation id assignment types are both set to spread evenly.

000001.cs.png: 000001 cs

000001.depth.png: 000001 depth

000001.is.png: 000001 is

000001.png: 000001

And the 000001.json file. These 4 images and 1 json file make up the training data for this item.
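
To sanity-check the labels, here is a small sketch (assuming the usual NDDS frame layout) that draws the projected_cuboid points from 000001.json onto 000001.png:

import json
import cv2

img = cv2.imread('000001.png')
with open('000001.json') as f:
    frame = json.load(f)

for obj in frame.get('objects', []):
    for x, y in obj.get('projected_cuboid', []):
        cv2.circle(img, (int(round(x)), int(round(y))), 3, (0, 255, 0), -1)

cv2.imwrite('000001_cuboid_overlay.png', img)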

sebastian-ruiz commented 3 years ago

I think I am making a mistake in config_pose.yaml. Namely, in class_ids I do the following:

class_ids: {
    "cracker": 1,
    "gelatin": 2,
....... [omitted] ......
    "GreenBeans"        : 35, 
    "PeasAndCarrots"    : 36,
    "KALO_K1_5_edited"  : 37 
}

I think that the id of "KALO_K1_5_edited" should be the one picked by NDDS in *****.cs.png.
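
To check which value NDDS actually wrote, a quick sketch (the .cs.png may be single-channel or RGB depending on the exporter settings):

import numpy as np
from PIL import Image

seg = np.array(Image.open('000001.cs.png'))
print('class-segmentation values used by NDDS:', np.unique(seg))
# compare against segmentation_class_id (144) in _object_settings.json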

TontonTremblay commented 3 years ago

The class id should not impact the algorithm. Can you try to render the belief map from a GT you generated, please? See above for which line in the training script.

sebastian-ruiz commented 3 years ago

Sorry for the delay. I am retraining the model and storing the target_belief (https://github.com/NVlabs/Deep_Object_Pose/blob/master/scripts/train.py#L1338) for every epoch. I will report back.

Gaoee commented 3 years ago

Hi, I am also trying to use NDDS to generate a new dataset. But how can I import my FBX-format 3D model and replace the original object in NDDS? Thank you.