ethnhe / PVN3D

Code for "PVN3D: A Deep Point-wise 3D Keypoints Hough Voting Network for 6DoF Pose Estimation", CVPR 2020
MIT License

Adaptation to custom dataset #64

Closed luigifaticoso closed 3 years ago

luigifaticoso commented 3 years ago

Hello, I'm trying to train the model on my custom dataset. I have created a synthetic dataset that mirrors Linemod_preprocessed as closely as possible.

My steps:

- I have added the intrinsic matrix of my camera to `common.py`.
- I have generated the images and labels using BlenderProc; I have `gt.yml`, `info.yml`, and everything except the masks.

**Second question:** Do I need to generate the masks myself, or are they generated somewhere inside the pipeline? Could that be the problem?
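
In case you do need to produce them yourself, here is a minimal sketch, assuming BlenderProc renders a per-pixel instance-id map for each frame (the file names and `obj_id` below are hypothetical):

```python
import numpy as np
import cv2

# Hypothetical inputs: an instance-id map rendered alongside the RGB frame,
# and the id BlenderProc assigned to the object of interest.
seg = cv2.imread("0000_seg.png", cv2.IMREAD_UNCHANGED)  # (H, W) instance ids
obj_id = 1

# Binary mask in the Linemod_preprocessed style: 255 on the object, 0 elsewhere.
mask = np.where(seg == obj_id, 255, 0).astype(np.uint8)
cv2.imwrite("mask_0000.png", mask)
```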

I have modified the necessary files documented here, but I still need to change some parts.
This is the current output:
```shell
python3 -m train.train_blm_pvn3d --cls blm
/home/pytorch/PVN3D/pvn3d/common.py:173: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  self.blm_r_lst = yaml.load(blm_r_file)
/home/pytorch/PVN3D/pvn3d/common.py:134: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  self.lm_r_lst = yaml.load(lm_r_file)
/home/pytorch/PVN3D/pvn3d/common.py:173: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  self.blm_r_lst = yaml.load(blm_r_file)
cls_type:  blm
cls_id in blm_dataset.py 1
/home/pytorch/PVN3D/pvn3d/datasets/blm/blm_dataset.py:41: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  self.meta_lst = yaml.load(meta_file)
Train without rendered data from https://github.com/ethnhe/raster_triangle
Train without fuse data from https://github.com/ethnhe/raster_triangle
train_dataset_size:  10
cls_id in blm_dataset.py 1
val_dataset_size:  15
loading pretrained mdl.
/usr/local/lib/python3.7/dist-packages/torch/nn/_reduction.py:43: UserWarning: size_average and reduce args will be deprecated, please use reduction='mean' instead.
  warnings.warn(warning.format(ret))
{'bn_decay': 0.5,
 'bn_momentum': 0.9,
 'cal_metrics': False,
 'checkpoint': None,
 'cls': 'blm',
 'decay_step': 200000.0,
 'epochs': 1000,
 'eval_net': False,
 'lr': 0.01,
 'lr_decay': 0.5,
 'run_name': 'sem_seg_run_1',
 'test': False,
 'test_occ': False,
 'weight_decay': 0}
epochs:   0%|                                                                                                                                      | 0/25 [00:00<?, ?it/s]
train:   0%|                                                                                                                                     | 0/5000 [00:00<?, ?it/s]
```

How can I get more output to understand where it's going wrong?

ethnhe commented 3 years ago

Hi,

luigifaticoso commented 3 years ago

OK, perfect, thank you a lot for your message. I have managed to generate the 8 keypoints; I may try different settings later. I have also managed to generate the binary masks in BlenderProc, but I'm planning to expand the dataset using your raster_triangle tool.

I have managed to move forward: there was an issue in mydataset_dataset.py in the part that reads the meta. After printing it out, I simply commented out these lines and it proceeds:

```python
# else:
#     meta = meta[0]
#     print("meta: ", meta)
```

The next issue I'm trying to understand is in train_dataset_pvn3d.py, around line 182, where the code expects 11 elements from cu_dt but gets 13 instead. Below is my cu_dt; I'm stepping backward through the code to find the cause. If you happen to see the issue on the spot, that would be great.

```shell
ValueError: too many values to unpack (expected 11)
```

```shell
[tensor([[[[123., 119., 114.,  ...,  26.,  20.,  14.],
          [112., 120., 117.,  ...,  38.,  43.,  34.],
          [111., 115., 118.,  ...,  42.,  43.,  40.],
          ...,
       .........
        [[711.1113,   0.0000, 255.5000],
         [  0.0000, 711.1113, 255.5000],
         [  0.0000,   0.0000,   1.0000]]], device='cuda:0'), tensor([1000., 1000.], device='cuda:0')]
```

**EDIT:** Setting `DEBUG = False` gives 11 values instead of 13, and it works.

ethnhe commented 3 years ago

The DEBUG setting returns more information for visualizing the preprocessed data in the dataset script, linemod_dataset.py for example. It's highly recommended to check that the data are processed properly by visualizing them with `python3 -m datasets.linemod.linemod_dataset`, which gives a picture like the following: (image: linemod_vis_data). There you can check that the center point (blue point) and the keypoints (red points) are labeled correctly.
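
For a custom dataset where that script is not wired up yet, here is a minimal sketch of the same check, assuming numpy-loadable keypoint and center files; the file names and the placeholder pose are hypothetical, while the intrinsics are the ones printed in the log above:

```python
import numpy as np
import cv2

K = np.array([[711.1113, 0.0, 255.5],
              [0.0, 711.1113, 255.5],
              [0.0, 0.0, 1.0]])               # intrinsics from the log above
R = np.eye(3)                                  # replace with the GT rotation
t = np.array([0.0, 0.0, 0.6])                  # replace with the GT translation (m)

def draw_points(img, pts_obj, color):
    """Project object-frame 3D points with the GT pose and draw them."""
    p_cam = pts_obj @ R.T + t                  # object frame -> camera frame
    uv = p_cam @ K.T
    uv = uv[:, :2] / uv[:, 2:3]                # perspective division
    for u, v in uv.round().astype(int):
        cv2.circle(img, (int(u), int(v)), 3, color, -1)
    return img

img = cv2.imread("0000.png")                   # hypothetical frame
kps = np.loadtxt("obj_kps.txt")                # (8, 3) keypoints, object frame, m
ctr = np.loadtxt("obj_ctr.txt").reshape(1, 3)  # center point
img = draw_points(img, kps, (0, 0, 255))       # keypoints in red (BGR)
img = draw_points(img, ctr, (255, 0, 0))       # center in blue
cv2.imwrite("vis_check.png", img)
```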

luigifaticoso commented 3 years ago

Thank you for helping. I'm trying to debug the labels because they are not correct, and to verify whether my model data is consistent with the LineMOD models. I ran `python3 gen_obj_info.py` on the ape model provided by LineMOD, but it doesn't produce the same numbers as the original. I scaled the ply model down to m (to get consistent results) and ran gen_obj_info; this is the output (on the right) compared with the one provided by the LineMOD dataset: (image: Screenshot from 2021-02-16 15-18-07). My labels are wrong. Below is the output of `python3 -m datasets.linemod.linemod_dataset` with the model scaled to m:

(image: train_0_rgb)

When the model is scaled to mm instead, the projected points fall completely outside the image.

ethnhe commented 3 years ago

The keypoints are in m and are generated by the FPS algorithm with random initialization, so the selected points may differ, but that won't affect performance much. For your case, if the generated keypoints of your model are in m, check that your GT poses are in the 'cv' (OpenCV) camera coordinate system rather than Blender's default convention.
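
A minimal sketch of that conversion, assuming `R_b`, `t_b` are a world-to-camera pose in Blender's camera convention (camera looks along -Z with +Y up), while OpenCV's camera looks along +Z with +Y down; the pose values are placeholders:

```python
import numpy as np

# Flipping the camera's y and z axes maps Blender's camera frame
# (looks along -Z, +Y up) onto OpenCV's camera frame (looks along +Z, +Y down).
D = np.diag([1.0, -1.0, -1.0])

R_b = np.eye(3)          # placeholder: world-to-camera rotation from Blender
t_b = np.zeros(3)        # placeholder: world-to-camera translation from Blender

R_cv = D @ R_b           # GT rotation in the 'cv' convention
t_cv = D @ t_b           # GT translation in the 'cv' convention
```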

luigifaticoso commented 3 years ago

Thank you for your message. I started debugging piece by piece and arrived at this point, where the center point is OK but the keypoints are not. Where can I investigate further in this case? I have drawn the points from white (farthest) to black (nearest), and this is what happens:

(images: 4_rgb, 2_rgb)

I have checked my model.ply and it doesn't seem to have any problem. Any pointers to things I could check would be really helpful, thank you!

ethnhe commented 3 years ago

This looks like a left-hand vs. right-hand coordinate system issue. We use a right-hand coordinate system, and it seems your model is in a left-hand one. Try flipping the y-axis of each vertex and keypoint of the object in the object coordinate system.
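
A minimal sketch of that flip, assuming the `plyfile` package and a keypoint file loadable with numpy (file names are hypothetical):

```python
import numpy as np
from plyfile import PlyData

# Flip the y-axis of every mesh vertex in the object coordinate system.
ply = PlyData.read("model.ply")
verts = ply["vertex"].data
verts["y"] = -verts["y"]
ply.write("model_yflip.ply")

# Do the same for the selected keypoints (and center point).
kps = np.loadtxt("obj_kps.txt")   # (K, 3), object frame
kps[:, 1] *= -1
np.savetxt("obj_kps_yflip.txt", kps)
```

Note that flipping a single axis mirrors the mesh, so if you re-render from the flipped model you may also need to fix the face winding or normals.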

Ultraopxt commented 3 years ago

> The keypoints are in m and are generated by the FPS algorithm with random initialization, so the selected points may differ, but that won't affect performance much. For your case, if the generated keypoints of your model are in m, check that your GT poses are in the 'cv' (OpenCV) camera coordinate system rather than Blender's default convention.

May I ask whether the original LineMOD model.ply and the keypoints generated from FPS are in mm or in m? My keypoints generated from FPS are inaccurate.

ethnhe commented 3 years ago

The vertices in model.ply are all converted to be in m, and the keypoints selected from them with FPS are also in m.
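
For reference, a minimal numpy sketch of FPS with random initialization as described above; `model_pts` is assumed to be the (N, 3) vertex array in meters, and this is a sketch, not the repo's actual implementation:

```python
import numpy as np

def farthest_point_sampling(pts, k, seed=None):
    """Select k points from pts (N, 3) that are maximally spread out."""
    rng = np.random.default_rng(seed)
    first = int(rng.integers(pts.shape[0]))   # random initialization
    selected = [first]
    dists = np.linalg.norm(pts - pts[first], axis=1)
    for _ in range(k - 1):
        idx = int(np.argmax(dists))           # farthest from the current set
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(pts - pts[idx], axis=1))
    return pts[selected]

# e.g. 8 keypoints from model vertices already converted to meters:
# kps = farthest_point_sampling(model_pts, 8)
```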

Ultraopxt commented 3 years ago

> The vertices in model.ply are all converted to be in m, and the keypoints selected from them with FPS are also in m.

Thanks very much! I visualized my dataset; the keypoints and center point are as follows: (images: test_rgb (6), test_rgb (5), test_rgb (4))

May I ask how I can fix the position deviation of the keypoints and the center point?

ethnhe commented 3 years ago

Not very sure, but it seems the ground-truth pose transforming the object from the object coordinate system to the camera coordinate system may not be calculated properly. Also check that the object coordinate system and the camera coordinate system are both right-hand coordinate systems. You can also transform the mesh vertices to the camera coordinate system and project them onto the image to check the pose parameters.
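
A compact sketch of that check, using the same projection math as the keypoint sketch earlier but applied to all mesh vertices; the inputs are placeholders:

```python
import numpy as np
import cv2

verts = np.loadtxt("model_pts.txt")            # (N, 3) mesh vertices, m
R, t = np.eye(3), np.array([0.0, 0.0, 0.6])    # replace with the GT pose
K = np.array([[711.1113, 0.0, 255.5],
              [0.0, 711.1113, 255.5],
              [0.0, 0.0, 1.0]])                # replace with your intrinsics
img = cv2.imread("frame.png")

p_cam = verts @ R.T + t                        # object frame -> camera frame
uv = p_cam @ K.T
uv = (uv[:, :2] / uv[:, 2:3]).round().astype(int)

# Paint in-bounds projections green; they should land exactly on the object.
h, w = img.shape[:2]
ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
img[uv[ok, 1], uv[ok, 0]] = (0, 255, 0)
cv2.imwrite("pose_check.png", img)
```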

Ultraopxt commented 3 years ago

> Not very sure, but it seems the ground-truth pose transforming the object from the object coordinate system to the camera coordinate system may not be calculated properly. Also check that the object coordinate system and the camera coordinate system are both right-hand coordinate systems. You can also transform the mesh vertices to the camera coordinate system and project them onto the image to check the pose parameters.

Thanks very much!

andreazuna89 commented 1 year ago

@luigifaticoso can you describe here how you generated gt.yml and info.yml for your custom dataset?

Thanks