NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

Need Help in detecting custom object #241

Open sinasadeghian opened 2 years ago

sinasadeghian commented 2 years ago

Hi, I'm trying to detect the pose of packages using DOPE. I used nvisii_data_gen to generate the dataset: 10k images for training and 2k for testing, with --spp 2000 and --nb_distractors 15.

This is the link to my object file. I used train.py in the train2 folder for training with 60 epochs, lr = 0.001, network = dope, and batchsize = 32. I'm using a RealSense D435i camera for real-time testing. However, the dope node is not detecting the box.

This file includes the output of training/testing, a sample of the JSON files used for training, and the camera_info and config_pose YAML files: output_files.zip

These are samples from the dataset; in some images the Amazon box is clearly visible and in others it is not. (Sample images attached.)

This is what RViz shows during real-time testing; however, the /dope/detected_objects topic shows nothing. (Screenshots from 2022-04-05 attached.)

Which part am I doing wrong?

TontonTremblay commented 2 years ago

Interesting! Looks like you have some symmetries on your model. @mintar just pushed a great script to generate data with symmetries. https://github.com/NVlabs/Deep_Object_Pose/tree/master/scripts/nvisii_data_gen#handling-objects-with-symmetries I am sure you can run the script on the data you already generated.

Although it is strange that it does not find the middle keypoint. When it sees the "smile" on the side, what does it look like? Does the tape go around the box? It is also strange that it does not find heatmaps on the other corners... I am a little confused here.

When you trained, did you filter by Amazon_box_1? I should add that to the training script's output when it starts. If you do not specify it, I think it uses all of the objects by default. Thank you for the zip file, but could you include one or two training images?

sinasadeghian commented 2 years ago

I'm generating an extra 1k images without distractors and 1k with 1 to 20 distractors, to see if that improves the result.

"When it sees the 'smile' on the side, what does it look like?" (I didn't get what you mean by this.) Yes, the tape goes all the way around the box.

I used these args for training:

```
python3 -m torch.distributed.launch --nproc_per_node=1 /s/dope_v4/scripts/train2/train.py \
    --network dope \
    --data /s/dope_v4/scripts/nvisii_data_gen/output/dataset \
    --datatest /s/s/dope_v4/scripts/nvisii_data_gen/output/dataset_train/test \
    --object box \
    --outf /s/amzn_box \
    --gpuids 0 1 2 3 4 5 6 7 \
    --batchsize 32 \
    --epochs 60
```

Here are some of the generated images I used for training and testing: Data.zip

Also, in ROS the /dope/detected_objects topic is not detecting anything. How could I draw lines around the detected objects in ROS?
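
For reference, a minimal way to check whether anything is coming through on /dope/detected_objects is a small subscriber. This is only a sketch; it assumes the node publishes vision_msgs/Detection3DArray on that topic (confirm with rostopic info /dope/detected_objects):

```python
#!/usr/bin/env python3
# Sketch: echo whatever DOPE publishes on /dope/detected_objects.
# Assumes the message type is vision_msgs/Detection3DArray; verify with
# `rostopic info /dope/detected_objects` before relying on this.
import rospy
from vision_msgs.msg import Detection3DArray

def callback(msg):
    if not msg.detections:
        rospy.loginfo("Detection3DArray received, but it contains no detections")
        return
    for det in msg.detections:
        for hyp in det.results:
            p = hyp.pose.pose.position
            rospy.loginfo("id=%s score=%.3f position=(%.3f, %.3f, %.3f)",
                          str(hyp.id), hyp.score, p.x, p.y, p.z)

if __name__ == "__main__":
    rospy.init_node("dope_detection_echo")
    rospy.Subscriber("/dope/detected_objects", Detection3DArray, callback)
    rospy.spin()
```

If the callback never fires at all, the problem is likely ROS plumbing (node not running, camera topics remapped incorrectly); if it fires but the detections list is empty, the network simply is not finding the object.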

Thank you so much for your help

mintar commented 2 years ago

A couple of comments:

@TontonTremblay wrote:

Interesting! Looks like you have some symmetries on your model. @mintar just pushed a great script to generate data with symmetries. https://github.com/NVlabs/Deep_Object_Pose/tree/master/scripts/nvisii_data_gen#handling-objects-with-symmetries I am sure you can run the script on the data you already generated.

No, not with the script I've pushed; you would have to regenerate the data.

Regarding symmetries, your object has two sides that look identical:

(Renders snapshot00 and snapshot01, showing the two identical-looking sides.)

But due to the texture on the other 4 sides, it's not really a rotational symmetry (if you compare the two images above, they don't look pixel-wise identical because of the top side). Expect bad results when the camera can only see one of the two identical sides (because the pose is ambiguous then), but as soon as other parts of the object are visible, it should be fine. If that's a problem for you, you can specify a 180° rotational symmetry around the y axis and regenerate the training data, but personally I don't think it's necessary or helpful.
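
If you do go that route, declaring the symmetry could look roughly like the sketch below. It assumes the BOP-style model_info.json convention with symmetries_discrete given as flattened 4x4 transforms; check the README linked above for the exact file name and schema the data-generation script expects.

```python
# Sketch only: write a model_info.json declaring a 180-degree rotational
# symmetry around the y axis as a flattened 4x4 transform (BOP-style
# "symmetries_discrete"). The exact schema/location the nvisii_data_gen
# script expects is documented in its README.
import json

rot_180_y = [
    -1.0, 0.0,  0.0, 0.0,   # row 0: x -> -x
     0.0, 1.0,  0.0, 0.0,   # row 1: y unchanged (rotation axis)
     0.0, 0.0, -1.0, 0.0,   # row 2: z -> -z
     0.0, 0.0,  0.0, 1.0,   # row 3: homogeneous part
]

with open("model_info.json", "w") as f:
    json.dump({"symmetries_discrete": [rot_180_y]}, f, indent=2)
```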

mintar commented 2 years ago

Oh, and one more thing regarding the amount of training data: did you specify --nb_objects 1? Your images look like it. If you regenerate the dataset, consider cranking that number up, because it's a cheap way of generating more training data. Say the object is only visible in half of the images; because you only trained on 2k images, that would mean you only have 1k views of the object. In my datasets I always use an --nb_objects of around 10-20 times the number of models (if I have multiple models), so again assuming an object is visible in half the frames, times 60k images per dataset, that would be 300k-600k views of the object (300-600x what you used).
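
As a rough example, a generation call along these lines bumps up both the number of object instances and the dataset size. The script name and some flag spellings here are from memory, so check python3 single_video_pybullet.py --help in scripts/nvisii_data_gen for the exact arguments:

```
cd scripts/nvisii_data_gen
python3 single_video_pybullet.py \
    --spp 2000 \
    --nb_objects 15 \
    --nb_distractors 15 \
    --nb_frames 200 \
    --outf output/dataset/run_000
```

Running this several times with different --outf folders (a small shell loop works) is how you accumulate tens of thousands of frames.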

TL;DR: Need moar data! :-)

TontonTremblay commented 2 years ago

I had success with 20k, but 60k is what I aim for.

@mintar good catch on the split. Maybe I could write the split in the dataloader directly. Normally I have been testing on real images from a different dataset. Thank you for the detailed comments. I hope this helps you, @sinasadeghian.

sinasadeghian commented 2 years ago

Thank you so much for your help. I managed to get results by using --nb_objects 10-20 with 50k images for training. I was wondering, how could I calculate the accuracy of the trained model? Also, what is the impact of --spp when generating data?

TontonTremblay commented 2 years ago

Do you have annotated ground truth for your box? You could use ADD; in some other GitHub issues I shared some code I wrote for it.
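
The metric itself is simple enough to sketch (this is not the code from those issues): ADD transforms the model's 3D points by the ground-truth pose and by the predicted pose, then averages the point-to-point distances.

```python
# Minimal sketch of the ADD metric.
# points: (N, 3) model vertices; R_*: 3x3 rotations; t_*: 3-vectors.
import numpy as np

def add_metric(points, R_gt, t_gt, R_pred, t_pred):
    """Average distance between model points under the GT and predicted poses."""
    pts_gt = points @ R_gt.T + t_gt
    pts_pred = points @ R_pred.T + t_pred
    return np.linalg.norm(pts_gt - pts_pred, axis=1).mean()

# A pose is commonly counted as correct if ADD is below some fraction
# (often 10%) of the object's diameter.
```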

Samples per pixel (spp) determines the quality of the render: the higher the value, the higher the fidelity. But as for sim-to-real transfer and detection performance, I have no idea. It could make for an interesting experiment to vary the samples per pixel and to compare using the denoiser vs. not.

sinasadeghian commented 2 years ago

How could I annotate ground truth for the box? What value of spp did you use for your data? What factors would improve the accuracy of the model?

TontonTremblay commented 2 years ago

For annotation, you will need to do 3D annotation on an image. We have used different methods, but nothing is simple. If you have a robot, you could compare where its end effector actually ends up vs. where the prediction says it is.

I use 200/400 for testing, and 2000/4000 for rendering the data.

I would try to include more diverse data, especially modeling more closely the setting where you want to deploy your model. I would also look into training something other than DOPE; DOPE is an easy way to get into pose estimation, but https://github.com/ylabbe/cosypose is probably the best one out there. I hope this helps.