sinasadeghian opened this issue 2 years ago
Interesting! Looks like you have some symmetries on your model. @mintar just pushed a great script to generate data with symmetries. https://github.com/NVlabs/Deep_Object_Pose/tree/master/scripts/nvisii_data_gen#handling-objects-with-symmetries I am sure you can run the script on the data you already generated.
Although, it's strange that it does not find the middle keypoint. This is very strange. What does it look like when it sees the smile on the side? Does the tape go around the box? It's also strange that it does not find a heatmap on the other corners... I am a little confused here...
When you trained, did you filter by `Amazon_box_1`? I should add a check for that in the training script when it starts. If you do not specify an object, I think by default it uses all of the objects. Thank you for the zip file, but could you include one or two training images?
I'm generating an extra 1k images without distractors and 1k with 1 to 20 distractors, to see if that improves the results.
> When it sees the smile on the side how does it look like?

I didn't get what you mean. Yes, the tape goes all around the box.
I used these args for training:

```
python3 -m torch.distributed.launch --nproc_per_node=1 /s/dope_v4/scripts/train2/train.py \
    --network dope \
    --data /s/dope_v4/scripts/nvisii_data_gen/output/dataset \
    --datatest /s/s/dope_v4/scripts/nvisii_data_gen/output/dataset_train/test \
    --object box \
    --outf /s/amzn_box \
    --gpuids 0 1 2 3 4 5 6 7 \
    --batchsize 32 \
    --epochs 60
```
Here are some of the generated images I used for training and testing: Data.zip
Also, in ROS the `/dope/detected_objects` topic is not detecting anything. How could I draw lines around the detected objects in ROS?
Thank you so much for your help
A couple of comments:
You have mixed up the training and testing datasets, so you've only been training on 2k images:
```
training data: 311 batches
testing data: 1900 batches
load data: ['/s/dope_v4/scripts/nvisii_data_gen/output/dataset/000']        # training
load data: ['/s/dope_v4/scripts/nvisii_data_gen/output/dataset_train/test'] # testing
```
I think this is your main problem. Just switch those around and retrain. Personally, I've always generated 60k images for the training data, but I think that's a bit overkill. @TontonTremblay: how many training images do you think are good?
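To catch this kind of mix-up before a long training run, a quick sanity check can compare the image counts of the folders you pass to `--data` and `--datatest`. This is a hypothetical helper, not part of DOPE, and it assumes the rendered frames are `.png` files:

```python
from pathlib import Path

def count_images(folder):
    """Count rendered frames; nvisii_data_gen writes .png images."""
    return sum(1 for _ in Path(folder).rglob("*.png"))

def check_split(train_dir, test_dir):
    """Warn if the test set is larger than the training set, which
    usually means the two paths were swapped on the command line."""
    n_train, n_test = count_images(train_dir), count_images(test_dir)
    swapped = n_test > n_train
    if swapped:
        print(f"WARNING: --datatest ({n_test} images) is larger than "
              f"--data ({n_train} images); did you swap the paths?")
    return n_train, n_test, swapped
```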
Also, when you re-run training, if you still have problems, please upload the `runs/` folder. You can use TensorBoard to get some really useful information: `tensorboard --logdir runs/`.
Your object specification is correct: you specified `--object box`, and your json file has `"class": "box"`, so that's fine.
In your `config_pose.yaml`, you've used meters for the dimensions (because the mesh is in meters):

```yaml
dimensions: {
    "box": [0.3400000035762787, 0.14000000059604645, 0.2840000092983246],
}
```
But the `dimensions` always have to be specified in centimeters, no matter what the scaling of the mesh is:

```yaml
dimensions: {
    "box": [34.00000035762787, 14.000000059604645, 28.40000092983246],
}
```
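Rather than retyping the long floats by hand, the conversion can be done programmatically (a minimal sketch; the input values are the meter dimensions from this thread):

```python
def meters_to_centimeters(dims):
    """Convert mesh dimensions in meters to the centimeters
    that config_pose.yaml expects (multiply by 100)."""
    return [d * 100.0 for d in dims]

dims_m = [0.3400000035762787, 0.14000000059604645, 0.2840000092983246]
print(meters_to_centimeters(dims_m))
```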
I know this is confusing and could be documented better. This does not affect training, and even during inference the `/dope/detected_objects` topic would still show something, but the returned pose would be wrong (scaled by 0.01, so very close to the camera). It's not the cause of your current problem, but better fix it.
@TontonTremblay wrote:
> Interesting! Looks like you have some symmetries on your model. @mintar just pushed a great script to generate data with symmetries. https://github.com/NVlabs/Deep_Object_Pose/tree/master/scripts/nvisii_data_gen#handling-objects-with-symmetries I am sure you can run the script on the data you already generated.
No, not with the script I've pushed, you would have to regenerate the data.
Regarding symmetries, your object has two sides that look identical:
But due to the texture on the other 4 sides, it's not really a rotational symmetry (if you compare the two images above, they don't look pixel-wise identical due to the top side). Expect bad results when the camera can only see one of the two identical sides (because then the orientation is ambiguous), but as soon as other parts of the object are visible, it should be fine. If that's a problem for you, you can specify a 180° rotational symmetry around the y axis and regenerate the training data, but personally I don't think it's necessary or helpful.
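For reference, the 180° rotation about the y axis mentioned above is just a homogeneous 4x4 transform. This is a minimal math sketch; see the linked README for how the data-generation script actually expects symmetries to be specified:

```python
import math

def rotation_y(deg):
    """Homogeneous 4x4 rotation about the y axis."""
    c = math.cos(math.radians(deg))
    s = math.sin(math.radians(deg))
    return [[  c, 0.0,   s, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [ -s, 0.0,   c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

# A 180-degree symmetry about y flips the x and z axes and leaves y alone:
m = rotation_y(180.0)
```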
Oh, and one more thing regarding the amount of training data: did you specify `--nb_objects 1`? Your images look like it. If you regenerate the dataset, consider cranking that number up, because it's a cheap way of generating more training data. Let's say the object is only visible in half of the images; since you only trained on 2k images, that would mean you only have 1k views of the object. In my datasets, I always set `--nb_objects` to 10-20 times the number of models (if I have multiple models), so again assuming an object is visible in half the frames, times 60k images in my datasets, that's 300k-600k views of the object (300-600x what you used).
TL;DR: Need moar data! :-)
I had success with 20k, but 60k is what I aim for.
@mintar, good catch on the split. Maybe I could write the split in the dataloader directly; normally I have been testing on real images from a different dataset. Thank you for the detailed comments. I hope this helps you, @sinasadeghian.
Thank you so much for your help. I managed to get results using `--nb_objects` 10-20 with 50k images for training. I was wondering, how could I calculate the accuracy of the trained model? Also, what is the impact of `--spp` in generating data?
Do you have annotated ground truth for your box? You could use ADD; I shared some code I wrote for it on some other GitHub issues.
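For reference, the ADD metric itself is just the mean distance between the model's points transformed by the ground-truth pose and by the estimated pose. Below is a minimal numpy sketch (not the code from the other issues), assuming each pose is given as a 3x3 rotation matrix and a translation vector:

```python
import numpy as np

def add_metric(points, R_gt, t_gt, R_est, t_est):
    """Average Distance of model points (ADD): mean L2 distance
    between the model points under the two poses."""
    p_gt = points @ R_gt.T + t_gt      # (N, 3) points under ground truth
    p_est = points @ R_est.T + t_est   # (N, 3) points under the estimate
    return float(np.mean(np.linalg.norm(p_gt - p_est, axis=1)))
```

A common accuracy number is then the fraction of test images whose ADD falls below some threshold, e.g. 10% of the object diameter.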
Samples per pixel (`--spp`) determines the quality of the render: the higher the value, the higher the fidelity. But as for sim-to-real transfer and detection performance, I have no idea. It would make for a possibly interesting experiment to vary the samples per pixel and to compare using the denoiser vs. not.
How could I annotate ground truth for the box? What value of spp did you use for your data? What factors would improve the accuracy of the model?
For annotation, you will need to do 3d annotation on an image. We have used different methods, but nothing is simple. If you have a robot, you could compare where the end effector ends up vs. where the prediction is.
I use 200/400 for testing, and 2000/4000 for rendering the data.
I would try to include more diverse data, especially modelling more closely the environment where you want to deploy your model. Also, I would probably look into training something different than DOPE. DOPE is an easy way to get into pose estimation; https://github.com/ylabbe/cosypose is probably the best one out there. I hope this helps.
Hi, I'm trying to detect the pose of packages using DOPE. I used nvisii_data_gen to generate the dataset: 10k images for training and 2k for testing, with `--spp 2000` and `nb_distractors 15`.
This is the link to my object file. I used `train.py` in the `train2` folder, training for 60 epochs with lr = 0.001, network = dope, batchsize = 32. I'm using a RealSense D435i camera for real-time testing. However, the dope node is not detecting the box.
This file includes the output of training/testing, a sample of the json files used for training, and the camera_info and config_pose yaml files: output_files.zip
These are samples from the dataset. In some pictures the amazon box is obvious, and in some it is not.
This is what rviz shows during real-time testing; however, the /dope/detected_objects topic shows nothing.
Which part am I doing wrong?