NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

Single model for multiple classes #103

Closed riteshkumar300 closed 4 years ago

riteshkumar300 commented 4 years ago

Hi, currently you are using a different model for each class. Why didn't you use a single model? If I train my model on the entire FAT dataset by running "python train.py --data path/to/FAT --object soup --outf soup" and then use the weights from the train_soup folder, the model is not able to detect anything. Could you please suggest how to train one model for all the classes?

yyc9268 commented 4 years ago

The main concept of DOPE is over-fitting a single network to the appearance of one object. That is why DOPE performs well on each object without considering shape symmetry. From a technical point of view, if you really want to integrate all objects into a single network, you should output (cuboid_vertex_num+1)*(num_object_class) channels for both the belief and affinity maps. It looks infeasible, but you can try it and check how it works.
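As a rough illustration of the channel count suggested above, here is a minimal sketch that only evaluates the formula from the comment. The function name `multi_class_channels` is hypothetical and not part of the DOPE code base; the sketch assumes 8 cuboid vertices, with the "+1" presumably accounting for the object centroid.

```python
# Hypothetical helper, not part of the DOPE repository: it only evaluates
# the formula (cuboid_vertex_num + 1) * num_object_class from the comment.

CUBOID_VERTEX_NUM = 8  # a cuboid has 8 corners


def multi_class_channels(num_object_class, cuboid_vertex_num=CUBOID_VERTEX_NUM):
    """Return the number of output channels for a shared multi-object network."""
    if num_object_class < 1:
        raise ValueError("num_object_class must be at least 1")
    # One group of (vertices + 1) maps per object class; the extra map is
    # assumed here to be the centroid.
    return (cuboid_vertex_num + 1) * num_object_class


for n in (1, 2, 5):
    print(n, "object class(es) ->", multi_class_channels(n), "channels")
```

The output size grows linearly with the number of object classes, which gives a sense of why a single shared network quickly becomes large.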

TontonTremblay commented 4 years ago

@yyc9268 is correct in his description. It is actually quite common for pose estimators to train only one model per object. Even if the paper describes something general, the implementation most likely contains one model per object (see, for example, https://github.com/zju3dv/pvnet/).

In my personal testing, following the approach that @yyc9268 proposed, I was able to train 2 objects together. But it does not scale very well, and I would argue that pose estimation is much harder to learn than 2D bounding box detection or classification. The main insight is that a slight variation in texture arrangement has a meaningful impact on the predicted pose. This forces the network to be very sensitive to how the object looks, whereas in classification or bounding box detection it does not really matter how these textures look, as long as they are there.

riteshkumar300 commented 4 years ago

@yyc9268 and @TontonTremblay thanks for your explanation.

Abdul-Mukit commented 4 years ago

@TontonTremblay Thank you for the explanation. I am actually working on modifying the network for multi-class classification. It's sort of a personal-learning / course project. I have 2 follow-up questions. First, when you said that multi-class doesn't scale well, what did you mean? I didn't really understand. Second, about the model being super sensitive to slight changes in texture: what do you think might be a solution to this? I am interested in studying this more for my research.

In my application, I used material color randomization. As a result, even if my object's "texture" varies, the network can still detect it. I had to detect the pose of surgeons' gloved hands while they hold a cautery tool. So far it's sort of working, with a lot of outliers. I don't have very good 3D scans of hands, so I guess the dataset is not good enough. I am working on that.

TontonTremblay commented 4 years ago

"When you said for multi-class it doesn't scale well what did you mean? I didn't really understand."

I meant training a single neural network to do pose estimation for multiple objects, e.g., spam, soup, cracker, ...

"... is super sensitive to slight changes in texture. What do you think might be a solution to this?"

You could apply simple variations to the texture itself. I am not sure how far you can push that. This is an interesting research area.

What you described is similar to the detection work we did on cars. You can do detection fairly well with simple color variations. You could probably look into randomizing an animated 3D hand mesh. You can do that with UE4, but it will be a fairly long process; check the marketplace.
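To make the idea of simple color variations concrete, here is a minimal sketch that applies one random hue shift and brightness scale to a whole texture, so the arrangement of the texture is preserved while its colors change. The function name `randomize_texture_color` and its parameters are hypothetical and are not taken from DOPE or any UE4 tool; the texture is assumed to be a plain nested list of RGB values in [0, 1].

```python
import colorsys
import random


def randomize_texture_color(texture, rng, max_hue_shift=0.1, value_scale=(0.8, 1.2)):
    """Return a recolored copy of `texture`.

    `texture` is a list of rows; each row is a list of (r, g, b) floats in [0, 1].
    One hue shift and one brightness scale are sampled per call and applied to
    every pixel, so the spatial layout of the texture stays the same.
    """
    hue_shift = rng.uniform(-max_hue_shift, max_hue_shift)
    scale = rng.uniform(*value_scale)
    recolored = []
    for row in texture:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            h = (h + hue_shift) % 1.0   # hue wraps around the color wheel
            v = min(1.0, v * scale)     # keep brightness in the valid range
            new_row.append(colorsys.hsv_to_rgb(h, s, v))
        recolored.append(new_row)
    return recolored


# Tiny 2x2 stand-in for a real texture image.
rng = random.Random(0)
texture = [[(0.8, 0.1, 0.1), (0.9, 0.9, 0.9)],
           [(0.1, 0.1, 0.8), (0.2, 0.6, 0.2)]]
variants = [randomize_texture_color(texture, rng) for _ in range(4)]
```

Each variant could then be used as the object's material when rendering a training image. How much variation a pose network tolerates before accuracy drops is left open in the discussion above.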

Abdul-Mukit commented 4 years ago

Thank you.