chensong1995 / HybridPose

HybridPose: 6D Object Pose Estimation under Hybrid Representation (CVPR 2020)
MIT License

Separating Parameter Search From Pose Regression #52

Status: Open. cm107 opened this issue 3 years ago.

cm107 commented 3 years ago

After training HybridPose on a custom dataset and confirming the results generated in the output folder, I am now trying to refactor the inference code so that it can be used outside the training script. My final goal is to run inference on a live video stream from a camera. Obtaining sym_cor_pred, mask_pred, pts2d_map_pred, and graph_pred from a single image is easy, since I would just need to implement a predict method in Resnet18_8s. However, running regression on these predictions to obtain a single rotation R and translation t seems trickier, because the regressor module is written in C.

I understand that validation data is necessary to search for the hyperparameters pr_para and pi_para, and that these parameters can then be used to run regression on a given test dataset like this. What I want to do is obtain pr_para and pi_para once, save them to a dump file, and load them later when running inference on a live camera stream. However, I think this conflicts with how the regressor currently manages its containers.

The containers are initialized here and deleted here. While the containers can be created independently of each other, the regressor.delete_container method requires that all of them be deleted at once. This is a problem for me, since I don't want to recalculate pr_para and pi_para every time I execute my inference script. (I don't see any need to, either, which is why I don't understand the reason for this interface.) Correct me if I'm wrong, but as far as I know, as long as the object mesh doesn't change, these parameters only need to be found once. Furthermore, within the Python script these containers are just integer pointers, so I am not sure how I can save and load them for later use. I'm looking at wrapper.cpp, but there don't seem to be any methods there that would help me.
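Because the integer handles are raw memory addresses that are only meaningful inside the process that allocated them, one option is to persist the numeric hyperparameter values instead and copy them into freshly created containers at startup. The sketch below assumes you can already read pr_para and pi_para out as flat lists of floats (for example through a getter added on the C side; that getter is an assumption, not an existing HybridPose function). The function names and the JSON layout are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Sequence


def dumps_hyperparameters(pr_para: Sequence[float], pi_para: Sequence[float]) -> str:
    """Serialize the searched values (not the C container pointers) to JSON text."""
    return json.dumps({"pr_para": [float(v) for v in pr_para],
                       "pi_para": [float(v) for v in pi_para]}, indent=2)


def loads_hyperparameters(text: str) -> Dict[str, List[float]]:
    """Parse JSON text back into plain lists of floats."""
    data = json.loads(text)
    return {"pr_para": [float(v) for v in data["pr_para"]],
            "pi_para": [float(v) for v in data["pi_para"]]}


def save_hyperparameters(path: str, pr_para: Sequence[float], pi_para: Sequence[float]) -> None:
    """Run once, after the search on the validation set."""
    Path(path).write_text(dumps_hyperparameters(pr_para, pi_para))


def load_hyperparameters(path: str) -> Dict[str, List[float]]:
    """Run at the start of each inference session instead of repeating the search."""
    return loads_hyperparameters(Path(path).read_text())
```

The loaded values still have to be written into newly created containers at startup, which needs a matching setter on the C side.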

Given that I am already able to run inference using a validation dataset and test dataset as input, how should I go about implementing a class that can run inference on a video stream one frame at a time?
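One way to structure a per-frame class is to pay the hyperparameter cost once in the constructor, so that each frame costs only a prediction step plus a regression step. In this minimal sketch, `FramePoseEstimator`, `predict_fn` and `regress_fn` are hypothetical: `predict_fn` stands in for a predict method on Resnet18_8s that returns the sym_cor_pred / mask_pred / pts2d_map_pred / graph_pred outputs, and `regress_fn` stands in for a wrapper around the C regressor calls.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical callable types; neither exists in HybridPose as-is.
PredictFn = Callable[[Any], Dict[str, Any]]  # frame -> network outputs
RegressFn = Callable[[Dict[str, Any], Dict[str, List[float]]], Tuple[Any, Any]]  # -> (R, t)


@dataclass
class FramePoseEstimator:
    predict_fn: PredictFn                # e.g. a predict() method added to Resnet18_8s
    regress_fn: RegressFn                # e.g. a thin wrapper around the C regressor
    hyperparams: Dict[str, List[float]]  # loaded once (pr_para, pi_para), reused per frame

    def process(self, frame: Any) -> Tuple[Any, Any]:
        """Run the network and the pose regressor on one frame; return (R, t)."""
        predictions = self.predict_fn(frame)
        return self.regress_fn(predictions, self.hyperparams)

    def run(self, frames: Iterable[Any]) -> Iterator[Tuple[Any, Any]]:
        """Lazily yield one pose per frame, e.g. from a camera stream."""
        for frame in frames:
            yield self.process(frame)
```

Injecting the two callables keeps the C binding details out of the per-frame loop. For a camera, `frames` could be a generator that repeatedly calls `read()` on an OpenCV `cv2.VideoCapture`; because `run` is a generator, frames are processed as they arrive rather than buffered.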

chensong1995 commented 3 years ago

Hi cm107,

Thanks for your question! The code provided in this repository is not meant for real-time inference. It is a prototype and is not written with efficiency in mind. The key issue is that the pose regression module does not use multiprocessing, which makes it a severe performance bottleneck. To support real-time inference on a video stream, you will need to write a custom wrapper for the pose regressor (preferably in C++) and rewrite inefficient prediction filtering methods such as this one.

If you want to load the hyperparameters of the pose regressor from a file, you can create a helper function here and a Python interface here. These two files are pretty straightforward to follow, even if you have not worked with a Python-C interface before. After that, you can load the file in Python and use your helper function to set the hyperparameters accordingly.
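A minimal Python-side sketch of that suggestion, assuming the binding is ctypes-based (the integer container handles suggest this, but it is an assumption) and that you add a hypothetical C helper `void set_para_value(void *container, int index, double value);`. Neither the helper nor its name exists in the repository. Declaring `argtypes` and `restype` matters here: ctypes treats unannotated arguments and return values as C `int`, so a 64-bit pointer returned by a function without `restype = ctypes.c_void_p` can be truncated.

```python
import ctypes
from typing import Any, Sequence


def apply_hyperparameters(lib: Any, container: int, values: Sequence[float]) -> None:
    """Copy loaded values into a C-side parameter container, one index at a time.

    Assumes a hypothetical helper compiled into the regressor library:
        void set_para_value(void *container, int index, double value);
    """
    setter = lib.set_para_value
    # Declare types so ctypes passes a pointer and a double instead of C ints.
    setter.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_double]
    setter.restype = None
    for index, value in enumerate(values):
        setter(container, index, float(value))
```

The values would come from whatever file you saved after the one-time validation search; the same pattern, with a getter instead of a setter, can copy values out of a container before saving.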

I hope this helps.