cviviers / YOLOv5-6D-Pose

6-DoF Pose estimation based on the YOLOv5 framework. Specific focus on instruments in X-ray applications
https://ieeexplore.ieee.org/document/10478293
GNU General Public License v3.0

use camera #13

Closed flashmoment closed 1 week ago

flashmoment commented 2 weeks ago

I saw the definition of the LoadStreams class in the code, but I did not see how you use it. I have a RealSense depth camera, and I would like to know how to use it for real-time detection.

flashmoment commented 2 weeks ago

Hello, I noticed that you defined IoU in your code, but it is not used. I also saw your comment regarding the target confidence. Did you try setting tobj[b, a, gj, gi] to (confidence + iou) / 2 but find that tobj[b, a, gj, gi] = confidence worked better?

cviviers commented 1 week ago

Hi @flashmoment,

The current line of work focused on RGB and grayscale input. You could straightforwardly adapt the model to RGBD by increasing the number of input channels from 3 to 4. This should also be more accurate than RGB alone.
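The channel change above can be sketched in PyTorch. This is a hedged illustration, not the repo's API: `expand_to_rgbd` is a hypothetical helper, and the idea of reusing the trained RGB weights while initializing the extra depth channel from their mean is a common adaptation trick, not something the source confirms this project does.

```python
import torch
import torch.nn as nn

def expand_to_rgbd(first_conv: nn.Conv2d) -> nn.Conv2d:
    """Replace a trained 3-channel stem conv with a 4-channel (RGBD) one.

    The RGB weights are copied over; the new depth channel is initialized
    as the mean of the RGB kernels so the pretrained features still apply.
    """
    new_conv = nn.Conv2d(
        in_channels=4,
        out_channels=first_conv.out_channels,
        kernel_size=first_conv.kernel_size,
        stride=first_conv.stride,
        padding=first_conv.padding,
        bias=first_conv.bias is not None,
    )
    with torch.no_grad():
        new_conv.weight[:, :3] = first_conv.weight          # reuse RGB kernels
        new_conv.weight[:, 3:] = first_conv.weight.mean(dim=1, keepdim=True)
        if first_conv.bias is not None:
            new_conv.bias.copy_(first_conv.bias)
    return new_conv

# Stand-in for the model's actual stem layer (e.g. the first conv in YOLOv5);
# the exact attribute path inside the model is an assumption.
rgb_conv = nn.Conv2d(3, 32, kernel_size=6, stride=2, padding=2)
rgbd_conv = expand_to_rgbd(rgb_conv)
x = torch.randn(1, 4, 640, 640)
print(rgbd_conv(x).shape)  # torch.Size([1, 32, 320, 320])
```

The rest of the network is unchanged, since only the first layer sees the raw input channels.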

As for your second question: yes, although I did not run many experiments, using the confidence alone produced more stable results. I think this is because the current model does not predict the bounding box; it is merely computed from the keypoints. You could have the model predict both the bounding box and the keypoints, then use the box to improve the confidence training. Recent human pose estimation methods do it this way.
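The two objectness-target variants being compared can be written out explicitly. This is a minimal sketch under assumptions: `confidence` stands for the pose-quality confidence derived from the keypoints and `iou` for the box IoU, and `objectness_target` is a hypothetical helper, not a function from the repository.

```python
import torch

def objectness_target(confidence: torch.Tensor,
                      iou: torch.Tensor,
                      blend: bool = False) -> torch.Tensor:
    """Return the target written into tobj[b, a, gj, gi].

    blend=False: pose confidence alone (the variant reported as more stable).
    blend=True:  the (confidence + iou) / 2 average discussed in the thread.
    """
    return (confidence + iou) / 2 if blend else confidence

conf = torch.tensor([0.9, 0.6])
iou = torch.tensor([0.5, 0.8])
print(objectness_target(conf, iou))              # tensor([0.9000, 0.6000])
print(objectness_target(conf, iou, blend=True))  # tensor([0.7000, 0.7000])
```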

flashmoment commented 1 week ago

Hello! If I only have an RGB camera, can I use the trained model directly for real-time estimation? Can I simply set the source parameter to 0, just like in YOLOv5?

cviviers commented 1 week ago

Yes, it should work the same way. The only thing you need to add is the camera parameters for each captured frame, but if you don't expect to zoom, you can hardcode them in the new detection script.
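Hardcoding the intrinsics could look like the sketch below. The focal lengths and principal point are made-up placeholder values; in practice they come from a one-time calibration of your camera (for example with OpenCV's cv2.calibrateCamera and a checkerboard), and the function name is hypothetical, not part of this repo.

```python
import numpy as np

# Assumed intrinsics for a fixed-lens 640x480 webcam; replace with your
# own calibration results.
FX, FY = 800.0, 800.0   # focal lengths in pixels (placeholder)
CX, CY = 320.0, 240.0   # principal point (placeholder)

def camera_matrix() -> np.ndarray:
    """Return the same 3x3 intrinsic matrix K for every captured frame,
    which is valid as long as the lens does not zoom."""
    return np.array([[FX, 0.0, CX],
                     [0.0, FY, CY],
                     [0.0, 0.0, 1.0]])

K = camera_matrix()
print(K.shape)  # (3, 3)
```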