Is this just applying YOLO to a recorded video stream?
No, this is two stages. Stage one just uses the ground truth pose data from Gazebo, with its range limited. We feed that straight into the state machine code, bypassing all of the CV. Stage two generates bounding boxes by writing a modified Gazebo camera plugin that tells us where in the image a rendered object is. That only bypasses darknet; the rest of the CV code works the same.
Using the OpenCV function projectPoints to take 3D camera-space points and project them into 2D image space.
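A minimal sketch of that projection step (the intrinsics below are placeholder values, not the simulated camera's actual calibration):

```python
import numpy as np
import cv2

# Assumed pinhole intrinsics: fx, fy, cx, cy (placeholders)
camera_matrix = np.array([[600.0,   0.0, 320.0],
                          [  0.0, 600.0, 240.0],
                          [  0.0,   0.0,   1.0]])
dist_coeffs = np.zeros(5)  # assume an undistorted simulated camera

# Point already expressed in the camera frame, so rvec/tvec are zero
point_3d = np.array([[1.0, -0.5, 4.0]])  # x, y, z in meters
rvec = np.zeros(3)
tvec = np.zeros(3)

pixels, _ = cv2.projectPoints(point_3d, rvec, tvec, camera_matrix, dist_coeffs)
u, v = pixels[0, 0]
print(f"image coordinates: ({u:.1f}, {v:.1f})")
```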
We can use the core pose of an obstacle and some relative offsets to find its outer edges. After projecting those points into image space, we take the x/y min/max to generate a bounding box. We then add Gaussian noise to the box and to the detection probability, which gives us a reasonable facsimile of YOLO (see the sketch below). This does have the drawbacks of not accounting for partial occlusion of the object and of producing no false detections, but it is also much simpler than rewriting the camera plugin to render objects in specific colors and generate bounding boxes from that.
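A rough sketch of that idea, assuming hypothetical helper and parameter names rather than the actual package code: project the corners of the obstacle's bounding volume, take the x/y extrema, then jitter the box and confidence with Gaussian noise.

```python
import numpy as np
import cv2

def fake_yolo_detection(corners_cam, camera_matrix, dist_coeffs,
                        box_sigma_px=3.0, conf_mean=0.85, conf_sigma=0.05):
    """corners_cam: (N, 3) corner points of the obstacle in the camera frame."""
    pixels, _ = cv2.projectPoints(corners_cam, np.zeros(3), np.zeros(3),
                                  camera_matrix, dist_coeffs)
    pixels = pixels.reshape(-1, 2)

    # Axis-aligned box from the projected corner points
    x_min, y_min = pixels.min(axis=0)
    x_max, y_max = pixels.max(axis=0)

    # Gaussian noise on the box edges and on the detection confidence,
    # to loosely mimic YOLO's imperfect localization
    noisy_box = np.array([x_min, y_min, x_max, y_max]) + \
        np.random.normal(0.0, box_sigma_px, size=4)
    confidence = float(np.clip(np.random.normal(conf_mean, conf_sigma), 0.0, 1.0))
    return noisy_box, confidence
```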
Make a package that can give a reasonable estimate of what YOLO would predict, using a Gazebo environment running in the background.