dexception opened this issue 6 years ago
Fast object tracking is usually used when your detector can't achieve real-time FPS: detection is applied only every N frames (N depends on GPU performance), and tracking alone moves the detected bounding boxes on the frames in between.
for (auto &i : result_vec)
{
    cv::Rect r(i.x, i.y, i.w, i.h);
    //cv::Mat roi = cur_frame(r);
    dlib::rectangle face = openCVRectToDlib(r);
}
@AlexeyAB I want to do it for N frames, i.e. for as long as we can track the face. The Dlib face recognition code I am using calculates the distance between two descriptor vectors inside 2 nested for loops, so I am getting 2 FPS. I have compiled Dlib with CUDA. This is with the Dlib face detector + Dlib landmark + distance comparison.
I have put the face recognition code inside the draw_boxes method. With your tracking code I am getting 6-16 FPS. YOLOv3 face detector + Dlib landmark + distance comparison.
The draw_boxes method takes 40 ms per frame (conversion of cv::Rect to Dlib rect + Dlib landmark + distance comparison).
But that is still short of 25 FPS.
I don't understand your last point.
Do you have a personal email where I can send the entire code?
The draw_boxes method takes 40 ms per frame (conversion of cv::Rect to Dlib rect + Dlib landmark + distance comparison).
But that is still short of 25 FPS.
I don't understand your last point.
I.e. Capturing+Tracking+draw_boxes()+Saving_video are launched for each frame; if the video stream from the IP camera runs at 30 FPS, then these functions will be launched 30 times per second.
Detection (Yolo) is launched only every Nth frame, where N depends on GPU performance. I.e. if the video stream from the IP camera runs at 30 FPS but your GPU can process only 5 FPS, then Detection (Yolo) will be launched only every 6th frame.
So, if you get 6-16 FPS but want 25 FPS, you can buy a new GPU, or just do face recognition only every 6th frame (for example).
Just add unsigned int face_id; here:
https://github.com/AlexeyAB/darknet/blob/6d44529cf93211c319813c90e0c1adb34426abe5/src/yolo_v2_class.hpp#L22
And add the following code here: https://github.com/AlexeyAB/darknet/blob/eff487ba3626a39e135d13929117e04bc4cf5823/src/yolo_console_dll.cpp#L360
for (auto &i : result_vec)
{
    // ... get recognized_face_id by using Dlib
    i.face_id = recognized_face_id;
}
You can attach the code here, or post a URL to Google Drive.
code: version2 https://drive.google.com/open?id=1oj8-GdJSSqF32A-nZbOQn-cMr3FHJOPY
version3 https://drive.google.com/file/d/15IDALtHAx2wAuu3RY1AinvSznLibBzVb
Having a few problems editing the struct bbox_t, so I created a new one:
struct bbox_face_map {
    int matchingBestIndex;
    float x;
    float y;
};
The FPS varies and is not smooth enough to be called stable.
@AlexeyAB I have not been able to modify struct bbox_t; the code is crashing. I think there is more to it.
I have not been able to modify struct bbox_t; the code is crashing. I think there is more to it.
After changing struct bbox_t you should recompile the DLL/SO library (yolo_cpp_dll.sln) and then recompile your software.
So just try to move most of your own code from draw_boxes() to this place, before this line: https://github.com/AlexeyAB/darknet/blob/eff487ba3626a39e135d13929117e04bc4cf5823/src/yolo_console_dll.cpp#L360
Add an int face_id field to the struct bbox_t and recompile the DLL library.
Then set face_id = matchingBestIndex; for each corresponding face, or set face_id = -1 if the face isn't recognized.
Then in draw_boxes() just print the face-id number for each found face; add the following after this line: https://github.com/AlexeyAB/darknet/blob/eff487ba3626a39e135d13929117e04bc4cf5823/src/yolo_console_dll.cpp#L191
obj_name += " - " + std::to_string(i.face_id);
@AlexeyAB
After adding the code at
the matchingBestIndex is still -1 inside the draw_boxes method. But if I add the code below at
the matchingBestIndex gets updated.
Is it possible to call the doFaceRecognition method only when a new track_id is first generated, rather than at every object detection? And if doFaceRecognition already has a matchingBestIndex, can it skip updating it?
for (auto &i : result_vec)
{
    int _matchingBestIndex = doFaceRecognition(cur_frame, i);
    if (_matchingBestIndex != -1)
    {
        i.matchingBestIndex = _matchingBestIndex;
    }
}
void draw_boxes(cv::Mat mat_img, std::vector<bbox_t> result_vec, std::vector<std::string> obj_names,
                int current_det_fps = -1, int current_cap_fps = -1)
{
    int const colors[6][3] = { { 1,0,1 },{ 0,0,1 },{ 0,1,1 },{ 0,1,0 },{ 1,1,0 },{ 1,0,0 } };
    for (auto &i : result_vec)
    {
        cv::Scalar color = obj_id_to_color(i.obj_id);
        cv::Rect r(i.x, i.y, i.w, i.h);
        cv::rectangle(mat_img, r, color, 2);
        if (obj_names.size() > i.obj_id)
        {
            std::string obj_name = obj_names[i.obj_id];
            if (i.track_id > 0)
            {
                if (i.matchingBestIndex > 0)
                {
                    obj_name = file_list[i.matchingBestIndex];
                }
                else
                {
                    obj_name += " - " + std::to_string(i.track_id);
                }
            }
            cv::Size const text_size = getTextSize(obj_name, cv::FONT_HERSHEY_COMPLEX_SMALL, 1.2, 2, 0);
            int const max_width = (text_size.width > i.w + 2) ? text_size.width : (i.w + 2);
            cv::rectangle(mat_img, cv::Point2f(std::max((int)i.x - 1, 0), std::max((int)i.y - 30, 0)),
                cv::Point2f(std::min((int)i.x + max_width, mat_img.cols - 1), std::min((int)i.y, mat_img.rows - 1)),
                color, CV_FILLED, 8, 0);
            putText(mat_img, obj_name, cv::Point2f(i.x, i.y - 10), cv::FONT_HERSHEY_COMPLEX_SMALL, 1.2, cv::Scalar(0, 0, 0), 2);
        }
    }
    if (current_det_fps >= 0 && current_cap_fps >= 0) {
        std::string fps_str = "FPS detection: " + std::to_string(current_det_fps) + " FPS capture: " + std::to_string(current_cap_fps);
        putText(mat_img, fps_str, cv::Point2f(10, 20), cv::FONT_HERSHEY_COMPLEX_SMALL, 1.2, cv::Scalar(50, 255, 0), 2);
    }
}
Is it possible to call the doFaceRecognition method only when a new track_id is first generated, rather than at every object detection? And if doFaceRecognition already has a matchingBestIndex, can it skip updating it?
Yes, just use:
for (auto &i : result_vec)
{
    // if it isn't tracked yet
    if (i.track_id == 0) {
        int _matchingBestIndex = doFaceRecognition(cur_frame, i);
        if (_matchingBestIndex != -1)
        {
            i.matchingBestIndex = _matchingBestIndex;
        }
    }
}
@AlexeyAB Thanks !
The values are not getting updated after adding the code at this location. https://github.com/AlexeyAB/darknet/blob/eff487ba3626a39e135d13929117e04bc4cf5823/src/yolo_console_dll.cpp#L360
@AlexeyAB
Can you please share the following?
After how many frames do you detect objects? After how many frames do you update the tracker? Would a correlation tracker be better suited for this?
Thanks.
@dexception
After how many frames do you detect objects?
For the yolov3.cfg: yolo_console_dll.exe data/coco.names yolov3.cfg yolov3.weights test.mp4
on a GeForce GTX 970 GPU, the Yolo detector will run only on every 6th frame:
49 FPS capture / 8 FPS detection ≈ 6
After how many frames do you update the tracker?
The tracker does tracking on every frame.
After the detector has given us results, we update the tracker asynchronously, catching up to the current frame (we send the tracker all the saved frames, starting from the frame where the detection began). Then we do tracking on each current frame until the detector gives us new results.
Would a correlation tracker be better suited for this?
Which tracker do you want to use?
The current OpenCV optical-flow tracker is a very fast multi-object tracker: up to ~1000 FPS on a GTX 970.
@AlexeyAB I am using the Dlib correlation tracker + dlib CNN face detector + dlib face landmark. Currently I am doing face recognition and adding new trackers for new objects every 10th frame, updating the trackers every 4th frame, and drawing the trackers on every frame.
The face detector I trained with yolov3 and combined with the dlib landmark model didn't work. It was fast, but the problem is that the dlib face coordinates sit a little lower, near the eyebrows, while I trained my face detector on the CelebA dataset, so the landmark distance comparison was giving a lot of false positives.
@dexception
As I see, this is a single-object tracker: void start_track(const image_type& img, const drectangle& p);
http://dlib.net/dlib/image_processing/correlation_tracker_abstract.h.html
Do you track only 1 face?
And how fast does it work?
Currently I am doing face recognition and adding new trackers for new objects every 10th frame, and updating the trackers every 4th frame.
Did you hardcode these values (10, 4), or do they depend on GPU/algorithm performance?
@AlexeyAB I have modified the code to track multiple faces.
std::vector
Yes, I hardcoded these values for testing, but I will change them according to the FPS I am getting.
25 FPS with 1 GB of GPU memory, and so far I am getting better accuracy than with my earlier model.
@AlexeyAB Have you compared different posing algorithms?
@dexception What do you mean by posing algorithms?
I only compared detection algorithms. I didn't spend a long time searching for tracking algorithms; I just compared several approaches from the main OpenCV repo: Feature + findHomography/estimateRigid, CamShift/MeanShift, Phase Correlation, Optical Flow, ... so Optical Flow is a good and fast multi-object tracker.
@AlexeyAB Which method decides whether new trackers are needed and which trackers need to be erased? I have not been able to find it in the code.
@AlexeyAB You can try cv::Ptr<cv::Tracker> tracker = cv::TrackerTLD::create(/* "TLD" */);
from #include <opencv2/tracking.hpp>,
but the problem is that the TLD algorithm is completely encapsulated in OpenCV and already includes its own target detection.
My previous attempt at darknet cpp-ization failed; I don't know how to use yolov3 + TLD.
On the other hand, deep_sort is a good algorithm too. But it also has a problem: deep_sort uses a private deep-learning feature (the author said it was generated from the MOT dataset, and the paper gives only a vague network design). Although it does a good job of following targets, the matching for missing targets has a high false-positive rate (though it is better than TLD and KCF).
@TaihuLight The code has been modified heavily now. What exactly are you looking for ?
@AlexeyAB I just saw one of the trackers with face id 3 shift from one person's face to another person's. I think there is some issue with the trackers. I can confirm this is a bug.
@TaihuLight
As I see, cv::TrackerTLD is still in opencv_contrib instead of opencv, so it isn't well tested and the API of TrackerTLD can change: https://github.com/opencv/opencv_contrib/blob/master/modules/tracking/include/opencv2/tracking/tracker.hpp#L1181
Also it doesn't have a GPU implementation, and it is a single-object tracker, i.e. we would have to use multiple TrackerTLD instances for tracking multiple objects, so it can be slow for tracking 10-100 objects:
https://github.com/opencv/opencv_contrib/blob/master/modules/tracking/src/multiTracker.hpp#L52-L53
Is it stable and does it work fast for 10 - 100 objects?
@dexception
I just saw one of the trackers with face id 3 shift from one person's face to another person's. I think there is some issue with the trackers. I can confirm this is a bug.
Is it a bug, or just a low precision of tracking?
@AlexeyAB There are 2 issues I have noticed.
I would rate this as a definite bug.
The correlation tracker from Dlib does not suffer from these issues.
// calculate the center position of the detected object
int x_bar = x + 0.5 * w;
int y_bar = y + 0.5 * h;
isNewTrackerNeeded = 1;
for (int i = 0; i != trackers.size(); i++)
{
    drectangle tracked_position = trackers[i].get_position();
    int t_x = (int)tracked_position.left();
    int t_y = (int)tracked_position.top();
    int t_w = (int)tracked_position.width();
    int t_h = (int)tracked_position.height();
    // calculate the center position of the tracked object
    int t_x_bar = t_x + 0.5 * t_w;
    int t_y_bar = t_y + 0.5 * t_h;
    // each center must lie inside the other box; note that chained
    // comparisons like (t_x <= x_bar <= t_x + t_w) do NOT mean this in C++,
    // so every bound is written as its own comparison
    if (t_x <= x_bar && x_bar <= (t_x + t_w) && t_y <= y_bar && y_bar <= (t_y + t_h) &&
        x <= t_x_bar && t_x_bar <= (x + w) && y <= t_y_bar && t_y_bar <= (y + h))
    {
        // an existing tracker already covers this detection
        isNewTrackerNeeded = 0;
        break;
    }
}
For erasing trackers:
double tracking_quality = trackers[i].update(*dlibImageGray);
if (tracking_quality < 7)
{
    delete_list.push_back(i);
}
@dexception
Maybe it can be solved by changing these 2 lines: https://github.com/AlexeyAB/darknet/blob/b847f39f60eb6715325f3707e78667a0611811dd/src/yolo_v2_class.hpp#L73-L74 to these:
YOLODLL_API std::vector<bbox_t> tracking_id(std::vector<bbox_t> cur_bbox_vec,
    bool const change_history = true, int const frames_story = 1, int const max_dist = 30);
I think cases with occlusions can't be solved by using Optical Flow, so other trackers should be used: correlation, TLD, KCF, ...
Maybe later I will add the correlation_tracker from Dlib to my example: http://dlib.net/dlib/image_processing/correlation_tracker_abstract.h.html
Or cv::TrackerTLD from OpenCV-contrib: https://github.com/opencv/opencv_contrib/blob/master/modules/tracking/include/opencv2/tracking/tracker.hpp#L1181 https://github.com/opencv/opencv_contrib/blob/master/modules/tracking/src/multiTracker.hpp#L52-L53
@dexception Your code is private; is your permission needed for downloading? I have sent a request. version2 https://drive.google.com/open?id=1oj8-GdJSSqF32A-nZbOQn-cMr3FHJOPY version3 https://drive.google.com/file/d/15IDALtHAx2wAuu3RY1AinvSznLibBzVb @AlexeyAB I don't get the code for object tracking, and I don't understand how to run it.
Dear Alex,
I just tested the face detection and your tracker on a live crowd using a Hikvision PTZ camera. It was so bad that I had to switch to stream 3, and even after configuring it at low resolution with 8 FPS it still crashed because of the delay. It looks like I would need a huge GPU to do real-time HD video, and 4K face recognition at 60 meters is totally impractical. The cost is going to be huge.
I had trained my model to detect faces, and it was detecting them, but the FPS was very low on live video.
This should help:
@uday60
@AlexeyAB I have been wondering about this for a while now. I need your expert advice.
This guy is doing object detection on integers. What algorithm is this? The only thing that comes close is LBPH (Local Binary Patterns with Histograms) in OpenCV. But it looks far more optimized: while I was running his demo it was using 12 threads, but when I played a normal video in OpenCV it was consuming 30 threads. So is it even OpenCV, or some other computer vision library?
@dexception
This guy is doing object detection on integers.
How did you find out that it uses integers?
I don't know what this algorithm is; it isn't published. It isn't a neural network, because it requires a sliding window (params: scale, window size). Yes, it is very optimized.
It is from Shenzhen University, by Shiqi Yu. It isn't very accurate, but it is very fast: 1500 FPS on a multicore CPU: http://vis-www.cs.umass.edu/fddb/results.html#rocunpub
@AlexeyAB I installed GPU-Z on the system, ran his code, and saw the results. LBP works on integers and is not the most accurate, but it is the fastest object detection in the world on CPU.
I ran the demo on grayscale images and it was decent enough to detect frontal faces. I will train on my own faces and come back with results. This is truly scalable. You can run 10 of these on a normal i7 laptop without a GPU. Imagine how many streams you could handle on an 80-core server.
https://github.com/opencv/opencv/tree/master/data/lbpcascades
@dexception
It seems this is the optimal algorithm for this particular task, Face Detection. Did you find any Face Recognition algorithm that is faster/more accurate than Dlib?
You can run 10 of these on a normal i7 laptop without a GPU. Imagine how many streams you could handle on an 80-core server.
https://github.com/opencv/opencv/tree/master/data/lbpcascades
Do you mean that this library https://github.com/ShiqiYu/libfacedetection uses default LBP-models from opencv https://github.com/opencv/opencv/tree/master/data/lbpcascades ?
@AlexeyAB
The Chinese Whispers algorithm is faster than the default code I shared earlier.
This is the default implementation of LBP in OpenCV: https://github.com/opencv/opencv/tree/master/data/lbpcascades
No, he does not use the default version: https://github.com/ShiqiYu/libfacedetection
@dexception I am trying to do a similar thing for barcode recognition instead of faces, but I am having issues linking that library (ZBar) to darknet. Could you please tell me how you linked Dlib to darknet? Thanks.
@Isha8
Convert the opencv rectangle to dlib rectangle:
static dlib::rectangle openCVRectToDlib(cv::Rect r)
{
    // dlib rectangles use inclusive right/bottom coordinates, hence the -1
    return dlib::rectangle((long)r.tl().x, (long)r.tl().y,
                           (long)r.br().x - 1, (long)r.br().y - 1);
}
@AlexeyAB This is a faster queue implementation than the one currently used in the tracker: https://github.com/cameron314/concurrentqueue
@dexception I meant how you were able to use the dlib resources within yolo. I am not able to make darknet; what changes did you make inside the Makefile?
It works now; I had added it to the compiler flags instead of the linker flags. Thanks.
@AlexeyAB
I am trying to combine the object tracking code you posted in the issue below with the Dlib face recognition code. I have already trained a face detector in yolo, and it gives about 16 FPS with object tracking. https://github.com/AlexeyAB/darknet/issues/907
Dlib face recognition is slow primarily because of their face detector model. I want to replace it with yolov3 and object tracking.
So in the draw_boxes method inside the yolo_console.cpp file, I am trying to add the face recognition code from dlib...
The whole objective is to have a face recognition system capable of processing 25 FPS. I am just wondering whether the approach I am taking is good enough; I would like your opinion on this.
Thanks!