GeekAlexis / FastMOT

High-performance multiple object tracking based on YOLO, Deep SORT, and KLT 🚀
MIT License

update on flow and disable ReID #35

Closed xjsxujingsong closed 3 years ago

xjsxujingsong commented 3 years ago

Thanks for sharing, this is a very good MOT. I read the optical flow part of your code. Every 5 frames it runs detection again and associates tracks with the new detections. The Kalman filter state is updated with the detection box (`mean, cov = self.kf.update(*track.state, det.tlbr, MeasType.DETECTOR)`), called from `self.tracker.update(self.frame_count, detections, embeddings)`. But it looks like optical flow still uses the old keypoints. Should we regenerate keypoints in the new detection box, since sometimes there is a large gap between the optical flow box and the detection box?

xjsxujingsong commented 3 years ago

I am currently trying to run your code on Windows using public detections and disabling feature embedding, because the TensorRT Python API is not available on Windows. I tried to run your ONNX model with OpenCV, but it fails with:

```
opencv2/dnn/shape_utils.hpp:171: error: (-215:Assertion failed) start <= (int)shape.size() && end <= (int)shape.size() && start <= end in function 'cv::dnn::dnn4_v20200609::total'
```

```python
blob = cv.dnn.blobFromImages(images, 1, (128, 256))
net.setInput(blob)
out = net.forward()
```

GeekAlexis commented 3 years ago

@xjsxujingsong Keypoints outside the bounding box are filtered out after the Kalman filter update, so there is no need to regenerate keypoints. The feature extractor model has a fixed batch size of 16, so you need to account for that; feeding in a single image won't work.
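Since the ONNX feature extractor expects a fixed batch of 16, one way to handle fewer crops is to zero-pad the blob up to the full batch and keep only the embeddings for the real crops. A minimal sketch (hypothetical helper, not FastMOT's actual code; assumes the `N x 3 x 256 x 128` blob layout produced by `cv.dnn.blobFromImages` with size `(128, 256)`):

```python
import numpy as np

def pad_batch(blob, batch_size=16):
    """Zero-pad an N x C x H x W blob up to the model's fixed batch size."""
    n = blob.shape[0]
    if n > batch_size:
        raise ValueError("too many crops for one batch")
    pad = np.zeros((batch_size - n, *blob.shape[1:]), dtype=blob.dtype)
    return np.concatenate([blob, pad], axis=0)

# A single 128x256 crop still yields a full batch of 16 for net.forward();
# afterwards, keep only the first n rows of the output embeddings.
single = np.random.rand(1, 3, 256, 128).astype(np.float32)
batch = pad_batch(single)
```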

xjsxujingsong commented 3 years ago

@GeekAlexis Thanks for the reply.

For the first question, what I mean is: after a new detection comes in at the 5th frame, it is used to associate existing tracks, generate new tracks, and remove dead tracks as needed. Meanwhile, if a track is updated, its location (the detection box) is used to update the Kalman filter state. This is already implemented in your code. My question is whether we should regenerate keypoints based on the detection box rather than keep using the old keypoints. Although the old keypoints are still in the image, their tracked box is not at the actual location (the detection box).

For the second question, I did try both batch sizes, 1 and 16. Neither worked, but they gave different errors.

GeekAlexis commented 3 years ago

@xjsxujingsong I understand your question. Keep in mind that the tracked box is updated using the detection box, and only the keypoints inside the updated box are kept. So the keypoints do correspond to the actual location (the updated box); it shouldn't matter whether they were extracted from previous frames.
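The filtering step described here amounts to a simple point-in-box test after the Kalman update. A minimal sketch (hypothetical names, not the actual FastMOT implementation; assumes `N x 2` keypoint arrays and boxes in `tlbr` form):

```python
import numpy as np

def keep_keypoints_inside(keypoints, tlbr):
    """Keep only the keypoints that fall inside the updated box.

    keypoints: N x 2 array of (x, y); tlbr: (x1, y1, x2, y2).
    """
    x1, y1, x2, y2 = tlbr
    mask = ((keypoints[:, 0] >= x1) & (keypoints[:, 0] <= x2) &
            (keypoints[:, 1] >= y1) & (keypoints[:, 1] <= y2))
    return keypoints[mask]

pts = np.array([[10.0, 10.0], [50.0, 50.0], [200.0, 200.0]])
inside = keep_keypoints_inside(pts, (0, 0, 100, 100))  # drops the last point
```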

Your second question is probably related to OpenCV: https://github.com/opencv/opencv/issues/17063.

xjsxujingsong commented 3 years ago

@GeekAlexis I see. The keypoints will be in the intersection of the old tracked box and the new detection box. If the number of keypoints is low, new keypoints will be regenerated based on the detection box.

I asked this because when I fed all-zero embeddings (so only IoU is used for association), I noticed that after the 5th frame the detection box (white) is correct, but the tracked box is far away from it (though they still overlap). I will debug to see what is going on.

GeekAlexis commented 3 years ago

@xjsxujingsong That might be normal, because optical flow is not as accurate as detections, especially against a highly textured background. Optical flow can fail and make the tracked box drift away. Decreasing the detector frame skip will help in this case. Also, if you set the embeddings to zero, you may want to disable ReID association too.
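The frame-skip behavior being tuned here boils down to running the detector only every N-th frame and propagating tracks with KLT optical flow in between. A minimal sketch of that scheduling (hypothetical names and structure, not FastMOT's actual loop):

```python
DETECTOR_FRAME_SKIP = 5  # lower this if flow-only tracking drifts between detections

def stage_for(frame_id):
    """Decide which pipeline stage runs on a given frame."""
    if frame_id % DETECTOR_FRAME_SKIP == 0:
        return "detect+associate"  # full detection plus data association
    return "flow"                  # propagate tracks with KLT optical flow only

# Frames 0 and 5 run the detector; frames 1-4 rely on optical flow alone.
stages = [stage_for(i) for i in range(6)]
```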

xjsxujingsong commented 3 years ago

@GeekAlexis Yes, that is why I suggested regenerating keypoints once a detection is available.

GeekAlexis commented 3 years ago

@xjsxujingsong Thanks for your suggestion. Will test it when I have time.

GeekAlexis commented 3 years ago

@xjsxujingsong I couldn't justify refreshing keypoints, at least on the MOT dataset: a slight improvement in MOTA, but more ID switches. It might still be helpful in your specific case. If you can share your input video, I can investigate further.

xjsxujingsong commented 3 years ago

@GeekAlexis Sorry for the late reply. You can reach me on WeChat: cauthy. This is strange. As far as I know, the keypoints will be in the intersection of the old tracked box and the new detection box, so the number of keypoints decreases gradually. If we regenerate keypoints when a new detection box is available, at least the number of keypoints stays large. I am not sure why it causes more ID switches. If performance drops, does that mean the old keypoints are better than the newly generated ones?

GeekAlexis commented 3 years ago

@xjsxujingsong I suspect the old keypoints are more likely to be on the actual target than newly generated ones, especially when two boxes overlap or there are occlusions, which causes a few more ID switches. But if there is a case where this works better, I will update.

xjsxujingsong commented 3 years ago

@GeekAlexis That is possible. Another thing is about feature updates. I checked Deep SORT and FairMOT; both update the appearance feature gradually. I suspect we should hold off when any occlusion exists. I did people counting before: whenever two bounding boxes overlapped, I stopped updating the feature, and that reduced ID switches greatly.
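The gating idea described here (freeze appearance updates while boxes overlap) could be sketched as follows; the helper names and the EMA form are assumptions for illustration, not code from either project:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) form."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def update_feature(track_feat, new_feat, box, other_boxes, alpha=0.9):
    """EMA appearance update, skipped while the track overlaps any other box."""
    if any(iou(box, ob) > 0.0 for ob in other_boxes):
        return track_feat  # occluded: keep the old feature untouched
    feat = alpha * track_feat + (1.0 - alpha) * new_feat
    return feat / (np.linalg.norm(feat) + 1e-12)
```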

GeekAlexis commented 3 years ago

@xjsxujingsong I have tried that before, but I didn't get any significant improvement, and computing pairwise IoU can be expensive. Are you using a history of features or a running average (like FairMOT and this work) to match features?

xjsxujingsong commented 3 years ago

@GeekAlexis My task is people counting, not MOT. It is normally short-term tracking (a few seconds), which may differ from the MOT dataset. I tried FairMOT, which would detect half a body or even just a head as a person. So I ended up using YOLOv3 as the detector and Deep SORT for tracking. I found that when two people approach from opposite directions, the IDs switch constantly while they overlap. So I disabled the appearance feature update and got higher counting accuracy. I just checked Deep SORT; it uses a history of features (`nn_budget`).
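The two strategies being contrasted: Deep SORT keeps a bounded history of features per track (its `nn_budget`) and matches against the minimum cosine distance, while FairMOT-style trackers maintain a single running average. A simplified sketch of the history variant (hypothetical class, not Deep SORT's actual code):

```python
from collections import deque
import numpy as np

class FeatureGallery:
    """Keep the last `budget` appearance features per track, Deep SORT style."""

    def __init__(self, budget=100):
        self.features = deque(maxlen=budget)  # oldest features are evicted

    def add(self, feat):
        self.features.append(feat / (np.linalg.norm(feat) + 1e-12))

    def distance(self, query):
        """Minimum cosine distance between the query and any stored feature."""
        query = query / (np.linalg.norm(query) + 1e-12)
        return min(1.0 - float(f @ query) for f in self.features)
```

Keeping a history makes the match robust to a few bad (e.g. occluded) samples, at the cost of memory and a per-track scan at association time.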

PiyalGeorge commented 3 years ago

@GeekAlexis A doubt: so re-identification is also implemented here. I would like to know whether it is possible to disable it. For example, if I want only detection + tracking at this speed on a Xavier NX, what should I do?

GeekAlexis commented 3 years ago

@PiyalGeorge Get rid of these lines: https://github.com/GeekAlexis/FastMOT/blob/090c8ae357f143658fc81b1059060263105734e8/fastmot/tracker.py#L168-L169 https://github.com/GeekAlexis/FastMOT/blob/090c8ae357f143658fc81b1059060263105734e8/fastmot/tracker.py#L194-L203

PiyalGeorge commented 3 years ago

@GeekAlexis , Thanks. That worked!