GeekAlexis / FastMOT

High-performance multiple object tracking based on YOLO, Deep SORT, and KLT 🚀

Tracking Small Objects at High Speed #91

Closed · levipereira closed this issue 3 years ago

levipereira commented 3 years ago

Hi @GeekAlexis, I love your project and have included it in my studies. I'm doing vehicle tracking and calculating real-time speed, among other types of analysis. Object detection and tracking are a crucial part of my project, and FastMOT is perfect for getting started. I've finished all the YOLOv4 Darknet detection training and I'm going to do Fast-REID training next. I ran some tests on aerial video footage and have a few questions. I used OSNet025 as the feature extractor, and it apparently works very well.

However, with small objects moving at high speed, an object changes its identity very often. I believe this is due to the KLT optical flow tracking.

Do you have any clue as to how I can fix this?

Check the output videos using:

detector_frame_skip = 5

and

detector_frame_skip = 1

Output Video Here
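
For context, my understanding of this knob: detector_frame_skip controls how often the full YOLO detector runs, and KLT optical flow carries tracks across the skipped frames. A minimal schematic of that loop (illustrative stubs, not FastMOT's actual classes):

    DETECTOR_FRAME_SKIP = 5

    def detect(frame):
        return []  # stand-in for a full YOLO inference pass

    class Tracker:
        def update(self, detections, frame):
            pass  # stand-in for association + ReID, run on detector frames

        def track(self, frame):
            pass  # stand-in for KLT optical flow, run on skipped frames

    tracker = Tracker()
    for frame_id in range(100):  # frames would come from the video stream
        frame = None             # placeholder frame
        if frame_id % DETECTOR_FRAME_SKIP == 0:
            tracker.update(detect(frame), frame)
        else:
            tracker.track(frame)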

GeekAlexis commented 3 years ago

The default config is tuned for slow pedestrian tracking, so you can play with the parameters described in https://github.com/GeekAlexis/FastMOT/issues/76

Let me know if it works.

levipereira commented 3 years ago

Yes, it works. I need to understand what these parameters mean, but I will figure it out.

"kalman_filter": {
                "std_factor_acc": 20.5, 
                "std_offset_acc": 100.5,

I need to do some fine-tuning to calibrate it, but you showed me the way.
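
From the names, my guess is that std_factor_acc and std_offset_acc set the acceleration process-noise standard deviation as a size-scaled factor plus a constant offset; a sketch of that guess (an assumption, not FastMOT's actual formula):

    def accel_noise_std(box_size, std_factor_acc=20.5, std_offset_acc=100.5):
        # Assumed form: noise scales with object size plus a fixed offset.
        # Larger values let the Kalman filter tolerate harsher accelerations,
        # which helps with fast, erratically moving targets.
        return std_factor_acc * box_size + std_offset_acc

    print(accel_noise_std(box_size=40.0))  # -> 920.5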

I'm using a generic feature extractor model. ReID is reusing the same ID for different vehicles. What parameter can I change to avoid this? I have tried playing with the parameters below, but with no success.

"multi_tracker": {
            "max_age": 15,
            "age_factor": 14,

Video with new parameters. Output Video Here

GeekAlexis commented 3 years ago

OSNet is not accurate on vehicles. You can lower the max ReID cost to make ReID stricter: https://github.com/GeekAlexis/FastMOT/blob/faf36a8e51d36ebee9817ed1006c3f6881bcbde6/cfg/mot.json#L45

max_age and age_factor are not related to this.

For fast-moving vehicles, you can also increase the half-life period (in seconds) for velocity decay (maybe to 10): https://github.com/GeekAlexis/FastMOT/blob/faf36a8e51d36ebee9817ed1006c3f6881bcbde6/cfg/mot.json#L61
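
To clarify what half-life means here: the tracked velocity is damped by a factor of 0.5 per half-life period, so a longer half-life preserves momentum between detections. Roughly (a conceptual sketch, not the exact code):

    def velocity_decay(dt, half_life=10.0):
        # Each elapsed half-life halves the retained velocity.
        return 0.5 ** (dt / half_life)

    # With half_life=10 s, ~97% of the velocity survives a 0.5 s gap,
    # so fast vehicles keep their momentum between detector frames.
    print(velocity_decay(dt=0.5))  # -> ~0.966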

levipereira commented 3 years ago

This worked.

I will start training Fast-REID on vehicles to get better ReID accuracy.

I'm running on an i7-8700 (6 cores) and an RTX 2060.

The CPU is stuck at 100% utilization, but the video still runs at ~90 FPS with no freezes. The GPU sits at around 25% utilization.

I will try to find the root cause of the 100% CPU usage.

Thank you for your time.

levipereira commented 3 years ago

Starting Docker with these two parameters increased performance to 100 FPS at 70% CPU utilization.

-e OPENBLAS_MAIN_FREE=1 -e OPENBLAS_NUM_THREADS=1

GeekAlexis commented 3 years ago

Thanks, good to know. Should these be included in the Dockerfile? Why would FPS improve with fewer threads?


levipereira commented 3 years ago

It's a NumPy issue. This should be recommended if users get 100% CPU usage running simple code. I have used Deep SORT before, faced the same issue, and fixed it using both variables.

https://stackoverflow.com/questions/38659217/numpy-suddenly-uses-all-cpus
https://shahhj.wordpress.com/2013/10/27/numpy-and-blas-no-problemo/

Why would FPS improve with less threads?

NumPy in the Deep SORT code behaves poorly when using multiprocessing. With these variables set, NumPy performs better than without them.
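
For reference, the same effect can be achieved outside Docker by setting the variables before NumPy is imported (these are standard OpenBLAS environment variables, nothing FastMOT-specific):

    import os

    # Must be set before the first `import numpy`, otherwise OpenBLAS has
    # already spawned its thread pool.
    os.environ["OPENBLAS_NUM_THREADS"] = "1"  # one BLAS thread per process
    os.environ["OPENBLAS_MAIN_FREE"] = "1"    # don't bind the main thread to a core

    import numpy as np  # noqa: E402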

levipereira commented 3 years ago

I'm compiling ATLAS with LAPACK and linking it against NumPy. I'll check the performance impact of this setup.

Is there a flag to measure the execution time spent by Deep SORT?

GeekAlexis commented 3 years ago

You can use the verbose -v flag and check association time in seconds at the end.

GeekAlexis commented 3 years ago

Any update?

levipereira commented 3 years ago

Hi Alexis, I'm pretty busy these days with some datasets and haven't had time for this problem, but it is in my notes to tackle in the next few days. I managed to compile ATLAS, but that is not the best option because there are more recent libraries, such as OpenBLAS, that should be used instead. I need to debug the code and see which part is responsible for the high resource consumption.

Another question: I'm trying to ReID people and cars. For ReID, I will need one model for people and another for cars. YOLO can detect people and cars at once, but to extract features I will need two inferences, one for cars and another for people. I need direction on how to implement this; I would be grateful for any ideas.

GeekAlexis commented 3 years ago

No rush. Thanks for investigating the issue.

But to extract features I will need two inferences, one for cars and another for people. I need direction on how to implement this; I would be grateful for any ideas.

The tracker logic currently associates all detections from different classes at once. You can split it up and associate by class, which requires the most changes. You will also need to create two FeatureExtractor instances and extract ReID features by class, but this might be slow. FPS-wise, it's recommended to train your ReID network on both classes if possible.

levipereira commented 3 years ago

The tracker logic currently associates all detections from different classes at once. You can split it up and associate by class, which requires the most changes. Or you can create two FeatureExtractor instances and extract ReID features by class, but this might be slow. FPS-wise, it's recommended to train your ReID network on both classes if possible.

Thanks for the feedback.

I think the most reasonable solution is to split by class, due to performance issues.

https://github.com/JDAI-CV/fast-reid/issues/329#issuecomment-751968321

I know it is a big change but it is a price that I will have to pay.

GeekAlexis commented 3 years ago

For more classes, it's not a scalable solution. I don't think model input size is an issue because images of different resolutions have to be resized anyway. You can use 256x256 for both person and vehicle ReID. Since you are running on a desktop and only have two classes, it's fine to split by class.

I suggest you create two FeatureExtractor instances, split the detections from YOLO by class, and feed them into the corresponding FeatureExtractor. Finally, concatenate the feature vector outputs. This is probably the easiest way, and you don't have to modify the association step.
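
A rough sketch of that approach, treating each extractor as a simple callable (the real FeatureExtractor API may differ):

    import numpy as np

    def extract_features(crops, classes, person_extractor, vehicle_extractor,
                         person_class_id=0, feat_dim=512):
        # Split detections by class, run each group through its own ReID
        # extractor, then stitch the embeddings back in the original order
        # so the downstream association step is unchanged. feat_dim assumed.
        classes = np.asarray(classes)
        mask = classes == person_class_id
        features = np.empty((len(classes), feat_dim), dtype=np.float32)
        if mask.any():
            features[mask] = person_extractor(crops[mask])
        if (~mask).any():
            features[~mask] = vehicle_extractor(crops[~mask])
        return features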

GeekAlexis commented 3 years ago

FYI, the recent commit e33596afc8d8bb9665177415dc47e85298954c48 includes the two OpenBLAS flags in the Dockerfile. This improves FPS significantly, by more than 2x.

GeekAlexis commented 3 years ago

For those who visit this issue in the future: if tracking fast, small objects doesn't work, raise these two parameters first: https://github.com/GeekAlexis/FastMOT/blob/2b0e531009d716994230a995ac783c85f728c392/cfg/mot.json#L58-L59

GeekAlexis commented 3 years ago

Closing this now since the original issue is resolved. Feel free to open a new issue for other questions.

levipereira commented 3 years ago

Hi @GeekAlexis, I found the root cause of my CPU getting stuck at 100%.

Issues:

  1. (solved) Part of the problem was that NumPy conflicts with OpenBLAS multi-threading, so the best option was to disable multi-threading in OpenBLAS. https://github.com/xianyi/OpenBLAS/wiki/faq#multi-threaded

  2. FastMOT resizes the input frame on the CPU (see VideoIO). I was processing 4K video (3840 x 2160 at 60 FPS); the i7-8700 can handle it but gets stuck at 100%.

We can improve this piece of code by encoding, decoding, and resizing on the GPU instead of the CPU.

Starting with OpenCV 4.5.2, new properties were added to control H/W acceleration modes for video decoding and encoding tasks:

https://github.com/opencv/opencv/wiki/Video-IO-hardware-acceleration
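
For example, with OpenCV 4.5.2+ built with FFmpeg, you can ask the decoder to use any available hardware acceleration (usage follows the wiki above; whether NVDEC is actually picked depends on your build):

    import cv2

    # Ask the FFmpeg backend to pick any available HW decoder (e.g. NVDEC).
    cap = cv2.VideoCapture(
        "input_4k.mp4",
        cv2.CAP_FFMPEG,
        [cv2.CAP_PROP_HW_ACCELERATION, cv2.VIDEO_ACCELERATION_ANY],
    )
    ok, frame = cap.read()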

As you are using a GStreamer pipeline, we can explore nvcodec in GStreamer: https://gstreamer.freedesktop.org/documentation/nvcodec/index.html?gi-language=c
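
A pipeline along these lines could feed NVDEC-decoded frames into OpenCV (a sketch: element names come from the nvcodec plugin, and the demuxer/parser depend on the container and codec):

    import cv2

    # Requires OpenCV built with GStreamer support and the nvcodec plugin.
    pipeline = (
        "filesrc location=input_4k.mp4 ! qtdemux ! h264parse ! nvh264dec "
        "! videoconvert ! video/x-raw,format=BGR ! appsink"
    )
    cap = cv2.VideoCapture(pipeline, cv2.CAP_GSTREAMER)
    ok, frame = cap.read()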

I'm building OpenCV 4.5.3 with FFmpeg 4.4 compiled with NVDEC.
P.S. Do not use the master branch of FFmpeg due to a bug with OpenCV.

About FFmpeg and NVDEC: https://docs.nvidia.com/video-technologies/video-codec-sdk/ffmpeg-with-nvidia-gpu/

Decode/encode performance with NVDEC: https://developer.nvidia.com/nvidia-video-codec-sdk

Just be aware that non-enterprise GPU cards have a limited number of NVDEC engines and concurrent sessions: https://developer.nvidia.com/video-encode-and-decode-gpu-support-matrix-new

This could really improve FastMOT's performance for high-resolution decode/resize/encode.

Cheers, Levi

GeekAlexis commented 3 years ago

@levipereira The environment can be hard to set up with nvcodec. It would be great if you could provide a working version of the Dockerfile with GPU-accelerated FFmpeg.

levipereira commented 3 years ago

It works and is fast at decoding/encoding. My next step is to test decode and resize on the GPU; this can really help us. My Dockerfile is a mess, so I need to clean it up before sending it to you. I'll try to build a small image with OpenCV, FFmpeg, and GStreamer with H/W acceleration enabled.

levipereira commented 3 years ago

Hi @GeekAlexis, bad news... I did tons of tests using OpenCV with FFmpeg NVDEC on an RTX 2060 and could only get 65 FPS (70% GPU utilization) with 4K video, while on the i7-8700 CPU I got 170 FPS (100% utilization). Using FFmpeg with NVDEC alone, I got 220 FPS. I think there is some issue between FFmpeg and OpenCV when using hardware acceleration, so I don't think OpenCV with NVDEC is worth implementing for now, but I have not tested encoding (NVENC).

These libraries seem interesting:

https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html
https://docs.nvidia.com/jetson/l4t-multimedia/group__LibargusAPI.html

noorafathima7 commented 3 years ago

Hi @levipereira, could you please share the code for splitting ReID across two classes? Also, please let me know the performance.

GeekAlexis commented 3 years ago

@noorafathima7 The feature is already supported on the master branch. You can add multiple ReID models for different classes. FPS will drop a bit compared to a single ReID model.

noorafathima7 commented 3 years ago

@GeekAlexis, thanks for the reply. My intention is to track cars, buses, trucks, bicycles, and humans. Are 2 ReID models enough, e.g. one for vehicles (trained on VeRi-Wild, for example) and the OSNet ReID model for humans? If yes, how will the model associate the tracking ID with the detected object, since it can identify only 2 classes, i.e. vehicle and human?

GeekAlexis commented 3 years ago

In that case, I suggest you split the detections between vehicles and humans so that you can use only 2 ReID models. This only works if all your vehicle class IDs are consecutive.

Assuming class IDs are in the same order you listed them, you can write a custom find_split_indices() to split between bicycle and human here: https://github.com/GeekAlexis/FastMOT/blob/9aee101b1ac83a5fea8cece1f8cfda8030adb743/fastmot/mot.py#L146-L148
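
A minimal version of such a helper could look like this (a sketch, assuming detections are already sorted by class ID; the real signature in mot.py may differ):

    import numpy as np

    def find_split_indices(classes, split_ids=(1,)):
        # Positions at which to cut a class-sorted detection array,
        # e.g. between person (0) and everything else (>= 1).
        return np.searchsorted(classes, split_ids)

    classes = np.array([0, 0, 1, 2, 2, 7])  # sorted class IDs from YOLO
    groups = np.split(np.arange(len(classes)), find_split_indices(classes))
    print(groups)  # -> [array([0, 1]), array([2, 3, 4, 5])]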

Also get rid of the assertion: https://github.com/GeekAlexis/FastMOT/blob/9aee101b1ac83a5fea8cece1f8cfda8030adb743/fastmot/mot.py#L84-L85

FastMOT does not associate objects with different classes unless you remap different class IDs to the same one.

noorafathima7 commented 3 years ago

Thank you! I will try this.

noorafathima7 commented 3 years ago

Hi @GeekAlexis, the custom find_split_indices() function you mentioned above is for selecting the feature extractor model based on the detected classes, right?

My class IDs are in this order: 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck'.

GeekAlexis commented 3 years ago

Correct. You can split at class ID 1 so that the rest of the classes will use the vehicle feature extractor.
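
Concretely, with your class order ('person' = 0, 'bicycle' = 1, ...), splitting at class ID 1 sends only 'person' detections to the person extractor and everything from 'bicycle' upward to the vehicle extractor (illustrative numbers):

    import numpy as np

    classes = np.array([0, 0, 1, 2, 5, 7])  # sorted class IDs in one frame
    split = np.searchsorted(classes, 1)     # boundary between person and the rest
    person_ids, vehicle_ids = classes[:split], classes[split:]
    print(person_ids, vehicle_ids)          # -> [0 0] [1 2 5 7]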