AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

OpenDataCam: 100% CPU usage using ./uselib #5170

Open rantgithub opened 4 years ago

rantgithub commented 4 years ago

Hi

Testing the new version of OpenDataCam, we have noticed that when using the uselib API, the CPU remains at 100%, making the machine very slow.

With the darknet process, GPU usage is (typically) higher but CPU usage is low; with uselib, GPU memory usage decreases but the CPU goes to and stays at 100%.

Any suggestions on what can be done/checked to prevent this?

[screenshots: perf2, perf3, perf4]

Thank you

AlexeyAB commented 4 years ago

As far as I can see, uselib occupies only 1 CPU core at 100%, so average CPU usage is about 60-70%. So this is not critical.

> Testing the new version of OpenDataCam, we have noticed that when using the uselib API, the CPU remains at 100%, making the machine very slow.

Do you get the same issue in both cases?

When you use ./uselib, what FPS-capture, FPS-detection, and FPS-tracking do you get?

rantgithub commented 4 years ago

Hi

1-2. The average, as you mentioned, is 60-70%, but that is for only 1 instance. If I run 2 instances, the CPU stays above 90%, making the machine very slow; 3 instances crash the machine.

For OpenDataCam, if I use the darknet process, the CPU stays at around 30% for 1 instance and 70% maximum for 3 instances. There is a big difference in CPU usage between darknet and uselib.

3. I am using the standard 416x416 in the config file and getting around 40-70 FPS.

I should say that with darknet and uselib I get the same FPS, but uselib clearly uses much more CPU than the darknet process.

AlexeyAB commented 4 years ago

./uselib is made for low-latency detection: capture -> preprocessing -> detection -> postprocessing -> showing -> send_json; that's why it uses much more CPU.

So:

Does OpenDataCam give you access to both video streams?

  1. the pass-through default darknet video stream on port :8090 (multi-colored bounding boxes)?
  2. its own video stream on port :8080 (dotted white bounding boxes)?

[screenshot: example of the two video streams]

rantgithub commented 4 years ago

ODC only allows 1 video/stream per instance; however, I can run multiple instances on the same machine.

The current version uses darknet, but there are some issues on Jetson devices. The new version v3 (in testing) was changed to use uselib to address those issues.

The issue now is the high CPU usage, which I asked about and you have answered with your explanation.

Is there anything that can possibly be tweaked in uselib to reduce CPU usage?

AlexeyAB commented 4 years ago

Try to change 3 to 30 in this line and recompile: https://github.com/AlexeyAB/darknet/blob/a9bae4f0326b4a841756d5b1a6ed37821f6a9467/src/yolo_console_dll.cpp#L251

rantgithub commented 4 years ago

This did the trick.

Same video, same everything, but now the CPU is around 50% steady.

Question: what is the highest value you think can be used here, considering more running instances?

[screenshot: uselib_with_30]

AlexeyAB commented 4 years ago

I think the highest value is 1000 / FPS.

Can you show CPU utilization for Darknet for the same task?

rantgithub commented 4 years ago

Your calculation is correct.

I ran several tests, and it looks like 15 is the minimum.

This is the table using uselib:

| Parameter (ms) | FPS | CPU |
|---|---|---|
| 50 | 20 | 45% |
| 40 | 25 | 45% |
| 30 | 33 | 47% |
| 20 | 50 | 50-55% |
| 15 | 67 | 60-75% |
| 12 | 83 | 60-75%, but 1 core at 100% steady |
| 10 | 75-100 | 60%, but 1 core at 100% steady |

rantgithub commented 4 years ago

This is the CPU usage using darknet, with the same video and the same parameters.

[screenshot: darknet-416]

AlexeyAB commented 4 years ago

So currently CPU usage is the same for both ./darknet and ./uselib ?

rantgithub commented 4 years ago

Yes.

I should say, only down to 20; below 20 the CPU starts to go up.

V3 is still in dev/testing.

Once we have all of V3, I can provide more feedback with more instances and compare the performance against darknet, but for now this works because it keeps the CPU stable.

rantgithub commented 4 years ago

I have a quick question, not exactly related to this. In your documentation you mention that if you increase the network size in the cfg file from 416x416 to 800x800 or higher, accuracy should improve. My models are trained at 800x800 or 608x608. Most of the time I work with a lot of very low-resolution videos, most of them 320x240, and I have found that to detect small objects inside these videos I have to go down very low, like 224x224; otherwise, with 416, 608, or 800, recognition is very poor. Can you please comment on what the correct/suggested configuration would be to recognize objects under these conditions? Thank you.

AlexeyAB commented 4 years ago

General rule: your training dataset should include the same set of relative object sizes that you want to detect:

train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
train_network_height * train_obj_height / train_image_height ~= detection_network_height * detection_obj_height / detection_image_height

I.e., for each object in the Test dataset there must be at least 1 object in the Training dataset with the same class_id and about the same relative size:

object width in percent of Training image ~= object width in percent of Test image

That is, if the training set contained only objects that occupied 80-90% of the image, then the trained network will not be able to detect objects that occupy 1-10% of the image.

rantgithub commented 4 years ago

Ah, OK, that is why you suggest training at a bigger resolution.

One more question in this regard:

When I have objects that are very similar, with small differences:

For instance, I have truck #1 and truck #2 with an attached cooler unit; both are the same color, size, and shape, but one has one small feature that the other doesn't.

What is the best way to train the model / adjust the network to improve the classification based on these differences?

AlexeyAB commented 4 years ago

Just train as usual. If you want to distinguish these 2 classes, train them as 2 separate classes. Check that this feature (the attached cooler unit) is still visible after your image is resized to the network size.

rantgithub commented 4 years ago

Let me summarize to be sure I got this right.

  1. If I use the default v3 COCO data, the default network size is 416x416.

  2. If I get a video at 800x800 resolution, I can use any size under 800x800 and/or keep the network at 416x416.

  3. If I get a video at 256x256 resolution, I should change the network from 416x416 to 256x256 or lower.

  4. If I train an object with an 800x800 network and then change the network size to 416x416, the object should still be visible (to be recognized) after being resized from 800x800 to 416x416.

AlexeyAB commented 4 years ago

@rantgithub Yes.

rantgithub commented 4 years ago

I have a quick question

If you refer to the previous chart (attached here) for OpenDataCam (ODC):

In OpenDataCam v2, the ports were passed from darknet to the application as detector parameters, like:

darknet -json_stream 8090 -mjpeg_stream 8070

but now in v3, using uselib, the ports are not passed as parameters (only the single default is used):

./uselib

To have the same functionality in version 3, the ports should be passed as parameters to uselib, to get something like:

./uselib -json_stream 8090 -mjpeg_stream 8070

Can you please give us your input on the best way to do this, so as to guarantee uselib consistency and not break another internal module?

Thank you.

AlexeyAB commented 4 years ago

Do you want to use the parameters -json_stream 8090 -mjpeg_stream 8070 with ./uselib? Does OpenDataCam not work correctly without these parameters?

rantgithub commented 4 years ago

Hi

Yes, we want to use the ports with uselib.

8070, 8080, and 8090 are the default ports OpenDataCam uses, but they are configurable and, yes, they are the core of the application (as you can see in the diagram).

But one set of ports is valid for only one instance; if we want to run another instance, we use a different set of ports (8000, 9000, 10000 for example). That way we would pass the ports like:

./uselib -json_stream 8090 -mjpeg_stream 8070

./uselib -json_stream 18090 -mjpeg_stream 18070

We are doing the same right now with darknet, and the idea is to use the same format with uselib.

Hope this clarifies.

rantgithub commented 4 years ago

Hi

While testing ODC with real-time m3u8 streams, I have found that a good number of frames are missing/skipped, which makes the video fragmented. If I have a car in frames 10, 11, 12, the car just disappears at frame 20, because there are no frames 13, 14, etc. As a result the tracking or counting process is incorrect, due to the objects lost with the missing frames.

If I stream the m3u8 in VLC, for instance, and feed darknet/ODC from VLC, there are no missing frames. The same behavior is observed if I take the m3u8, save it as an .mp4 file, and play the file: all the frames are fine. This only happens with direct m3u8 feeds. It looks like the extra processing is causing this delay.

According to ODC, it can process up to 100 FPS without any issue, so I am wondering whether this is an issue coming from darknet, or whether there is a way to have/create/extend a buffer to avoid these missing frames.

AlexeyAB commented 4 years ago

> If I have a car in frames 10, 11, 12, the car just disappears at frame 20, because there are no frames 13, 14, etc.

This is normal for real-time detection.

> If I stream the m3u8, in VLC for instance, and from VLC I feed darknet/ODC, there are no missing frames.

How do you do this?

> The same behavior is observed if I take the m3u8, save it as an .mp4 file, and play the file; all the frames are fine.

Because with a file, Darknet does not have to detect in real time.


  1. Try using the optical-flow tracker - un-comment this line: https://github.com/AlexeyAB/darknet/blob/3d9aa2af4718a3bd5bbe23de2022987cb767e9c5/src/yolo_console_dll.cpp#L16

  2. Try using the optical-flow tracker on GPU: compile OpenCV with CUDA and also un-comment this line: https://github.com/AlexeyAB/darknet/blob/3d9aa2af4718a3bd5bbe23de2022987cb767e9c5/src/yolo_console_dll.cpp#L17
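For reference, with both suggestions applied, the relevant lines near the top of src/yolo_console_dll.cpp would look roughly like this (a sketch only; exact line positions may differ between revisions):

```cpp
// It makes sense only for video-Camera (not for video-File)
// To use - uncomment the following line. Optical-flow is supported only by OpenCV 3.x - 4.x
#define TRACK_OPTFLOW   // suggestion 1: enable the optical-flow tracker
#define GPU             // suggestion 2: run it on GPU (needs OpenCV built with CUDA)
```

After editing, darknet must be recompiled for the change to take effect.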

rantgithub commented 4 years ago

Hi

  1. I have 2 RTX cards with 48 GB of GPU memory, and even when there is plenty of GPU left, this still happens.
  2. If you open VLC and press CTRL+N, add the m3u8 link, select Stream, HTTP, and select a port, then from darknet you connect to x.x.x.x:port, which is coming from VLC.
  3. Just to clarify: if I save the stream to an mp4 file using VLC (same procedure as above, but saving instead of streaming) and then play the file, there are no missing frames. The missing frames only happen with direct m3u8, IP, or direct feeds. Let me create a video and you will see what I mean.

rantgithub commented 4 years ago

To clarify: from your suggestions, should I try 1 and 2 together, or just 1 or 2?

AlexeyAB commented 4 years ago

> To clarify: from your suggestions, should I try 1 and 2 together, or just 1 or 2?

Try either 1 alone, or (preferably) 1+2 together.


> If you open VLC and press CTRL+N, add the m3u8 link, select Stream, HTTP, and select a port, then from darknet you connect to x.x.x.x:port, which is coming from VLC.

When you use this approach for a long time, is there video lag after a few seconds or minutes, are there rejected frames, or is there a constant increase in memory consumption?

rantgithub commented 4 years ago

I should say there is not really any lag; VLC does a kind of buffering that shows the video some seconds behind the real-time feed, but the video/detection is continuous.

No, I don't see any issue with memory consumption.

That is why I asked whether there is a kind of buffer that can be increased, because from observing how everything behaves with VLC, this kind of buffering makes everything look and work "smoothly".

With ffmpeg you can do the same as with VLC, but ffmpeg is much more CPU-intensive, and you can see CPU spikes or memory issues.

rantgithub commented 4 years ago

I am running a test with this now. A question on these comments:

> // It makes sense only for video-Camera (not for video-File)
> // To use - uncomment the following line. Optical-flow is supported only by OpenCV 3.x - 4.x
> //#define TRACK_OPTFLOW
> //#define GPU

Does this mean/suggest that I have to have one darknet compiled for video streams and another for files?

AlexeyAB commented 4 years ago

OK, try to compile without the optical flow tracker, i.e. comment out these lines: //#define TRACK_OPTFLOW //#define GPU

And add the line cap.set(CV_CAP_PROP_BUFFERSIZE, 10); between these 2 lines: https://github.com/AlexeyAB/darknet/blob/3d9aa2af4718a3bd5bbe23de2022987cb767e9c5/src/yolo_console_dll.cpp#L360-L361

rantgithub commented 4 years ago

I am testing this now. Question: how much can I increase the buffer?

Here is a sample of the same stream: one file is being played with VLC, the other with OpenCV + CUDA + OpenDataCam with darknet. Sometimes the video freezes for as long as 10 seconds and then continues.

vlc.zip

rantgithub commented 4 years ago

And to clarify: if you save the stream to a file, as I did for this sample, and play the file, no frame is lost and nothing freezes; everything works fine.

rantgithub commented 4 years ago

Hi

I tested several streams and found a small improvement, but the missing frames and pauses between frames still happen. Researching this, I see it is a common issue, and re-streaming appears to be the answer, with 30 seconds being the buffer time most applications use. Any additional input you can think of?

rantgithub commented 4 years ago

Hi

After many tests, I found that the problem occurs with m3u8 streams, and you can reproduce the issue, for instance, using a public m3u8 list:

./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights https://strmr5.sha.maryland.gov/rtplive/e301515800f700d700437a45351f0214/chunklist_w1467774930.m3u8

As you will see, an image is detected and plays for a couple of seconds, then stops for 2-6 seconds and continues. This is reproducible with almost any m3u8 link, sometimes with longer delays. (Video files work fine, no issues.)

Looking around, I found that this has already been asked/posted:

https://medium.com/@v.tesin/taking-on-border-jam-with-yolov3-part-1-c49b2cb21135

and

https://github.com/vtesin/PlayingM3u8

The author mentions having sent a pull request for this.

Just checking whether this was merged at some point, and/or whether for m3u8 I have to configure/compile darknet differently to avoid these delays.

I am working with many real-time feeds, and although the detection is good, the delays affect the analysis/operation.

Any input is highly appreciated.

thank you