Open keighrim opened 9 months ago
Another data point;
$ ffmpeg -i /mnt/llc/llc_data/clams/wgbh/NewsHour/WN0015/cpb-aacip-507-154dn40c26.mp4
...
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/llc/llc_data/clams/wgbh/NewsHour/WN0015/cpb-aacip-507-154dn40c26.mp4':
Metadata:
major_brand : isom
minor_version : 512
compatible_brands: isomiso2avc1mp41
encoder : Lavf56.4.101
Duration: 00:59:25.22, start: 0.000000, bitrate: 830 kb/s
Stream #0:0[0x1](und): Video: h264 (High) (avc1 / 0x31637661), none(progressive), 480x360, 694 kb/s, 29.97 fps, 29.97 tbr, 60k tbn (default)
Metadata:
handler_name : VideoHandler
vendor_id : [0][0][0][0]
Stream #0:1[0x2](und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 128 kb/s (default)
Metadata:
handler_name : SoundHandler
vendor_id : [0][0][0][0]
then, running this video to SWT @ develop, with all default params,
2024-02-10 04:27:52 http://apps.clams.ai/swt-detection/unresolvable INFO 140602549245504 Processing video d1 at /mnt/llc/llc_data/clams/wgbh/NewsHour/WN0015/cpb-aacip-507-154dn40c26.mp4
2024-02-10 04:27:52 http://apps.clams.ai/swt-detection/unresolvable INFO 140602549245504 processing /mnt/llc/llc_data/clams/wgbh/NewsHour/WN0015/cpb-aacip-507-154dn40c26.mp4...
2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO 140602549245504 Featurizing time: 166.46 seconds
2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO 140602549245504 Classifier time: 1.53 seconds
2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO 140602549245504 Extract time: 1.76 seconds
2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO 140602549245504 Seeking time: 37.90 seconds
2024-02-10 04:31:37 http://apps.clams.ai/swt-detection/unresolvable INFO 140602549245504 Processing took 224.91390013694763 seconds
2024-02-10 04:31:37 werkzeug INFO 140602549245504 127.0.0.1 - - [10/Feb/2024 04:31:37] "POST /?pretty HTTP/1.1" 200 -
So for NH videos, it takes up to 5 mins to process the whole (that's RTx12). Does the Peabody video has something special? Does stopAt
parameter do something wonky?
Interesting. Here is another data point.
On my 9 year old iMac (no usable GPU) I get the following on a 10 second video:
total time: 11.32s seek time: 0.59s
So that would be less than a second seek time for 10 seconds of video, versus the 6 seconds you see for a 12 second video.
And for what it's worth, at work on my iMac Pro total processing time for a one hour video is about 15 minutes.
Does stopAt parameter do something wonky?
Maybe, but all it should be doing is break out of a loop:
Another data point from a fresh experiment.
2024-02-29 19:02:25,869 - INFO - Initiating classifier with /home/krim/Projects/clams/clamsproject/app-swt-detection/modeling/models/20240212-131937.convnext_tiny.kfold_000.pt
2024-02-29 19:02:26,655 - INFO - Classifier initiation took 0.7860872745513916 seconds
2024-02-29 19:02:26,656 - INFO - Minimum time frame score: 0.5
2024-02-29 19:02:26,657 - INFO - Processing video d1 at /home/krim/cpb-aacip-259-c53f255d.mp4
2024-02-29 19:04:06,317 - INFO - Featurizing time: 35.16 seconds
2024-02-29 19:04:06,328 - INFO - Classifier time: 1.20 seconds
2024-02-29 19:04:06,329 - INFO - Extract time: 0.87 seconds
2024-02-29 19:04:06,330 - INFO - Seeking time: 53.57 seconds
2024-02-29 19:04:06,331 - INFO - Processing took 99.6482183933258 seconds
2024-02-29 19:04:07,520 - INFO - Initiating classifier with /home/krim/Projects/clams/clamsproject/app-swt-detection/modeling/models/20240212-132306.convnext_lg.kfold_000.pt
2024-02-29 19:04:11,093 - INFO - Classifier initiation took 3.5734410285949707 seconds
2024-02-29 19:04:11,097 - INFO - Minimum time frame score: 0.5
2024-02-29 19:04:11,098 - INFO - Processing video d1 at /home/krim/cpb-aacip-259-c53f255d.mp4
2024-02-29 19:06:43,625 - INFO - Featurizing time: 87.94 seconds
2024-02-29 19:06:43,626 - INFO - Classifier time: 1.26 seconds
2024-02-29 19:06:43,627 - INFO - Extract time: 0.87 seconds
2024-02-29 19:06:43,628 - INFO - Seeking time: 53.59 seconds
2024-02-29 19:06:43,629 - INFO - Processing took 152.51135778427124 seconds
Ran on a desktop GPU machine. For this file (cpb-aacip-259-c53f255d
), seeking time still is a significant portion of the runtime both with tiny and lg backbones. In term of CNN application time, tiny and lg models took 35 secs, 88 secs, respectively.
I will keep experimenting with other files (newhours and other videos @owencking shared ealier this week)
Ran some additional experiments over the weekend, and here are the results;
https://gist.github.com/keighrim/fac746647af02c039c1dbcca2a27db6d
Some findings from simple eyeballing over the numbers;
With a new implementation of frame image extraction in #86 , (although I haven't accurately measure the timing for a number of videos yet) the seeking + extraction time for 2 videos I was experimenting with is seen significantly reduced. (50sec > 22sec, 54sec > 24sec)
Bug Description
Classifier
class hasprocess_video
that iterates video frame by frame (with a sample rate) and extract image, feature vectors, run classification. We found that the method is very slow, event on GPU. I did some manual profiling, and I believe we should work on improving video seeking code.Reproduction steps
With
logging says;
Hmm, that's real-time. This was measured on a GPU machine, and GPU utilization was confirmed.
So I wrote a quick and simple profiling
And with the same
curl
call, I gotSo it turns out "moving cursor" in the video is taking as much time as the complex CNN feature extraction.
The cv2 instance I was using was compiled with ffmpeg, and to my knowledge, seeking to a keyframe is a constant-time operation (correct me if I'm wrong, please).
I can only guess that using
POS_MSEC
instead of frame number adds a big overhead. Probably further experiment is needed.Expected behavior
No response
Screenshots
No response
Additional context
https://forum.opencv.org/t/cap-prop-pos-frames-is-abnormal-slow/11651/3 (this is possibly related to #67 too)
video file info