Questions about VFR video decoding

Renzzauw commented 2 years ago

Hi!

Thanks for the continuous development of DALI; it has been very useful for speeding up our video decoding and inference pipeline!

I would like to support VFR videos, since we will be receiving these in the near future.

I noticed that several issues have been opened about this and that you plan to support VFR video decoding in the future. Is there any update, or a rough indication of when we can expect this to be supported?

When I run the DALI video decoder with `skip_vfr_check` set to `True`, the pipeline hangs on VFR videos. Is there any way to catch this as an exception instead of having it deadlock?
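One possible workaround (not a DALI feature) is to run the decoding step in a child process and treat a timeout as a hang. Below is a minimal sketch. It assumes the pipeline for a single video can be launched as a standalone Python script. `run_with_watchdog` is a hypothetical helper, and the timeout value is up to you.

```python
import subprocess
import sys


def run_with_watchdog(script_args, timeout_s):
    """Run `python <script_args...>` in a child process.

    Returns True if the child exits cleanly, False if it fails or exceeds
    timeout_s. Hypothetical helper: script_args would point at a script that
    builds and runs the DALI pipeline for one video.
    """
    try:
        result = subprocess.run([sys.executable, *script_args], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising,
        # so a deadlocked decoder cannot block the parent process.
        return False
    return result.returncode == 0
```

For example, `run_with_watchdog(["decode_one.py", "video.mp4"], timeout_s=120)` could be used, where `decode_one.py` is a placeholder script name. Videos that time out can then be routed to a fallback path, such as transcoding.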

In the meantime, I would like to catch VFR videos before running our pipeline and perhaps transcode them to CFR (maybe there is a much cleverer or faster solution). This may not be the right place to ask, and I could not find many resources on this topic, but what would be a good approach for detecting VFR videos? I've checked out the VFR check built into DALI, but for my use case it gave too many false positives, so I am looking for alternatives.
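For the transcoding route, FFmpeg can force a constant output frame rate by duplicating or dropping frames. Below is a minimal sketch that builds such a command. The target rate, the libx264 re-encode and dropping the audio track are all assumptions you should adapt. `-fps_mode` is available in recent FFmpeg releases (5.1 and later); older builds use `-vsync cfr` instead.

```python
def cfr_transcode_cmd(src, dst, fps, legacy_vsync=False):
    """Build an ffmpeg command that re-encodes src to a constant frame rate."""
    # -fps_mode cfr (FFmpeg 5.1+) / -vsync cfr (older) duplicates or drops frames
    # so that the output has exactly `fps` frames per second.
    sync = ["-vsync", "cfr"] if legacy_vsync else ["-fps_mode", "cfr"]
    return [
        "ffmpeg", "-y", "-i", src,
        *sync, "-r", str(fps),
        "-c:v", "libx264",  # assumed codec; pick one your decoder supports
        "-an",              # audio is not needed for frame decoding
        dst,
    ]
```

It would be run with something like `subprocess.run(cfr_transcode_cmd("in.mp4", "in_cfr.mp4", 30), check=True)`. Re-encoding is lossy and slow, so this is a stopgap rather than a replacement for native VFR support.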

I also looked into FFmpeg, which has the vfrdet filter, but this is not a waterproof solution either, since it has various caveats. Are there any more waterproof ways to check for this, or is there always a margin of error?

Thanks in advance!

awolant commented 2 years ago

Hello, thanks for the question.

Work on supporting VFR videos is ongoing. We have a new VideoReaderDecoder in the experimental module that can decode VFR videos on both the GPU and the CPU. Unfortunately, it is not yet ready for any kind of production setup. I strongly advise against using it at this point, as it has some performance issues. It also does not yet support all the features present in the current VideoReader.

That said, I would be very grateful if you could experiment with it a little and give some feedback based on your particular use case. I'm very keen to hear whether it works correctly on your videos. Maybe there are also some features you would like to suggest? Every bit of feedback you can share is very welcome and appreciated; it will help me make the reader better and more useful for users.

Regarding the VFR detection question: as far as I understand, the only truly waterproof way to detect whether a video is VFR is to decode it and check the timestamps of the frames. As you pointed out, there are some heuristics, but for the general case, where no assumptions can be made about the videos, this is the only way I know.
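The timestamp check can be sketched as follows. The sketch assumes the presentation timestamps (in seconds) of all decoded frames have already been collected. `is_vfr` and the 1% tolerance are illustrative choices, not DALI behavior.

```python
import statistics


def is_vfr(timestamps, rel_tol=0.01):
    """True if any frame interval deviates from the median interval by more than rel_tol."""
    # Sort into presentation order and drop frames without a timestamp.
    ts = sorted(t for t in timestamps if t is not None)
    deltas = [b - a for a, b in zip(ts, ts[1:])]
    if len(deltas) < 2:
        return False  # too few frames to tell
    median = statistics.median(deltas)
    return any(abs(d - median) > rel_tol * median for d in deltas)
```

Any decoder that exposes per-frame timestamps works for collecting the input. For example, with PyAV (assumed installed) they could be gathered as `[f.time for f in av.open(path).decode(video=0)]`. Because every frame is decoded, this check is exact but costs as much as a full decode.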

Hope this was helpful.

Renzzauw commented 2 years ago

Hey @awolant, thanks for your reply.

Great to hear the support for VFR is still ongoing!

First, my use case. We decode (batches of) video frames, pre-process them for YOLOv5 object detection, copy the frames into a PyTorch tensor (using the `nvidia.dali.plugin.pytorch.feed_ndarray` function), and then feed this tensor to our object detection model. YOLOv5 performs a lot of its frame pre-processing on the CPU, so we implemented these operations ourselves on the GPU in a DALI video decoding pipeline. This keeps the video frames and pre-processing computations on the GPU as much as possible, which greatly sped up our pipeline. The pipeline took lots of inspiration from this answer on an issue, and I have shared my code in an earlier issue. If it helps, I can share any additional or more specific details about our implementation.
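For context on the pre-processing mentioned here, YOLOv5 typically letterboxes each frame. It scales the frame to fit the model input while keeping the aspect ratio, then pads the remainder. The geometry can be sketched as below. The sketch assumes a square 640x640 input with centered padding. It is not the author's DALI code, and YOLOv5's own loaders may pad only up to a stride multiple instead.

```python
def letterbox_params(width, height, target=640):
    """Scale, resized size and padding needed to letterbox a frame into target x target."""
    scale = min(target / width, target / height)  # fit the longer side, keep aspect ratio
    new_w, new_h = round(width * scale), round(height * scale)
    pad_w, pad_h = target - new_w, target - new_h
    left, top = pad_w // 2, pad_h // 2
    right, bottom = pad_w - left, pad_h - top  # odd padding goes to the right/bottom
    return scale, new_w, new_h, left, top, right, bottom
```

For a 2048x1536 frame, this gives a scale of 0.3125, a 640x480 resize and 80 pixels of padding at the top and bottom. In a GPU pipeline, these values would parameterize the resize and padding steps.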

Ideally, I would hope/expect the new VideoReaderDecoder to behave similarly to fn.readers.video_resize or fn.readers.video, but simply with added VFR support, meaning it would not hang during the decoding process. So I don't think I'm looking for any specific additional features that should be added per se.

Next week, I will try out the experimental video decoder, see if I can make it work in my existing pipeline, run some tests on VFR videos, and report back!

Also, thanks for clarifying the VFR detection question. Your reply was certainly helpful.

Renzzauw commented 2 years ago

Hey @awolant,

I have just tried replacing our video decoder with the experimental one and did some brief testing. The switch was very easy, and it seemed to run perfectly fine on VFR videos that would normally hang our pipeline. Just to be sure, I also tested it on a CFR video, and this worked perfectly fine too.

What I did notice was that the experimental video decoder seems slower than the current one (judging from run time; I did not do any specific speed tests). Is this something that will eventually get better? My knowledge of decoding is a bit limited, so I'm unsure to what extent FFmpeg influences these results.
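A rough way to compare the two readers without a full profiler is to time a fixed number of iterations after a warm-up. Below is a minimal sketch. `run_batch` is a hypothetical zero-argument callable, such as a wrapper around `pipe.run()` for either reader. The warm-up and iteration counts are arbitrary.

```python
import time


def measure_fps(run_batch, frames_per_batch, n_batches=50, warmup=5):
    """Return frames per second over n_batches calls of run_batch."""
    for _ in range(warmup):
        run_batch()  # exclude start-up costs such as file opening and decoder init
    start = time.perf_counter()
    for _ in range(n_batches):
        run_batch()
    elapsed = time.perf_counter() - start
    return n_batches * frames_per_batch / elapsed
```

For a like-for-like number, run both readers with the same videos, batch size and sequence length.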

A possibly nitpicky feature request that came out of my experiment: it would be nice to also have a video_resize version of this experimental video decoder. Or is this experimental reader supposed to replace the current video decoder?

Hope this was helpful and let me know if I can help you out further!

awolant commented 2 years ago

Hey @Renzzauw

Thanks a lot for this feedback and the time you took to test the new video reader. I'm glad it worked on your VFR videos.

With regard to your questions:

1. We plan to make the new reader significantly faster than it is now. Supporting all VFR videos carries some overhead with this new approach, but we hope that in most cases its performance will be similar to the old variant.
2. There is a plan to include resize support in the new variant. It will be implemented slightly differently than in the current `VideoReaderResize`. We think this will allow some performance boost in the decode+resize scenario.

Hope that was helpful to you. I'll let you know when we have some updates on the features we discussed.

In the meantime, may I trouble you with another question about your use case? Do you use any publicly available data? I'm asking because I'd love to benchmark and optimize the new reader against videos actually used by our users in real-world scenarios. If the videos are not publicly available, can you share some technical details about them? What codecs and containers do you use? How long are the videos? How do you access them: randomly, or do you read the frames in order? Anything you can share with me is much appreciated and will help me immensely with the work on the new video reader. Thanks!

Renzzauw commented 2 years ago

Hey @awolant

Thanks for your reply; I'm very excited to hear your answers to my questions!

Regarding our use case: we do not use publicly available data. This is video footage sent to us by our users or captured by ourselves, so I cannot share it on this issue board directly. Could I instead reach out to you personally (e.g. via e-mail) to share more specific details about our use case and some video files, if that is okay with you?

I'd love to help you out to accelerate the support of VFR videos, since this would make our video processing pipeline more robust!

SiftingSands commented 2 years ago

I've got a similar use case where I need to perform inference on a large set of VFR videos, and I would love to use DALI in my pipeline. Like Renzzauw, I can't share them publicly, but I can check whether I can share them with just your team. In case I can't, here's some metadata from ffmpeg (note that it reports an estimated frame rate, which can be deceiving).

MP4 container sample:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'REDACTED'
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder         : Lavf58.45.100
  Duration: 00:01:13.99, start: 0.045000, bitrate: 12675 kb/s
    Stream #0:0(eng): Video: h264 (High) (avc1 / 0x31637661), yuvj420p(pc), 2048x1536 [SAR 1:1 DAR 4:3], 12670 kb/s, 29.99 fps, 1k tbr, 16k tbn, 2k tbc (default)

The actual average frame rate is 30 FPS, with a standard deviation of 47 FPS (calculated by decoding the video frame by frame and recording the timestamp of each frame).

MPEG-TS container sample:

Input #0, mpegts from 'REDACTED'
  Duration: 00:01:29.97, start: 3.400000, bitrate: 17244 kb/s
  Program 1
    Metadata:
      service_name    : Service01
      service_provider: FFmpeg
    Stream #0:0[0x100]: Video: h264 (High) ([27][0][0][0] / 0x001B), yuv420p(progressive), 1920x1080, 60 fps, 60 tbr, 90k tbn, 120 tbc

The actual average frame rate is 24 FPS, with a standard deviation of 5 FPS.
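Statistics like these can be derived from the per-frame timestamps. Below is a minimal sketch. It assumes timestamps in seconds and takes "frame rate" to mean the instantaneous rate 1/Δt between consecutive frames. The comment above does not say exactly how its figures were computed.

```python
import statistics


def frame_rate_stats(timestamps):
    """Mean and population std-dev of the instantaneous frame rate, plus the overall rate."""
    ts = sorted(timestamps)
    # 1/dt between consecutive frames; zero-length intervals (duplicate pts) are skipped
    inst = [1.0 / (b - a) for a, b in zip(ts, ts[1:]) if b > a]
    overall = (len(ts) - 1) / (ts[-1] - ts[0])  # frames per second over the whole clip
    return statistics.mean(inst), statistics.pstdev(inst), overall
```

For a CFR video, the standard deviation is zero or close to it. A handful of very short intervals inflates the standard deviation of 1/Δt considerably, so it helps to report the overall rate alongside it.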

JanuszL commented 2 years ago

Hi @SiftingSands,

Thank you for sharing this. We will consider that kind of video too.

Renzzauw commented 2 years ago

Hi @JanuszL @awolant,

I was wondering if there is an ETA for when we can expect this new VideoReaderDecoder or VFR video decoding support. Is this still months away or can we expect this relatively soon?

Thanks!

awolant commented 2 years ago

Hello @Renzzauw

Thanks for checking in. We are currently working on enabling VFR support for inference with the DALI Triton backend. This will allow fast decoding in the scenario where you read the whole file sequentially, frame after frame.

These improvements will most probably apply to the VideoReaderDecoder as well, so unless you use a complicated frame-extraction scheme, you should see some performance improvements in the coming weeks.
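A simplified cost model shows why sequential reading is the fast path. With inter-frame codecs such as H.264, a decoder that jumps to a frame must restart at the preceding keyframe and decode forward from there. The sketch below counts decoded frames for a given access order. It assumes known keyframe indices and a decoder that continues forward when that is cheaper than seeking. It is an illustration only, not how VideoReaderDecoder is implemented.

```python
import bisect


def frames_decoded(targets, keyframes):
    """Count frames decoded to return each target frame, in the given order.

    Simplified model: keyframes is a sorted list of keyframe indices starting at 0.
    For each target, the decoder either continues forward from its current position
    or seeks to the last keyframe at or before the target, whichever starts later.
    """
    pos = 0  # index of the next frame the decoder would produce without seeking
    total = 0
    for t in targets:
        kf = keyframes[bisect.bisect_right(keyframes, t) - 1]
        start = max(kf, pos) if pos <= t else kf  # going backwards forces a seek
        total += t - start + 1
        pos = t + 1
    return total
```

With keyframes at 0, 10 and 20, reading frames 0 to 29 in order decodes exactly 30 frames. Fetching frames 29, 0 and 29 decodes 21 frames to return only 3.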

NVIDIA / DALI

Questions about VFR video decoding #4003