NVIDIA / VideoProcessingFramework

Set of Python bindings to C++ libraries which provides full HW acceleration for video decoding, encoding and GPU-accelerated color space and pixel format conversions
Apache License 2.0

Ability to work alongside PyAV? #99

Closed · vade closed this issue 2 years ago

vade commented 4 years ago

Describe the bug
Hello - VPF appears to be functioning decently in a simple test case; however, it hides / abstracts / removes (?) a lot of libAV functionality in Python behind a simple interface.

Is it possible to use pyAV to vend compressed packets of HEVC or AVC data and send them to VPF? This would allow the nice flexibility of having most of libAV at your disposal (audio, re-muxing, timestamps, etc.) while having access to the speed of VPF.

Am I misunderstanding the API or is this possible today with VPF?

Thank you.

rarzumanyan commented 4 years ago

Hi @vade

Hello - VPF appears to be functioning decently in a simple test case; however, it hides / abstracts / removes (?) a lot of libAV functionality in Python behind a simple interface.

VPF uses ffmpeg (not libAV) under the hood to obtain the Annex.B elementary bitstream from the input URL (file, network, etc.). The rest is done using the Video Codec SDK and CUDA.

Technically it's easy to take compressed video frames from outside and feed them to the HW. This is the function which accepts a compressed video frame and kicks off decoding: https://github.com/NVIDIA/VideoProcessingFramework/blob/ae4608337b80444b50db4a3863498bcf4553d683/PyNvCodec/src/PyNvCodec.cpp#L452-L467

It doesn't really matter where the elementaryVideo comes from. I'm OK to extend the VPF interface with this as long as the design stays consistent.

The easiest approach to this IMO is to add another PyNvCodec class ctor which won't create a demuxer, and add a couple more decoding methods which accept numpy arrays with Annex.B compressed frames.
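In Python, usage could look roughly like the sketch below (the ctor and method names here are only illustrative, nothing is final yet):

import numpy as np
import PyNvCodec as nvc

# Sketch only: a decoder constructed without the built-in demuxer,
# taking just the codec type and the GPU ordinal.
nvDec = nvc.PyNvDecoder(nvc.cudaVideoCodec_H264, 0)

# One Annex.B compressed frame demuxed elsewhere (e.g. by PyAV).
enc_packet = np.frombuffer(open("frame.h264", "rb").read(), dtype=np.uint8)

# Destination for one decoded NV12 frame (1920x1080 taken as an example).
raw_frame = np.ndarray(shape=(1920 * 1080 * 3 // 2,), dtype=np.uint8)

# Proposed decode call: pass the compressed packet in, get the raw frame back in-place.
if nvDec.DecodeSingleFrameFromPacket(enc_packet, raw_frame):
    pass  # raw_frame now holds the decoded picture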

vade commented 4 years ago

Hi @rarzumanyan - thanks for the swift response.

I totally get that; I've used NVEnc / NVDec in C++ with LibAV in the past. However, on the Python side of things the work you've done to make a CUDA / GPU backing available to publish to other tensor tooling looks really nice, and it would be super nice to:

The PyAV authors don't appear to be interested in supporting the full suite of HWAccels that LibAV integrates (a shame) - but I think having the VPF Python API handle something like:

This would then support surface-to-tensor functionality and keep things on the GPU.

This is helpful for pro video workflows - it's doable in C++ but we have Python, and 2020 is already bad enough haha :)

Does Nvidia take 3rd party PRs? I'm happy to try to wire something up but I'm unfamiliar with cython / python dev. In short, let me know how I can help.

I have a Google Colab notebook which demonstrates setting up a VPF GPU-accelerated decoder with custom compilation and benchmarking against LibAV / pyAV - I'm happy to donate it to the project as an example / QuickStart.

vade commented 4 years ago

One comment for the public - the shared Google Colab environment can have varying load, so performance benchmarking is difficult, but for the same H.264 QuickTime .mov file:

LibAV CPU:

Decode took: 0.9556884765625 seconds
fps 92.08021458679269
3.8401634946991954 x realtime

VPF on a T4 Colab GPU instance, try one:

Decode took: 0.42092251777648926 seconds
fps 209.06460520301312
8.71894433062566 x realtime

Try 2 (less contention perhaps?)

Decode took:  0.2164461612701416 seconds
fps 406.5676170166362
16.955717664216532 x realtime
rarzumanyan commented 4 years ago

@vade

Does Nvidia take 3rd party PR's?

Yes. There are 4 contributors so far and only one of them works for Nvidia :)

New: Same for encoding - pass in a surface or some such and produce a packet that can be consumed and muxed by PyAV

This is already done. The PyNvEncoder class outputs Annex.B encoded frames which you may mux using PyAV. BTW you can see this in the encoding samples, which save output in Annex.B format (you can play it with VLC).
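Sketching the glue for that (untested; this assumes the PyNvEncoder setup and the EncodeSingleSurface call from the encoding samples, plus a PyAV output container the caller has already opened):

import av
import numpy as np

def mux_encoded_surface(nvEnc, surface, out_stream, out_container):
    # Encode one GPU surface; PyNvEncoder writes the Annex.B packet into enc_frame.
    enc_frame = np.ndarray(shape=(0,), dtype=np.uint8)
    if nvEnc.EncodeSingleSurface(surface, enc_frame):
        # Wrap the Annex.B bytes in a PyAV packet and hand it to the muxer.
        packet = av.Packet(enc_frame.tobytes())
        packet.stream = out_stream
        # For a real container, pts/dts would still need to be set on the packet.
        out_container.mux(packet)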

The feature you request is relatively simple; I can prepare a patch and push it to a separate branch. It would be very nice if you could test it and brush up the code. VPF is more like a hobby project for me which I'm allowed to publish, so often I don't have enough time because of my main job duties.

rarzumanyan commented 4 years ago

@vade

I've pushed a commit to the issue_99 branch. The code compiles but I can't test it as I don't know how to demux from Python. Please use this commit as a draft; feel free to ask if you have any questions / issues.

vade commented 4 years ago

Thank you so much, this is exciting. I sincerely appreciate the prompt reply. I'll write up a little demux harness via PyAV and compile off your branch. Seriously appreciate it!

rarzumanyan commented 4 years ago

@vade

You're welcome. I also don't like the idea of keeping the decoder & demuxer together, but that's what people prefer, as the single most popular use case is to decode from an RTSP camera and apply ML algos to the video frames. So I hope the pyAV demuxer could save me some pain, because issues often happen not in the video decoding but somewhere in RTSP land.

vade commented 4 years ago

Totally! I'm in the same boat. I'm close to having something working - check out this Colab notebook:

https://colab.research.google.com/drive/1LMPp2zqCjuUdmoM4XUwTHzmsEGZxl3W_?usp=sharing

I'm able to vend packets from PyAV to an instantiated NVDec instance in Python via the new issue_99 codebase using the nvc.cudaVideoCodec_H264 initializer, but I'm having trouble passing data in the right format from a libav packet into an np array.


      3 def decode_vpf(packet):
----> 4   np_encoded = np.array(packet.to_bytes(), dtype=np.uint8)
      5   np_decoded = np.ndarray(shape=(3,2), dtype=np.uint8)
      6 

ValueError: invalid literal for int() with base 10: b'\x00\x00\x00\x14\x06\x05\x10\xb9\xed\xb90]!Kq\x83q,\x10\xa3\x14\xbb)\x80\x00\x02\xc6\xd9%\xb8\x00@\x00\x06\xff\xe3\x86}:\xb3\xfb\x84\xa3\x1fcc!\xa2\xbc\x06Z-\x7f\xb3]%\xd5\xdei\xa8j\x930d`ND\xb3\xa6

Really appreciate your prompt replies!
vade commented 4 years ago

A touch closer - one note: in order to resolve the above issues, I'm doing something along the lines of:

nvDec = nvc.PyNvDecoder(nvc.cudaVideoCodec_H264, 0)

#todo: make a pool / list of n writable np_array destination frames I can recycle

def decode_vpf_frame(packet):
  # not sure how I should be formatting my input to DecodeSingleFrameFromPacket ? 
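  # np.frombuffer reinterprets the immutable packet bytes as uint8 without copying
  # (calling np.array() directly on the bytes object is what raised the ValueError above)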
  np_encoded = np.frombuffer(packet.to_bytes(),  dtype=np.uint8)
  # np_encoded.setflags(write=1)

  np_decoded = np.zeros(1920*1080*3, dtype=np.uint8)
  np_decoded.setflags(write=1)

  nvDec.DecodeSingleFrameFromPacket(np_encoded, np_decoded)
  return np_decoded

However, it appears both nvDec.DecodeSingleFrameFromPacket and nvDec.DecodeSingleSurfaceFromPacket require the input np.array to be writable.

I can't seem to set the writable flag directly on the np_encoded buffer, but I can if I copy it. However, doing that seems to stall the system?

rarzumanyan commented 4 years ago

@vade DecodeSingleFrame and DecodeSingleFrameFromPacket take numpy arrays by reference and write raw frames in-place.

The reason may seem weird, but if I return a numpy array from these methods with a move constructor, it makes multi-threaded Python scripts crash.

I wasted almost a week debugging this some time ago. The root cause is unknown to me; I've just found that any py::array_t constructor call from within the C++ VPF internals causes multi-threaded Python scripts to crash.

However, the numpy arrays with encoded frames don't need to be modified from within VPF - please add const qualifiers to the patch if you want them to be explicitly constant.
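A quick standalone illustration of why const matters for the encoded-packet array:

import numpy as np

buf = b"\x00\x00\x00\x01\x67"              # bytes objects are immutable
enc = np.frombuffer(buf, dtype=np.uint8)   # zero-copy, read-only view over the bytes

print(enc.flags.writeable)                 # False
# enc.setflags(write=1) raises ValueError here, which is why the C++ bindings should
# accept the encoded-packet array as const instead of requiring a writable one.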

rarzumanyan commented 4 years ago

@vade I've added const qualifiers on the C++ side - it shall be fine from Python now. Please check the latest commit in the issue_99 branch.

vade commented 4 years ago

Ah! I wasn't sure how Python arrived at the writability flags. const makes sense in retrospect. Thank you again for being so available - I really appreciate it.

Your fix above resolves the writability warning/error I received; however, I'm now apparently crashing Google Colab's runtime kernel, ha.

I've reset the runtime to ensure the right .so is loaded, etc. I'll try a factory reset and see. This is close!

vade commented 4 years ago

So - it seems like issue_99 is causing some kernel crashes even in the degenerate case of using the VPF internal demuxer:

With issue_99 and the latest fixes, this crashes my Colab kernel; with master, it runs fine:

def decode_vpf(path_to_video):
  decoder = nvc.PyNvDecoder(path_to_video, 0)
  #Amount of memory in RAM we need to store decoded frame
  frameSize = decoder.Framesize()
  print(frameSize)
  rawFrameNV12 = np.ndarray(shape=(frameSize), dtype=np.uint8)

  start_time = time.time()
  framecount = 0

  while True:
      try:
          success = decoder.DecodeSingleFrame(rawFrameNV12)
          if not (success):
              print('No more video frames.')
              break

          framecount += 1

      except nvc.HwResetException:
          print('Continue after HW decoder was reset')
          continue

  end_time = time.time()

  decode_time = (end_time - start_time)
  print("Decode took: seconds", decode_time)
  print("% fps", framecount / decode_time)
rarzumanyan commented 4 years ago

@vade

I recommend dumping the encoded packets after the pyAV demuxer and after the PyNvCodec built-in demuxer and comparing them bit for bit. If they differ, that's the reason.
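Something along these lines is enough for the comparison (just a sketch; the dump file names are placeholders):

import hashlib

def md5_of(path):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# dump_pyav.h264 / dump_vpf.h264 are the packet dumps from the two demuxers.
print(md5_of("dump_pyav.h264") == md5_of("dump_vpf.h264"))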

rarzumanyan commented 4 years ago

@vade

So - it seems like issue_99 is causing some kernel crashes even in the degenerate case of using the VPF internal demuxer

That's what I was talking about - I suspect that STL container constructor calls sometimes crash the program. I've added an extra constructor to Buffer which accepts const pointers; please check out the latest commit in the issue_99 branch. Now no encoded-packet numpy array copies are made within VPF.

vade commented 4 years ago

Thank you again for your prompt help with this. I sincerely appreciate it. Is there anything I can do to help debug this better in the Colab notebook?

vade commented 4 years ago

Sadly, I'm still getting crashes / errors / stalls. Are you able on your end to run the VPF issue_99 branch with no issues (outside of LibAV packet demuxing)? Can you run the simple case of:

def decode_vpf(path_to_video):
  decoder = nvc.PyNvDecoder(path_to_video, 0)
  #Amount of memory in RAM we need to store decoded frame
  frameSize = decoder.Framesize()
  # print(frameSize)
  rawFrameNV12 = np.ndarray(shape=(frameSize), dtype=np.uint8)

  start_time = time.time()
  framecount = 0

  while True:
      try:
          success = decoder.DecodeSingleFrame(rawFrameNV12)
          # surface = decoder.DecodeSingleSurface()
          if not (success):
              print('No more video frames.')
              break

          framecount += 1

      except nvc.HwResetException:
          print('Continue after HW decoder was reset')
          continue

  end_time = time.time()

  decode_time = (end_time - start_time)
  print("Decode took: seconds", decode_time)
  print("% fps", framecount / decode_time)
rarzumanyan commented 3 years ago

Hi @vade, it's been a while since the last comment.

I've added a standalone demuxer class to the standalone_demuxer branch which effectively implements the feature you've requested. It's still WIP, yet I was able to demux an elementary Annex.B stream from a .mov file and decode Annex.B packets passed to the decoder from outside, as follows: https://github.com/NVIDIA/VideoProcessingFramework/blob/c5cd36de7313457ff0621a4ef10c164d8cf4e176/SampleDecode.py#L21-L37

As I started to dig into pyAV I found that it doesn't support bitstream filters yet, so it can't extract Annex.B NAL units from the container. Hence it's not 100% compatible with VPF and some additional processing is needed. I've tried this code snippet from the pyAV docs:

import av
import av.datasets

input_ = av.open(av.datasets.curated('pexels/time-lapse-video-of-night-sky-857195.mp4'))
output = av.open('remuxed.mkv', 'w')

# Make an output stream using the input as a template. This copies the stream
# setup from one to the other.
in_stream = input_.streams.video[0]
out_stream = output.add_stream(template=in_stream)

for packet in input_.demux(in_stream):

    print(packet)

    # We need to skip the "flushing" packets that `demux` generates.
    if packet.dts is None:
        continue

    # We need to assign the packet to the new stream.
    packet.stream = out_stream

    output.mux(packet)

input_.close()
output.close()

With small modifications I was basically saving the raw packet data to an Annex.B file instead of remuxing, and it turned out that the packets don't contain Annex.B NAL units but something else instead. So if you were using the same approach in your tests, it's no surprise that VPF was crashing.

maxclaey commented 3 years ago

Hi all! Thanks for your awesome work! What's the current status of this? I would also be very interested in using PyAV for demuxing and using VPF for accelerated decoding!

rarzumanyan commented 3 years ago

Hi @maxclaey

Last time I tried PyAV there was no API for bitstream filters, so it was impossible (at least I didn't find a way) to extract an Annex.B elementary bitstream from the container. This feature is essential as NV hardware can only accept an Annex.B compressed video stream.
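As a quick sanity check on the packets you get out of PyAV: Annex.B NAL units start with a 0x000001 / 0x00000001 start code, while the length-prefixed packets stored inside MP4 start with a 4-byte NAL size instead. A minimal check could be:

def looks_like_annexb(packet: bytes) -> bool:
    # Annex.B NAL units begin with a 3- or 4-byte start code;
    # MP4 ("AVCC") packets begin with a big-endian NAL length instead.
    return packet.startswith((b"\x00\x00\x01", b"\x00\x00\x00\x01"))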

maxclaey commented 3 years ago

Hi @rarzumanyan, thanks for your swift response! I had a quick look at the PyAV codebase and saw this issue https://github.com/PyAV-Org/PyAV/issues/489 and the related pull request https://github.com/PyAV-Org/PyAV/commit/44195b62092fcfcf684a07c802cee3d1b8b80b60. Looks like this might be what's needed? I'll use this as a starting point for some experimentation.

rarzumanyan commented 3 years ago

Hi @maxclaey

Looks like this might be what's needed?

I hope so. A standalone ffmpeg demuxer is available in VPF as well as a standalone decoder, so I guess a good place to start would be to demux some video container to an Annex.B stream with both the VPF demuxer and the PyAV demuxer, and then compare the Annex.B videos.

maxclaey commented 3 years ago

Hi @rarzumanyan! I will definitely give it a shot. I hope to find some time for it somewhere next week. Should I work on the standalone_demuxer branch, or is this available in master as well?

rarzumanyan commented 3 years ago

@maxclaey This feature was merged to master branch some time ago.

maxclaey commented 3 years ago

@rarzumanyan I had a shot at demuxing with PyAV + bitstream filtering, and comparing that with what I get when demuxing with NVC.

For PyAV I used the following snippet, based on your snippet above and the example provided in the PyAV bitstream filtering branch:

import av
import av.datasets
from av.bitstream import bitstream_filters_available, BitStreamFilter, BitStreamFilterContext

def decoding(input_file, output_file):
    input_ = av.open(av.datasets.curated(input_file))
    decFile = open(output_file, "wb")

    in_stream = input_.streams.video[0]

    ctx = BitStreamFilterContext('h264_mp4toannexb')

    for packet in input_.demux(in_stream):

        # We need to skip the "flushing" packets that `demux` generates.
        if packet.dts is None:
            continue

        for outpacket in ctx(packet):
            decFile.write(outpacket)

    input_.close()
    decFile.close()

For NVC I used the following snippet, also based on your example above:

import PyNvCodec as nvc
import numpy as np

def decode(gpuID, encFilePath, decFilePath): 
    decFile = open(decFilePath, "wb") 

    nvDmx = nvc.PyFFmpegDemuxer(encFilePath) 

    packet = np.ndarray(shape=(0), dtype=np.uint8) 
    frameSize = int(nvDmx.Width() * nvDmx.Height() * 3 / 2) 
    rawFrame = np.ndarray(shape=(frameSize), dtype=np.uint8) 

    while True: 
        if not nvDmx.DemuxSinglePacket(packet): 
            break 
        decFile.write(bytearray(packet))

    decFile.close()

When I check the resulting files, I see that the file from NVC is 120 bytes larger than the one from PyAV (2073877 bytes vs 2073757). The content looks very similar though:

First 50 bytes of PyAV result: ????E???H??,? ?#??x264 - core 148 r10 72d5
First 50 bytes of NVC result: ????E???H??,? ?#??x264 - core 148 r10 72d5
Last 50 bytes of PyAV result: 7?I???,?zh3?a\???wÏ^?h}?Q??2?;??4?|????8?
Last 50 bytes of NVC result: 7?I???,?zh3?a\???wÏ^?h}?Q??2?;??4?|????8?

When I try to decode packets demuxed with PyAV using the PyNvDecoder, I just see some binary blobs being printed in stdout, nothing more.

I'm probably missing something here due to my lack of knowledge on the internals of video coding.

Could you give me some pointers on how to advance with this? Many thanks in advance!

rarzumanyan commented 3 years ago

Hi @maxclaey

You can dump the demuxed Annex.B packets into a binary file and play it with VLC / ffmpeg / JM / HM. If they can decode it, it means PyAV produces a healthy Annex.B stream and the issue is somewhere within VPF.

BTW could you please upload the files somewhere, I'll take a closer look at them?

maxclaey commented 3 years ago

Unfortunately I cannot open the binary file created with the PyAV bitstream filter.

maxclaey commented 3 years ago

Hi @rarzumanyan

I'm one step closer, I guess. With the following snippet, I create 2 outputs: one in the same way as before, another where I use a PyAV output stream:

import av
import av.datasets
from av.bitstream import bitstream_filters_available, BitStreamFilter, BitStreamFilterContext

def decoding(input_file, output_file):
    input_ = av.open(av.datasets.curated(input_file))
    decFile = open(output_file, 'wb')

    in_stream = input_.streams.video[0]

    # PYAV OUTPUT
    output = av.open(f'{output_file}.h264', 'wb')
    out_stream = output.add_stream(template=in_stream)

    ctx = BitStreamFilterContext('h264_mp4toannexb')

    for packet in input_.demux(in_stream):
        # We need to skip the "flushing" packets that `demux` generates.
        if packet.dts is None:
            continue

        for outpacket in ctx(packet):
            outpacket.stream = out_stream  # <- Set output stream
            decFile.write(outpacket)
            output.mux(outpacket)  # <- PyAV mux

    input_.close()
    output.close()
    decFile.close()

The first output (just writing bytes to a file) is the same as before, 120 bytes short compared to NVC and not playable. The second approach, which uses PyAV, gives the exact same binary output as NVC, with the same size and md5sum. So it seems like output.mux adds some relevant information.

rarzumanyan commented 3 years ago

@maxclaey

Great news! Can you decode the muxed output with VPF?

maxclaey commented 3 years ago

@rarzumanyan

Unfortunately not, that's a bit of a problem. The packets themselves, even with the stream information on them, are not useful. In the above script, when I just write the outpacket (the packet we could try to decode with VPF) to the binary file decFile, the file cannot be played (it's 120 bytes smaller than what we get with NVC). Only when we write it to file using output.mux(outpacket), with PyAV involved, is the file playable.

rarzumanyan commented 3 years ago

Do I need to build PyAV from source to use the bitstream filters API, or should it work out of the box? I'll try to reproduce on my machine.

maxclaey commented 3 years ago

The bitstream filters need a custom build with bitstream filter support. If you want, I can send you a manylinux wheel for the Python version that you want; that might be easier?

rarzumanyan commented 3 years ago

I'm developing under Windows, so unfortunately I have to build PyAV from source then. Could you upload the bitstream which is demuxed by PyAV without output.mux(outpacket) being applied? I'll check it with a bitstream analyzer to see what it lacks.

maxclaey commented 3 years ago

In the attached zip there are 2 files:

rarzumanyan commented 3 years ago

@maxclaey

Looks like the NAL unit delimiters (two zero bytes, 0x00 0x00) are missing in the Annex.B file. Could you append them to every packet after you demux it?

maxclaey commented 3 years ago

@rarzumanyan

Thanks for the suggestion. I gave it a try, but unfortunately it didn't help: the resulting file is still not playable. Furthermore, the resulting file is now larger than what I get with NVC. FWIW: there are 168 demuxed packets in the considered sequence, and the file I get with PyAV (without adding NAL unit delimiters) is 120 bytes smaller than the one with NVC. When adding the delimiters to each packet, the resulting file is 216 bytes larger than the one with NVC.

maxclaey commented 3 years ago

For each of the 168 packets, I've printed the first and last 2 bytes, together with the packet size, and compared them between the PyAV and NVC results. It can be seen that packets 0, 75 and 150 are 40 bytes larger with NVC compared to PyAV (3 × 40 = 120 bytes, which matches the overall size difference).

1c1
< 0: First bytearray(b'\x00\x00') - last bytearray(b'\x1c\x9f') - size 102921
---
> 0: First bytearray(b'\x00\x00') - last bytearray(b'\x1c\x9f') - size 102881
76c76
< 75: First bytearray(b'\x00\x00') - last bytearray(b'\x1b\xe0') - size 156569
---
> 75: First bytearray(b'\x00\x02') - last bytearray(b'\x1b\xe0') - size 156529
151c151
< 150: First bytearray(b'\x00\x00') - last bytearray(b'\x1ex') - size 70845
---
> 150: First bytearray(b'\x00\x01') - last bytearray(b'\x1ex') - size 70805
maxclaey commented 3 years ago

Hi @rarzumanyan

Do you have any advice on how I can further debug (or solve) the remaining difference in some of the packets?

rarzumanyan commented 3 years ago

Hi @maxclaey

It looks like PyAV demuxes Annex.B correctly, although it does something weird with the content (either it removes the NAL separators or something like that). I'd rather debug PyAV on the C/C++ side and find the point where the PyAV packet and the VPF packet diverge in their content.

Unfortunately I can't take this up right now, but it is the thing I'm going to do as soon as I have a bit of time. Besides inspecting the PyAV C guts, I'm afraid there's no advice I can give you.

rarzumanyan commented 3 years ago

Hi @maxclaey, could you please give me a hand with the PyAV installation?

I was able to build it from source; the .so files are located within the /home/roman/Install/PyAV/build/lib.linux-x86_64-3.8/av folder. How can I import all those .so modules under a single av module name? E.g. if I run this snippet:

import os
os.environ['LD_LIBRARY_PATH'] = os.getcwd()

import av
import av.datasets
from av.bitstream import bitstream_filters_available, BitStreamFilter, BitStreamFilterContext

def decoding(input_file, output_file):
    input_ = av.open(av.datasets.curated(input_file))
    decFile = open(output_file, "wb")

    in_stream = input_.streams.video[0]

    ctx = BitStreamFilterContext('h264_mp4toannexb')

    for packet in input_.demux(in_stream):

        # We need to skip the "flushing" packets that `demux` generates.
        if packet.dts is None:
            continue

        for outpacket in ctx(packet):
            decFile.write(outpacket)

    input_.close()
    decFile.close()

decoding("/home/roman/Videos/bbb_sunflower_1080p_30fps_normal.mp4", "/home/roman/Videos/bbb_sunflower_1080p_30fps_normal.h264")

I obviously get an import error, as the Python interpreter can't find the av module:

Exception has occurred: ModuleNotFoundError
No module named 'av'
maxclaey commented 3 years ago

Hi @rarzumanyan, if you are working on Linux I can give you some wheels; that might be easier. I have never tried building it from source and using it that way, but this is the only thing I found in the docs: https://github.com/PyAV-Org/PyAV#alternative-installation-methods. Also please note that the bitstream filter is not available in PyAV by default; you need some changes for that (as referenced in one of the pull requests above). I have a branch on my local fork where those are included: https://github.com/maxclaey/PyAV/tree/v8.0.2-rtcp-timestamp-filter. If you want, I can give you a prebuilt wheel for it.

rarzumanyan commented 3 years ago

Hi @maxclaey

I was able to trick the PyAV output container into using an underlying byte IO buffer instead of file output, like so:

import av
import io
from av.bitstream import BitStreamFilter, BitStreamFilterContext

def demux_h264(input_file, output_file):
    dec_file = open(output_file, "wb")
    input_container = av.open(input_file)
    in_stream = input_container.streams.video[0]
    bsfc = BitStreamFilterContext('h264_mp4toannexb')

    #Create raw byte IO instead of file IO
    #Fake the extension to satisfy FFmpeg muxer
    byte_io = io.BytesIO()
    byte_io.name = 'muxed.h264'

    #Make FFmpeg to output Annex.B H.264 packets to raw bytes IO
    out_container = av.open(byte_io, 'wb')
    out_stream = out_container.add_stream(template=in_stream)

    for packet in input_container.demux(in_stream):
        if packet.dts is None:
            continue

        for out_packet in bsfc(packet):
            #Mux packet to Annex.B H.264 format
            out_container.mux_one(out_packet)
            #Now deal with IO stuff under the hood of output container
            byte_io.flush()
            #byte_io.getvalue() shall be proper Annex.B elementary bitstream portion.
            dec_file.write(byte_io.getvalue())
            byte_io.seek(0)
            byte_io.truncate()

    input_container.close()
    dec_file.close()
    out_container.close()

    return 0

def main():
    demux_h264("/home/roman/Videos/bbb_sunflower_1080p_30fps_normal.mp4", "/home/roman/Videos/bbb_sunflower_1080p_30fps_normal.h264")
    return 0

if __name__ == "__main__":
    main()

Unfortunately the Linux machine on which I've built PyAV can't run VPF, so please check out this snippet on your PC. The H.264 file that's saved on disk is playable with VLC at least. It works fast, so it shouldn't be the show stopper for HW-accelerated decoding.

rarzumanyan commented 3 years ago

@maxclaey

After some fiddling, I've tested this approach and it works. Below is the snippet; please test it on your machine. If it works, we shall speak to the PyAV people, as we need this to be merged into their main branch.

import av
import io
from av.bitstream import BitStreamFilter, BitStreamFilterContext

import PyNvCodec as nvc
import numpy as np
import sys

def decode_h264(input_file, output_file):
    dec_file = open(output_file, "wb")
    input_container = av.open(input_file)
    in_stream = input_container.streams.video[0]
    bsfc = BitStreamFilterContext('h264_mp4toannexb')

    width, height = in_stream.codec_context.width, in_stream.codec_context.height
    nvDec = nvc.PyNvDecoder(width, height, nvc.PixelFormat.NV12, nvc.CudaVideoCodec.H264, 0)
    frameSize = int(width * height * 3 / 2)
    rawFrame = np.ndarray(shape=(frameSize), dtype=np.uint8)

    #Create raw byte IO instead of file IO
    #Fake the extension to satisfy FFmpeg muxer
    byte_io = io.BytesIO()
    byte_io.name = 'muxed.h264'

    #Make FFmpeg to output Annex.B H.264 packets to raw bytes IO
    out_container = av.open(byte_io, 'wb')
    out_stream = out_container.add_stream(template=in_stream)

    for packet in input_container.demux(in_stream):
        if packet.dts is None:
            continue

        for out_packet in bsfc(packet):
            out_container.mux_one(out_packet)
            byte_io.flush()

            enc_packet = np.frombuffer(buffer=byte_io.getvalue(), dtype=np.uint8)
            if nvDec.DecodeFrameFromPacket(rawFrame, enc_packet):
                bits = bytearray(rawFrame)
                dec_file.write(bits)

            #Truncate byte IO so that it stores just single packet
            byte_io.seek(0)
            byte_io.truncate()

    input_container.close()
    dec_file.close()
    out_container.close()

    return 0

def main():
    decode_h264("/home/roman/Videos/bbb_sunflower_1080p_30fps_normal.mp4", "/home/roman/Videos/bbb_sunflower_1080p_30fps_normal.nv12")
    return 0

if __name__ == "__main__":
    main()
maxclaey commented 3 years ago

Hi @rarzumanyan, sorry for my late response, I was pretty busy the last couple of days. I just checked the above script and it indeed works as expected!

rarzumanyan commented 3 years ago

@maxclaey

Good news. I've pushed a brushed-up version of the snippet above into the pyav_support branch. There's no answer from the PyAV developers so far regarding merging the feature branch into master.

niaoyu commented 3 years ago

@rarzumanyan Hi, I just followed your work and successfully demuxed with PyAV and decoded with VPF from a local file. With an RTSP URL as input, the output shows:

fps:25 width:1920 height:1080
Decoding on GPU 0
Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
Application provided invalid, non monotonically increasing dts to muxer in stream 0: 190980 >= 3600
Application provided invalid, non monotonically increasing dts to muxer in stream 0: 190980 >= 6480
Application provided invalid, non monotonically increasing dts to muxer in stream 0: 190980 >= 9810
Application provided invalid, non monotonically increasing dts to muxer in stream 0: 190980 >= 13050
Application provided invalid, non monotonically increasing dts to muxer in stream 0: 190980 >= 16650
Application provided invalid, non monotonically increasing dts to muxer in stream 0: 190980 >= 20250

I think these warnings are output because of bad network transmission. Am I right?

rarzumanyan commented 3 years ago

Hi @niaoyu

This issue covers the exact reason I'm willing to support PyAV - networking problems IMO are far outside the VPF scope. My advice is to use PyAV to demux your RTSP video to a local Annex.B file and decode it with any player to see if the video is valid.
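E.g. with the demux_h264 helper from the snippet earlier in this thread (the URL and output path below are just placeholders):

# Dump the RTSP stream to a local Annex.B file first; if a regular player can
# decode that file cleanly, the remaining problems are on the networking side.
demux_h264("rtsp://camera.example/stream1", "/tmp/camera_dump.h264")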

niaoyu commented 3 years ago

@rarzumanyan Great!

The warnings only exist in the first dozen or so lines for my RTSP stream. In the following code, I convert the NV12 frame to RGB, save it to a local file, and finally check that it is correct.

Besides, I wrote a Dockerfile for the whole process and would be glad to write a PR for that if you permit.

rarzumanyan commented 3 years ago

Hi @niaoyu

Besides, I wrote a Dockerfile for the whole process and would be glad to write a PR for that if you permit.

PRs are absolutely welcome; you don't need any specific permission to submit one.

philipp-schmidt commented 3 years ago

Is there a specific PR in PyAV that we could upvote? It seems that at least a handful of people have a good use case for bitstream support in PyAV now. How can I help? @rarzumanyan @maxclaey

@niaoyu Could you extend this Dockerfile with your PyAV, so we have everything in one place? https://github.com/NVIDIA/VideoProcessingFramework/issues/130#issuecomment-731886291