NVIDIA / VideoProcessingFramework

A set of Python bindings to C++ libraries which provide full HW acceleration for video decoding and encoding, plus GPU-accelerated color space and pixel format conversions
Apache License 2.0

Ability to work alongside PyAV? #99

Closed vade closed 2 years ago

vade commented 4 years ago

Describe the bug

Hello - VPF appears to be functioning decently in a simple test case; however, it hides / abstracts / removes (?) a lot of libAV functionality in Python behind a simple interface.

Is it possible to use PyAV to vend compressed packets of HEVC or AVC data and send them to VPF? This would allow the nice flexibility of having most of libAV at your disposal (audio, re-muxing, timestamps, etc.) while having access to the speed of VPF.

Am I misunderstanding the API or is this possible today with VPF?

Thank you.

rarzumanyan commented 3 years ago

@philipp-schmidt

It's https://github.com/PyAV-Org/PyAV/commit/44195b62092fcfcf684a07c802cee3d1b8b80b60. I've tried to connect with the involved PyAV developers via GitHub & LinkedIn, without any success.

philipp-schmidt commented 3 years ago

@rarzumanyan Could you explain what's going on in your SamplePyav.py script? E.g. what's nvDec.FlushSingleFrame good for?

To my understanding: PyAV takes an input resource (local file, rtsp, etc.) and returns individual packets, which are passed into a python bytebuffer and then read by VPF for H.264 decoding like usual.

Is there a time critical part in this code? Or asked more specifically: What would a wrapper for simple decoding look like that takes an input URL (RTSP explicitly included) and offers a simple numpy "getNextFrame()" interface? How many packets would this wrapper have to read, and what would the IO loop in this wrapper look like? To what extent would PyAV handle buffers and everything? Where do PySurfaceConverter and PySurfaceDownloader go?

I can read the code just fine, but the lack of knowledge regarding video transport, containers and codecs really makes this hard to grasp.

rarzumanyan commented 3 years ago

@philipp-schmidt,

Could you explain what's going on in your SamplePyav.py script? PyAV takes an input resource (local file, rtsp, etc.) and returns individual packets, which are passed into a python bytebuffer and then read by VPF for H.264 decoding like usual.

This is a pretty accurate representation of what's happening inside the script. PyAV is used to obtain Annex.B NAL units in the form of PyAV packets from the input URL.

The only caveat is that PyNvDecoder is async by design. For better performance, you don't obtain a decoded Surface every time you submit a packet. Instead, you submit packets into a sort of input queue and receive decoded Surfaces, as they become ready, from a sort of output queue. This way the HW is kept busy. BTW, this is the reason you can max out Nvenc and Nvdec from Python.

When the decoder is combined with the demuxer into a single C++ class wrapped into a Python class, there's no need to take care of PyNvDecoder's async design. Every time you call DecodeSingleSurface or DecodeSingleFrame, the guts of the C++ class keep feeding the GPU with packets from the demuxer until a decoded Surface is ready.

As soon as you put the demuxer into a separate class, you have to take care of PyNvDecoder's async design yourself. You basically feed it packets, and every time you do, you also check if there's an output frame. To account for the gap between the first packet submission and the first decoded Surface acquisition, you feed the decoder empty packets once your actual packets are over, to flush these internal input/output queues.

What would a wrapper for simple decoding look like that takes an input URL (RTSP explicitly included) and offers a simple numpy "getNextFrame()" interface?

It's kind of implemented in https://github.com/NVIDIA/VideoProcessingFramework/blob/b10a715a6813f17141202122a9f8cb1cbeb65a9b/SamplePyav.py#L80

The simplest solution would be to implement a Python class with both PyNvDecoder and a PyAV demuxer inside. The constructor may take the input URL as its single argument. Then, on every getNextFrame() call, you feed the decoder packets until it gives you back a single decoded Surface.
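This feed-until-output / flush-at-EOF protocol can be sketched with a pure-Python stand-in (ToyAsyncDecoder and the 4-packet delay are illustrative inventions, not VPF API; a real wrapper would hold a PyAV container and a PyNvDecoder instead):

```python
from collections import deque

class ToyAsyncDecoder:
    """Stand-in for an async HW decoder: output lags input by `delay` packets."""
    def __init__(self, delay=4):
        self.delay = delay
        self.queue = deque()

    def decode(self, packet):
        # Submit a packet; a decoded "surface" only comes out once the
        # internal pipeline has filled up.
        self.queue.append(packet)
        if len(self.queue) > self.delay:
            return self.queue.popleft()
        return None  # pipeline still filling, no output yet

    def flush(self):
        # Emulates feeding empty packets to drain the internal queues.
        return self.queue.popleft() if self.queue else None

class ToyVideoReader:
    """getNextFrame()-style wrapper over a packet source + async decoder."""
    def __init__(self, packets, delay=4):
        self.packets = iter(packets)
        self.dec = ToyAsyncDecoder(delay)
        self.eof = False

    def get_next_frame(self):
        # Keep feeding packets until the decoder yields an output.
        while not self.eof:
            try:
                pkt = next(self.packets)
            except StopIteration:
                self.eof = True
                break
            out = self.dec.decode(pkt)
            if out is not None:
                return out
        # Input exhausted: flush the remaining frames from the pipeline.
        return self.dec.flush()

reader = ToyVideoReader(range(10), delay=4)
frames = []
while True:
    f = reader.get_next_frame()
    if f is None:
        break
    frames.append(f)
print(frames)  # all 10 "frames" come out, despite the 4-packet lag
```

The point of the toy: without the flush loop at the end, the last `delay` frames would stay stuck in the pipeline and never reach the caller.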

Where do PySurfaceConverter and PySurfaceDownloader go?

PySurfaceDownloader is built into PyNvDecoder at the C++ level: DecodeSingleFrame is a simple combination of DecodeSingleSurface and DownloadSingleSurface. PySurfaceConverter isn't built into the decoder, so if you'd like to convert your Surface you have to do it explicitly from Python.

To what extent would PyAV handle buffers and everything?

This is a good question to which I don't know the answer. This issue is my first actual PyAV experience. I hope that more savvy Python people may help the community with non-trivial questions like this one. I'm 100% positive towards PRs and other kinds of contributions from community members, so if you have some knowledge you'd like to add to this project, please let me know and I'll be happy to take it.

philipp-schmidt commented 3 years ago

Thanks for this very comprehensive explanation! I do understand what's going on now. Especially the async design of Nvdec really makes sense now.

I will extend the aforementioned Dockerfile with build instructions for PyAV with bitstream enabled (until we get support on the master branch) and will probably write a small wrapper for the very specific purpose of decoding an RTSP stream and returning numpy images. I'm curious how stable this setup will be in a few real world tests with IP cameras. Happy to PR the Dockerfile, demo and findings back of course!

philipp-schmidt commented 3 years ago

Hi, I did not have time until now, but here is a Dockerfile to test the above script. It builds PyAV with bitstream support and then installs VPF.

FROM nvcr.io/nvidia/tensorrt:20.09-py3
ENV DEBIAN_FRONTEND=noninteractive

WORKDIR /app

RUN apt update
RUN apt install -y git cmake wget unzip ffmpeg python3 virtualenv build-essential pkg-config python3-dev python3-pip \
    libavformat-dev libavcodec-dev libavdevice-dev libavutil-dev libswscale-dev libswresample-dev libavfilter-dev

# Install PyAV with bitstream support
RUN git clone https://github.com/PyAV-Org/PyAV.git
RUN cd PyAV && git checkout 44195b6
RUN /bin/bash -c "cd PyAV && source scripts/activate.sh && pip install --upgrade -r tests/requirements.txt && make"
RUN cd PyAV && pip3 install .

# Install VPF
RUN git clone https://github.com/NVIDIA/VideoProcessingFramework.git vpf
ADD Video_Codec_SDK_11.0.10.zip ./vpf
ENV CUDACXX /usr/local/cuda/bin/nvcc
RUN cd vpf && unzip Video_Codec_SDK_11.0.10.zip && \
    mkdir -p build && cd build && \
    cmake .. \
        -DFFMPEG_DIR:PATH="/usr/bin/ffmpeg" \
        -DVIDEO_CODEC_SDK_DIR:PATH="/app/vpf/Video_Codec_SDK_11.0.10" \
        -DGENERATE_PYTHON_BINDINGS:BOOL="1" \
        -DGENERATE_PYTORCH_EXTENSION:BOOL="0" \
        -DPYTHON_LIBRARY=/usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.so \
        -DPYTHON_EXECUTABLE="/usr/bin/python3.6" \
        -DCMAKE_INSTALL_PREFIX:PATH="/app" && \
    make -j$(nproc) && make install && \
    cd /app && \
    rm -rf vpf && \
    mv bin/*.so . && rm -rf bin
ENV LD_LIBRARY_PATH=/app:${LD_LIBRARY_PATH}

# Add decode script
ADD decode.py .

rarzumanyan commented 3 years ago

Hi @philipp-schmidt

Thanks for the update! I've added this to the project's wiki page. If you want to contribute this in the form of a Dockerfile, please make a pull request. People often use VPF inside Docker, so this may be helpful for them as well.

philipp-schmidt commented 3 years ago

I will PR a Dockerfile once I can optimize it a little (TensorRT as a base image is overkill, but the cuda-ubuntu base image didn't want pybind to work at all during cmake config for some reason, so I gave up on it to test the rest).

This question might be premature as I'm not done checking the code, but since you are here right now: I'm trying to convert to BGR (from NV12 I suppose?) via PySurfaceConverter. I'm unsure where in host and device memory the data is being put with this setup and how many memory copies this introduces (obviously I'm trying to minimize this). I'm also unsure whether I have to use PySurfaceDownloader after that. Is there an easier way to decode to BGR right away? Changing NV12 to BGR segfaults. Is NV12 the default? Also: using Python/CPU to color convert is probably way too slow, right?

rarzumanyan commented 3 years ago

@philipp-schmidt

The native output format for Nvdec is NV12. NV12 -> BGR conversion is supported; you can use it as follows:

nvDec = nvc.PyNvDecoder(encFile, gpuID)
nvCvt = nvc.PySurfaceConverter(width, height, nvDec.Format(), nvc.PixelFormat.BGR, gpuID)
nvDwn = nvc.PySurfaceDownloader(width, height, nvCvt.Format(), gpuID)
...
rawSurface = nvDec.DecodeSingleSurface()
cvtSurface = nvCvt.Execute(rawSurface)

# Will be 1D array, reshape it later.
rawFrame = np.ndarray(shape=(cvtSurface.HostSize()), dtype=np.uint8)

nvDwn.DownloadSingleSurface(cvtSurface, rawFrame)

More of this is shown in SampleDecodeMultiThread.py

Color conversion is CUDA-accelerated so it happens in vRAM. You can download it from vRAM to numpy ndarray once it's done.

philipp-schmidt commented 3 years ago

Perfect, thanks. Yes, that's exactly what I was using so far. However:

nvDec.DecodeFrameFromPacket(rawFrame, enc_packet)

will probably copy rawFrame to CPU right away?

rarzumanyan commented 3 years ago

will probably copy rawFrame to CPU right away?

Yes. It's a VPF naming convention. Frames are images in RAM, Surfaces are images in vRAM. Nvdec always outputs to vRAM so all functions that deal with Frames upload them to GPU (or download from GPU) under the hood.

philipp-schmidt commented 3 years ago

I just discovered DecodeSurfaceFromPacket. Thanks for the help. "as I'm not done checking code" hah.

philipp-schmidt commented 3 years ago

Hi @rarzumanyan I got it all working and pushed my results into a new repo for a short demo: https://github.com/isarsoft/nvenc-h264-video-reader

Unfortunately, the issue with the RTSP streams of e.g. AXIS cameras still remains! I am confident at this point that there might be an issue in the decoder after all, as I can trace and inspect every single packet via PyAV up to the DecodeSurfaceFromPacket step, but the decoder never returns a Surface, nor can I see any decoding being done via nvidia-smi dmon. Would you mind looking into it again? I'm happy to share all the info that I have, including a public RTSP stream for testing.

philipp-schmidt commented 3 years ago

Some preliminary info. On a healthy RTSP stream I get the following:

Decoding on GPU 0
IN_PACKET
OUT_PACKET
Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
IN_PACKET
OUT_PACKET
IN_PACKET
OUT_PACKET
IN_PACKET
OUT_PACKET
IN_PACKET
OUT_PACKET
RECEIVED_SURFACE
frame read
IN_PACKET
OUT_PACKET
RECEIVED_SURFACE
frame read
IN_PACKET
OUT_PACKET
RECEIVED_SURFACE
(...)

On some other RTSP streams (specifically those from various AXIS cameras that I can test):

Decoding on GPU 0
IN_PACKET
OUT_PACKET
Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.
IN_PACKET
OUT_PACKET
Application provided invalid, non monotonically increasing dts to muxer in stream 0: 20808 >= 3003
IN_PACKET
OUT_PACKET
Application provided invalid, non monotonically increasing dts to muxer in stream 0: 20808 >= 6006
IN_PACKET
OUT_PACKET
Application provided invalid, non monotonically increasing dts to muxer in stream 0: 20808 >= 9010
IN_PACKET
OUT_PACKET
Application provided invalid, non monotonically increasing dts to muxer in stream 0: 20808 >= 12013
IN_PACKET
OUT_PACKET
Application provided invalid, non monotonically increasing dts to muxer in stream 0: 20808 >= 15016
IN_PACKET
OUT_PACKET
Application provided invalid, non monotonically increasing dts to muxer in stream 0: 20808 >= 18019
IN_PACKET
OUT_PACKET
IN_PACKET
OUT_PACKET
IN_PACKET
OUT_PACKET
(...)

At this point the stream is stuck endlessly in this loop without ever returning a Surface. I can see the packet data and inspect it just fine. For more info on at which point each message is printed, have a look at the GPUVideoReader.py that I provide in the repo.

rarzumanyan commented 3 years ago

Hi @philipp-schmidt

I am confident at this point that there might be an issue in the decoder after all, as I can trace and inspect every single packet via PyAV up to the DecodeSurfaceFromPacket step, but the decoder never returns a Surface, nor can I see any decoding being done via nvidia-smi dmon. Would you mind looking into it again? I'm happy to share all the info that I have, including a public RTSP stream for testing.

That's very easy to check: just dump the NAL units to disk instead of feeding them to Nvdec. You will end up with an Annex.B elementary bitstream. This can be decoded with VLC, or with the HM (for H.265) and JM (for H.264) reference decoders. You can also upload the file online and I will check it on my PC.

If the file you obtain is a valid H.264 / H.265 elementary bitstream, then the problem clearly lies on the VPF side. Otherwise it requires more investigation.

philipp-schmidt commented 3 years ago

You are correct again, the file seems to be broken even to playback with ffplay.

[h264 @ 0x7f9fc811b720] non-existing PPS 0 referenced
[h264 @ 0x7f9fc811b720] decode_slice_header error
[h264 @ 0x7f9fc811b720] no frame!
[NULL @ 0x7f9fc80022c0] non-existing PPS 0 referenced
[h264 @ 0x7f9fc8197480] non-existing PPS 0 referenced
[h264 @ 0x7f9fc8197480] decode_slice_header error
[h264 @ 0x7f9fc8197480] no frame!
[NULL @ 0x7f9fc80022c0] non-existing PPS 0 referenced
[h264 @ 0x7f9fc81b37a0] non-existing PPS 0 referenced
[h264 @ 0x7f9fc81b37a0] decode_slice_header error
[h264 @ 0x7f9fc81b37a0] no frame!
[h264 @ 0x7f9fc81cfac0] non-existing PPS 0 referenced
[h264 @ 0x7f9fc81cfac0] decode_slice_header error
[h264 @ 0x7f9fc81cfac0] no frame!
[h264 @ 0x7f9fc836ece0] non-existing PPS 0 referenced
[h264 @ 0x7f9fc836ece0] decode_slice_header error
[h264 @ 0x7f9fc836ece0] no frame!
[h264 @ 0x7f9fc838ac60] non-existing PPS 0 referenced
[h264 @ 0x7f9fc838ac60] decode_slice_header error
[h264 @ 0x7f9fc838ac60] no frame!
[h264 @ 0x7f9fc83a6aa0] non-existing PPS 0 referenced
(and so on.....)

The same playback works fine for the already-known healthy RTSP streams. Searching for this error suggests "no keyframe received yet" according to a Stack Overflow thread. ffplay even reports [h264 @ 0x7fa248000b80] Format h264 detected only with low score of 1, misdetection possible!

For the sake of this issue: PyAV + VPF seem to work just fine in tandem, and the issue does not seem to be with VPF, as expected.

philipp-schmidt commented 3 years ago

I'm absolutely confused at this point.

for frame in container.decode(video=0):
    img = frame.to_nd_array(format='bgr24')
    cv2.imshow("FRAME", img)
    cv2.waitKey(1)

This script plays the stream just fine using the exact same PyAV setup directly. Could there be a problem with bitstream extraction? The stream seems to fail whenever bitstream extraction is at work, whether in VPF directly or now in PyAV. @rarzumanyan

philipp-schmidt commented 3 years ago

NAL Unit File for reference: https://drive.google.com/drive/folders/1DREpmIiO1VvNGlxbsAhQ8u_J2915Ve6V?usp=sharing

rarzumanyan commented 3 years ago

@philipp-schmidt

I think the issue lies in the way PyAV is used.

At the end of the day it's the same ffmpeg library being wrapped with Python bindings; it just needs to be initialized & used properly. I would say 99% of VPF problems are RTSP camera problems - people can't read the video stream properly from them with the ffmpeg C API (or Python bindings), while they can read packets and decode them with the ffmpeg app.

Undoubtedly there's a way to solve this task with the ffmpeg C API (or Python bindings) - one just needs to spend a fair amount of time investigating what the ffmpeg application (or the PyAV ffmpeg decoder) does.

Taking your code snippet as an example:

for frame in container.decode(video=0):
    img = frame.to_nd_array(format='bgr24')
    cv2.imshow("FRAME", img)
    cv2.waitKey(1)

As long as you don't extract data outside PyAV, everything is fine. But as soon as you do, you need to make sure the data is a set of valid Annex.B NAL units, take care of decoder (re)initialization in case broken data is fed to Nvdec, and such. The bitter truth is that VPF is an extremely thin interface between your data and the HW, and it's a hell of a job to prepare the video packets in a certain fashion before you kick them off to the GPU.

BTW, take a look at https://github.com/NVIDIA/VideoProcessingFramework/issues/99#issuecomment-733777616 to see how to dump an elementary Annex.B bitstream with PyAV.

philipp-schmidt commented 3 years ago

I agree, it doesn't seem to be trivial to do the NAL extraction in a robust way. I guess this needs a deep dive into ffmpeg / libav. I believe that checking NAL start codes might be a good start to verify the validity of the Annex.B NAL unit streams of the problematic cameras. Can you recommend any other tools to analyse the data in detail? As PyAV decodes the stream correctly, I'm pretty sure this must be a matter of only a few malformed bits, due either to a manufacturer-specific implementation or to the bitstream extraction implementation in ffmpeg. Thank you for the help up to this point. I think this now leaves the topic of this issue (or even this repo). Shall we open a thread with the new GitHub Discussions feature and continue there?
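As a starting point for the start-code check mentioned above, here is a hypothetical pure-Python Annex.B inspector (the helper names are mine, not part of VPF or PyAV; it assumes H.264, where the NAL unit type is the low 5 bits of the first header byte, and it only inspects headers, not slice payloads):

```python
# Minimal H.264 (ISO/IEC 14496-10) NAL unit type table for the types
# relevant to the "non-existing PPS 0 referenced" symptom above.
NAL_TYPES = {1: "non-IDR slice", 5: "IDR slice", 6: "SEI",
             7: "SPS", 8: "PPS", 9: "AUD"}

def split_annexb(data: bytes):
    """Split an Annex.B byte stream into NAL units (start codes stripped)."""
    units, i, start = [], 0, None
    while i < len(data) - 2:
        if data[i:i+3] == b"\x00\x00\x01":
            if start is not None:
                # Trim the trailing zero of a 4-byte start code.
                end = i - 1 if i > 0 and data[i-1] == 0 else i
                units.append(data[start:end])
            i += 3
            start = i
        else:
            i += 1
    if start is not None:
        units.append(data[start:])
    return units

def nal_type(unit: bytes) -> str:
    t = unit[0] & 0x1F  # low 5 bits of the first header byte
    return NAL_TYPES.get(t, f"type {t}")

# A tiny synthetic stream: SPS, PPS, then an IDR slice header.
stream = (b"\x00\x00\x00\x01\x67\x64\x00\x1f"   # 0x67 -> SPS
          b"\x00\x00\x00\x01\x68\xee\x3c\x80"   # 0x68 -> PPS
          b"\x00\x00\x01\x65\x88\x84")          # 0x65 -> IDR slice
print([nal_type(u) for u in split_annexb(stream)])
# -> ['SPS', 'PPS', 'IDR slice']
```

Running this over a dumped .h264 file would show whether the problematic streams start with SPS/PPS at all; a stream whose first access units lack them is exactly what produces ffmpeg's "non-existing PPS 0 referenced" errors.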

philipp-schmidt commented 3 years ago

I'm currently also looking into using GStreamer instead of libav for the same purpose. As DeepStream is using GStreamer and nvenc and is (hopefully?!) working with AXIS cameras, maybe the implementation there is more robust.

https://gstreamer.freedesktop.org/documentation/gst-plugins-bad-codecparsers/gsth264parser.html?gi-language=c

It also supports AVC, and I'm not sure that's the case with libav (at least not implicitly).

philipp-schmidt commented 3 years ago

The following command will do the trick with a simple gst-pipeline:

gst-launch-1.0 rtspsrc location="rtsp://media.smart-streaming.com/mytest/mp4:sample.mp4" ! queue ! "application/x-rtp,media=video" ! rtph264depay ! h264parse ! video/x-h264, stream-format="byte-stream" ! filesink location="test.h264"

From this discussion: http://gstreamer-devel.966125.n4.nabble.com/Saving-raw-h264-packets-td4682804.html

@rarzumanyan I know this would be a big change, but what are the chances of replacing libav with GStreamer in the core of VPF? GStreamer seems to be way more mature in this regard. And NVIDIA is already trusting GStreamer in the core of DeepStream. VPF would therefore offer the same features as DeepStream - but right at your fingertips, with Python and in a single lib.

philipp-schmidt commented 3 years ago

I can confirm that the above command produces a playable H.264 file for all my RTSP streams that I can test. I attached the file in GDrive for reference (gstreamer.h264):

https://drive.google.com/drive/folders/1DREpmIiO1VvNGlxbsAhQ8u_J2915Ve6V?usp=sharing

rarzumanyan commented 3 years ago

but what are the chances to replace libav with gstreamer in the core of VPF?

It's not a big deal at all. VPF accepts Annex.B packets; it doesn't really matter how you obtain them. You can use the built-in demuxer based on the ffmpeg C API, an external PyAV demuxer, etc. - anything which gives you Annex.B NAL units. You can even read them from an elementary bitstream file and search for the NAL unit start code every time.

FFmpeg is used for demuxing only, and I'd honestly be glad to throw it away at the first chance.

Both GStreamer and FFmpeg are "kind of the same" from the Video Codec SDK (and hence VPF) point of view - they provide our HW with Annex.B data. Then the NV HW decodes or encodes, keeping the frames in GPU memory, and does all the CUDA & AI goodness.

So if you can do the job with GStreamer and/or Python bindings to it, I'll be super happy to spread this info across VPF users.

And NVIDIA is already trusting GStreamer in the core of Deepstream.

Yes, VPF and DeepStream have intersecting functionality. They just aim at different users. DeepStream is a big, industrial-grade, out-of-the-box product, while VPF is small & more agile (and has different licensing).

rarzumanyan commented 3 years ago

@philipp-schmidt

After some fiddling I was able to use the ffmpeg-python module to demux a video file and put the output into a pipe in a background process. The content of this pipe can be fed to PyNvDecoder like so:

import ffmpeg
import subprocess

import numpy as np
import PyNvCodec as nvc

args = (ffmpeg
    .input('big_buck_bunny_1080p_h264.mov')
    .output('pipe:', vcodec='copy', **{'bsf:v': 'h264_mp4toannexb'}, format='h264')
    .compile())

proc = subprocess.Popen(args, stdout=subprocess.PIPE)

nvdec = nvc.PyNvDecoder(int(1920), int(1080), nvc.PixelFormat.NV12, 
                        nvc.CudaVideoCodec.H264, 0)

raw_frame = np.ndarray(shape=(0), dtype=np.uint8)

with open('out.nv12', 'wb') as dec_file:
    while True:
        # Read 4 KB of data as this is the most common mem page size
        bits = proc.stdout.read(4096)
        if not len(bits):
            break

        # Decode
        enc_packet = np.frombuffer(buffer=bits, dtype=np.uint8)
        if nvdec.DecodeFrameFromPacket(raw_frame, enc_packet):
            bits = bytearray(raw_frame)
            dec_file.write(bits)
            dec_file.flush()

    # Now we flush the decoder to empty the decoded frames queue.
    while True:
        if nvdec.FlushSingleFrame(raw_frame):
            bits = bytearray(raw_frame)
            dec_file.write(bits)
        else:
            break

Please check if it solves the RTSP AXIS camera issue. If not, I can't help but say: do the same trick, but write the output from gstreamer to the pipe.

philipp-schmidt commented 3 years ago

Hi, I was literally working on the same trick with piping gstreamer output the other day but it seems you beat me to it ;)

I'm happy to test your setup as soon as I get the time. Thanks in advance.

philipp-schmidt commented 3 years ago

I'm also pretty sure this can be solved without spawning another process by using gstreamer's pipeline parse function, and I hope to have results to share here soon as well.

rarzumanyan commented 3 years ago

I'm also pretty sure this can be solved without spawning another process by using gstreamer's pipeline parse function

I had to stick with ffmpeg because I'm mostly working under Windows. I don't like this subprocess approach either; it's kind of a proof of concept.

philipp-schmidt commented 3 years ago

Am I correct in assuming it doesn't matter how the "packet" data passed to the decoder is segmented? "Packet" kind of implies that there are alignments to take care of (so as not to "split" packets). Will the decoder just happily take a stream of bytes? Looking at your example this seems to be the case, as you align all buffers to 4096.

rarzumanyan commented 3 years ago

Video Codec SDK has a built-in elementary bitstream parser, and it looks like it can parse the input buffer itself.

I'm not 100% sure this is the correct way to use it (feeding it fixed-size chunks of Annex.B data instead of Annex.B Access Units), although it seems to work. I've never recommended that our customers use the Video Codec SDK API this way - but always to feed the HW with Access Units and such.

Unless you come up with a GStreamer-based solution which provides you with frame-by-frame Annex.B Access Units, we can use the mentioned sample as a workaround. I've tested it on videos on my PC and all the frames are decoded; nothing is missed.

P.S. I've asked our Video Codec SDK people which way is preferred / acceptable / wrong, so I'll give you an update as soon as I get word back from them.

philipp-schmidt commented 3 years ago

gst-launch-1.0 rtspsrc location="rtsp://xxx:xxx@xxx.xx.xx.xx:31055/axis-media/media.amp" protocols=tcp ! queue ! "application/x-rtp,media=video" ! rtph264depay ! h264parse ! video/x-h264, stream-format="byte-stream" ! filesink location=/dev/stdout | python GPUPipeTest.py

GPUPipeTest.py

import numpy as np
import PyNvCodec as nvc

import sys

nvdec = nvc.PyNvDecoder(int(1920), int(1080), nvc.PixelFormat.NV12, 
                        nvc.CudaVideoCodec.H264, 0)

raw_frame = np.ndarray(shape=(0), dtype=np.uint8)

with open('out.nv12', 'wb') as dec_file:
    while True:
        # Read 4 KB of data as this is the most common mem page size
        bits = sys.stdin.buffer.read(4096)
        if not len(bits):
            break

        print("MEMPAGE IN")

        # Decode
        enc_packet = np.frombuffer(buffer=bits, dtype=np.uint8)
        if nvdec.DecodeFrameFromPacket(raw_frame, enc_packet):
            bits = bytearray(raw_frame)
            dec_file.write(bits)
            dec_file.flush()

Works perfectly fine with all streams I've been able to test so far!

rarzumanyan commented 3 years ago

@philipp-schmidt

I got word back from our developers. This 4 KB chunk approach is fine for H.264 / H.265, but wasn't tested with VP9 and AV1. Although you don't have too much control over demuxing when ffmpeg / gstreamer is launched in a separate process, this shall work.

stalestar commented 3 years ago

Hello, what a great discussion. I ran the code to read a video with PyAV & VPF, but I want to write this video with VPF's encoder and PyAV's muxer. How could I do it?

rarzumanyan commented 3 years ago

@lferraz

I know you're a keen PyAV user; any chance you came across the topic of Annex.B packet muxing with PyAV?

stalestar commented 3 years ago

@rarzumanyan The output of EncodeSingleFrame is an ndarray; can it be transformed into an av.Packet? It seems to be a feasible idea.

lferraz commented 3 years ago

Hi @hubeichenxing, I convert the ndarray into an av.Packet with this code:

success = nvEnc.EncodeSingleSurface(surface_rgb, encFrame)
if success:
    pkt = av.packet.Packet(bytearray(encFrame))

Of course, with this approach you need to add extra info to the Packet, like pts, dts, etc.
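For illustration, since the muxer expects monotonically increasing timestamps (the "non monotonically increasing dts" warnings earlier in this thread come from violating that), a hypothetical helper for generating constant-frame-rate pts/dts values might look like this (make_timestamps is my invention, not a VPF or PyAV API):

```python
from fractions import Fraction

def make_timestamps(n_frames: int, fps: int, time_base: Fraction):
    """Generate monotonically increasing pts/dts values (in time_base
    units) for n_frames at a constant frame rate. With no B-frames,
    dts can simply equal pts."""
    ticks_per_frame = int(round(1 / (fps * time_base)))
    return [i * ticks_per_frame for i in range(n_frames)]

# e.g. 30 fps with the common 1/90000 MPEG time base -> 3000 ticks/frame
ts = make_timestamps(5, 30, Fraction(1, 90000))
print(ts)  # [0, 3000, 6000, 9000, 12000]
assert all(b > a for a, b in zip(ts, ts[1:]))  # strictly increasing
```

Each packet would then get pkt.pts = pkt.dts = ts[i] before output.mux(pkt), which is only valid when the encoder emits no B-frames (so no reordering between pts and dts).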

rarzumanyan commented 3 years ago

@hubeichenxing

This PyAV sample may be a good starting point. It also takes an Annex.B elementary video stream as input.

lferraz commented 3 years ago

I know you're keen PyAV user, any chance you came across the topic of Annex.B packets muxing with PyAV?

Hi @rarzumanyan, my experience with PyAV and video these last years is quite small (15 years ago I used to work with DirectShow). Basically, now I am learning while I try to contribute to VPF in order to achieve my goals at Kognia.

Basically, since I do not understand the different steps very well, what I did in #186 was a workaround to be able to mux those packets.

NOTE: during GTC21 I discovered DeepStream, and it looks very promising.

stalestar commented 3 years ago

Thanks for the response. @lferraz @rarzumanyan I tried the code:

pkt = av.packet.Packet(bytearray(encFrame))
pkt.pts = xxx
pkt.dts = xxx
output.mux(pkt)

It worked fine.

But I have another problem: when I print the PacketData from VPF, it loses the packet info for the first few frames. I run PyAV with:

input_file = "Video118.mp4"
input_container = av.open(input_file)
in_stream = input_container.streams.video[0]

count_read = 0
for packet in input_container.demux(in_stream):
    if packet.dts is None:
        continue

    print(count_read, packet.pts, packet.dts, packet.pos, packet.duration)
    count_read += 1

The results:

0 819 -1049 4963 0
1 2045 -25 9953 0
2 3263 1225 11479 0
3 4243 2443 12533 0
4 5533 3423 14085 0
5 6665 4713 16210 0
6 7743 5845 19371 0
7 8959 6923 23699 0
8 10093 8139 27750 0
9 11239 9273 31949 0
10 12375 10419 37727 0
11 13537 11555 42812 0
12 14777 12717 45115 0
13 15897 13957 53621 0
14 17043 15077 57640 0
15 18105 16223 61025 0
16 19259 17285 67985 0
17 20485 18439 78735 0
18 21649 19665 84135 0
19 22807 20829 89981 0
20 23975 21987 95534 0
21 26307 23155 103627 1154
22 27325 24381 111265 1154
23 28581 25487 113597 1154
24 29773 26505 120074 1154
25 30849 27761 122809 1154
26 32013 28953 130964 1154
(...)

When I run with VPF:

nvDec.DecodeSingleSurface(packet_data)
if True:
    print("curr_frame:     ", frameNum)
    print("frame pts:      ", packet_data.pts)
    print("frame dts:      ", packet_data.dts)
    print("frame pos:      ", packet_data.pos)
    print("frame duration: ", packet_data.duration)
    print("")

It shows:

curr_frame:      0
frame pts:       3263
frame dts:       1225
frame pos:       11479
frame duration:  0

curr_frame:      1
frame pts:       4243
frame dts:       2443
frame pos:       12533
frame duration:  0

curr_frame:      2
frame pts:       5533
frame dts:       3423
frame pos:       14085
frame duration:  0

curr_frame:      3
frame pts:       6665
frame dts:       4713
frame pos:       16210
frame duration:  0

curr_frame:      4
frame pts:       7743
frame dts:       5845
frame pos:       19371
frame duration:  0

curr_frame:      5
frame pts:       8959
frame dts:       6923
frame pos:       23699
frame duration:  0

curr_frame:      6
frame pts:       10093
frame dts:       8139
frame pos:       27750
frame duration:  0

curr_frame:      7
frame pts:       11239
frame dts:       9273
frame pos:       31949
frame duration:  0

curr_frame:      8
frame pts:       12375
frame dts:       10419
frame pos:       37727
frame duration:  0

curr_frame:      9
frame pts:       13537
frame dts:       11555
frame pos:       42812
frame duration:  0

curr_frame:      10
frame pts:       14777
frame dts:       12717
frame pos:       45115
frame duration:  0

curr_frame:      11
frame pts:       15897
frame dts:       13957
frame pos:       53621
frame duration:  0

curr_frame:      12
frame pts:       17043
frame dts:       15077
frame pos:       57640
frame duration:  0

curr_frame:      13
frame pts:       18105
frame dts:       16223
frame pos:       61025
frame duration:  0

curr_frame:      14
frame pts:       19259
frame dts:       17285
frame pos:       67985
frame duration:  0

curr_frame:      15
frame pts:       20485
frame dts:       18439
frame pos:       78735
frame duration:  0

curr_frame:      16
frame pts:       21649
frame dts:       19665
frame pos:       84135
frame duration:  0

curr_frame:      17
frame pts:       22807
frame dts:       20829
frame pos:       89981
frame duration:  0

curr_frame:      18
frame pts:       23975
frame dts:       21987
frame pos:       95534
frame duration:  0

curr_frame:      19
frame pts:       26307
frame dts:       23155
frame pos:       103627
frame duration:  1154

curr_frame:      20
frame pts:       27325
frame dts:       24381
frame pos:       111265
frame duration:  1154

curr_frame:      21
frame pts:       28581
frame dts:       25487
frame pos:       113597
frame duration:  1154

curr_frame:      22
frame pts:       29773
frame dts:       26505
frame pos:       120074
frame duration:  1154

curr_frame:      23
frame pts:       30849
frame dts:       27761
frame pos:       122809
frame duration:  1154

curr_frame:      24
frame pts:       32013
frame dts:       28953
frame pos:       130964
frame duration:  1154

curr_frame:      25
frame pts:       33209
frame dts:       30029
frame pos:       138563
frame duration:  1154

curr_frame:      26
frame pts:       34343
frame dts:       31193
frame pos:       147742
frame duration:  1154

curr_frame:      27
frame pts:       35355
frame dts:       32389
frame pos:       159824
frame duration:  1154

curr_frame:      28
frame pts:       36131
frame dts:       33523
frame pos:       171629
frame duration:  1154

curr_frame:      29
frame pts:       36965
frame dts:       34535
frame pos:       175538
frame duration:  1154

curr_frame:      30
frame pts:       37931
frame dts:       35311
frame pos:       178404
frame duration:  1154

curr_frame:      31
frame pts:       38799
frame dts:       36145
frame pos:       182277
frame duration:  1154

curr_frame:      32
frame pts:       39665
frame dts:       37111
frame pos:       186383
frame duration:  1154

curr_frame:      33
frame pts:       40595
frame dts:       37979
frame pos:       192266
frame duration:  1154

curr_frame:      34
frame pts:       41541
frame dts:       38845
frame pos:       194511
frame duration:  1154

curr_frame:      35
frame pts:       42351
frame dts:       39775
frame pos:       198546
frame duration:  1154

curr_frame:      36
frame pts:       43285
frame dts:       40721
frame pos:       202322
frame duration:  1154

curr_frame:      37
frame pts:       44113
frame dts:       41531
frame pos:       206964
frame duration:  1154

curr_frame:      38
frame pts:       45025
frame dts:       42465
frame pos:       208756
frame duration:  1154

curr_frame:      39
frame pts:       46095
frame dts:       43293
frame pos:       215272
frame duration:  1154

curr_frame:      40
frame pts:       46945
frame dts:       44205
frame pos:       219006
frame duration:  1154

curr_frame:      41
frame pts:       47727
frame dts:       45275
frame pos:       221982
frame duration:  1154

curr_frame:      42
frame pts:       48673
frame dts:       46125
frame pos:       225643
frame duration:  1154

curr_frame:      43
frame pts:       49571
frame dts:       46907
frame pos:       230395
frame duration:  1154

curr_frame:      44
frame pts:       50365
frame dts:       47853
frame pos:       238247
frame duration:  1154

curr_frame:      45
frame pts:       51279
frame dts:       48751
frame pos:       246221
frame duration:  1154

curr_frame:      46
frame pts:       52165
frame dts:       49545
frame pos:       249761
frame duration:  1154

curr_frame:      47
frame pts:       53031
frame dts:       50459
frame pos:       251561
frame duration:  1154

curr_frame:      48
frame pts:       53963
frame dts:       51345
frame pos:       255262
frame duration:  1154

curr_frame:      49
frame pts:       55113
frame dts:       52211
frame pos:       265057
frame duration:  1154

curr_frame:      50
frame pts:       55961
frame dts:       53143
frame pos:       268533
frame duration:  1154

curr_frame:      51
frame pts:       56707
frame dts:       54293
frame pos:       270526
frame duration:  1154

curr_frame:      52
frame pts:       57631
frame dts:       55141
frame pos:       274642
frame duration:  1154

curr_frame:      53
frame pts:       58445
frame dts:       55887
frame pos:       277378
frame duration:  1154

curr_frame:      54
frame pts:       59255
frame dts:       56811
frame pos:       283321
frame duration:  1154

curr_frame:      55
frame pts:       60259
frame dts:       57625
frame pos:       285465
frame duration:  1154

curr_frame:      56
frame pts:       61129
frame dts:       58435
frame pos:       291147
frame duration:  1154

curr_frame:      57
frame pts:       62059
frame dts:       59439
frame pos:       294423
frame duration:  1154

curr_frame:      58
frame pts:       62939
frame dts:       60309
frame pos:       302311
frame duration:  1154

curr_frame:      59
frame pts:       63865
frame dts:       61239
frame pos:       305505
frame duration:  1154

curr_frame:      60
frame pts:       64863
frame dts:       62119
frame pos:       308147
frame duration:  1154

curr_frame:      61
frame pts:       65667
frame dts:       63045
frame pos:       312499
frame duration:  1154

curr_frame:      62
frame pts:       66549
frame dts:       64043
frame pos:       315638
frame duration:  1154

curr_frame:      63
frame pts:       67405
frame dts:       64847
frame pos:       319334
frame duration:  1154

curr_frame:      64
frame pts:       68293
frame dts:       65729
frame pos:       322317
frame duration:  1154

curr_frame:      65
frame pts:       69259
frame dts:       66585
frame pos:       326541
frame duration:  1154

curr_frame:      66
frame pts:       70147
frame dts:       67473
frame pos:       329665
frame duration:  1154

curr_frame:      67
frame pts:       71061
frame dts:       68439
frame pos:       334076
frame duration:  1154

curr_frame:      68
frame pts:       71941
frame dts:       69327
frame pos:       338505
frame duration:  1154

curr_frame:      69
frame pts:       72871
frame dts:       70241
frame pos:       344526
frame duration:  1154

curr_frame:      70
frame pts:       73757
frame dts:       71121
frame pos:       348455
frame duration:  1154

curr_frame:      71
frame pts:       74635
frame dts:       72051
frame pos:       351518
frame duration:  1154

curr_frame:      72
frame pts:       75589
frame dts:       72937
frame pos:       356988
frame duration:  1154

curr_frame:      73
frame pts:       76499
frame dts:       73815
frame pos:       360955
frame duration:  1154

curr_frame:      74
frame pts:       77299
frame dts:       74769
frame pos:       376680
frame duration:  1154

Failed to read frame: End of file
curr_frame:      75
frame pts:       77299
frame dts:       74769
frame pos:       376680
frame duration:  1154

It loses two packets in this video; the loss is 1 ~ 5 packets in other videos. Remark: I modified some code in Tasks.cpp (lines 333-338):

// Update the reconstructed frame timestamp;
auto p_packet_data = pImpl->pPacketData->GetDataAs<PacketData>();
//p_packet_data->pts = timestamp;
//p_packet_data->dts = 0;
//p_packet_data->pos = 0;
SetOutput(pImpl->pPacketData, 1U);

This is the url of the test video: http://viapi-customer-temp.oss-cn-shanghai.aliyuncs.com/LTAI4GFeng1pPaVrqkGSZxxd/6114f4cc-d0b2-4749-ac5a-df793cc2fd82.mp4
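For reference, the pts/dts pattern in the dump above is what B-frame reordering looks like: dts increases monotonically in decode order, while pts only becomes monotonic after sorting into presentation order. A minimal pure-Python sketch of that sanity check (the sample values are copied from the first few packets of the log above; the helper functions are illustrative, not part of VPF):

```python
# Sanity-check the demuxer invariants illustrated by the packet dump above:
# dts must increase monotonically in decode order, and sorting packets
# by pts recovers presentation order.

packets = [  # (pts, dts) for the first few packets from the log above
    (3263, 1225),
    (4243, 2443),
    (5533, 3423),
    (6665, 4713),
    (7743, 5845),
]

def dts_monotonic(pkts):
    """Decode order: every packet's dts must exceed the previous one."""
    return all(b[1] > a[1] for a, b in zip(pkts, pkts[1:]))

def presentation_order(pkts):
    """Presentation order is obtained by sorting on pts."""
    return sorted(pkts, key=lambda p: p[0])

print(dts_monotonic(packets))          # True for a well-formed stream
print(presentation_order(packets)[0])  # (3263, 1225)
```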

rarzumanyan commented 3 years ago

Hi @hubeichenxing

It loses two packets in this video; the loss is 1 ~ 5 packets in other videos.

While working on #195 I've found that sometimes, after avformat_find_stream_info() is called, the file position isn't rewound to the beginning (despite what the FFmpeg documentation says), which may cause the first couple of frames to be lost.

I've merged a quick fix from #195 to master which seeks to frame number 0 after the demuxer is initialized to address this issue. Please check out the latest master ToT to see if the issue is fixed.

stalestar commented 3 years ago

@rarzumanyan yeah, commit e91495a fixes the problem for PyFFmpegDemuxer, but when I use PyNvDecoder directly the problem still exists. The constructor of PyNvDecoder has no upDemuxer->Flush(); I don't know why that is.

lferraz commented 3 years ago

hi @rarzumanyan and @hubeichenxing

I found another weird issue when writing packets with PyAV using this strategy:

pkt = av.packet.Packet(bytearray(encFrame))
pkt.pts = xxx
pkt.dts = xxx
output.mux(pkt)

I've seen that, though the internal frames are properly encoded and saved, the packets have a wrong flag: all of them are marked as keyframes.

Checking a video with ffprobe -i video.mp4 -show_packets, where video.mp4 was recorded with the PyAV trick, all the packets have flags=K_. I think this is creating artifacts when playing the video.
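One way to check the keyframe flags independently of the container is to inspect the NAL unit types in the Annex.B payload itself: for H.264, only packets containing an IDR slice (nal_unit_type 5) should be flagged as keyframes. A rough pure-Python sketch of that check (H.264 only; the start-code scan is simplified and assumes byte-aligned 3- or 4-byte start codes — this is an illustration, not a conformant parser):

```python
def h264_contains_idr(annexb: bytes) -> bool:
    """Return True if the Annex.B buffer contains an IDR NAL unit
    (nal_unit_type == 5), i.e. the packet is a genuine keyframe."""
    i = 0
    while i < len(annexb) - 3:
        # Look for a 3-byte start code 00 00 01 (a 4-byte start code
        # 00 00 00 01 ends with the same three bytes).
        if annexb[i:i + 3] == b"\x00\x00\x01":
            nal_type = annexb[i + 3] & 0x1F  # low 5 bits of the NAL header
            if nal_type == 5:
                return True
            i += 3
        else:
            i += 1
    return False

# Non-IDR slice (NAL type 1) vs IDR slice (NAL type 5):
print(h264_contains_idr(b"\x00\x00\x00\x01\x41\xff"))  # False
print(h264_contains_idr(b"\x00\x00\x00\x01\x65\xff"))  # True
```

If only a subset of packets contains an IDR NAL but ffprobe still reports flags=K_ on all of them, the flag is being set at mux time rather than reflecting the bitstream.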

riaqn commented 2 years ago

Hi @rarzumanyan, thank you for the example. I'm able to generate proper Annex.B packets that can be played with MPV with no issue. However, when I pass the same packets to VPF I have issues. My code:

    # packet is a numpy.ndarray of uint8
    def send(self, pts, packet):
        self.f.write(packet.tobytes())  # store the packet so MPV can play it later
        log.warning(f'got packet: pts={pts} size={packet.shape}')
        pd_out = nvc.PacketData()
        pd_in = nvc.PacketData()
        pd_in.pts = pts

        surface = self.nvDec.DecodeSurfaceFromPacket(pd_in, packet, pd_out)
        if surface.Empty():
            log.warning('new packet not ready yet')
            return
        log.warning(f'there you go new frame@{pd_out.pts}')
        self.on_recv((pd_out.pts, surface))

This looks very canonical; however, when I send a series of packets to it, I always get new packet not ready yet, and after some iterations I get CUDA error: CUDA_ERROR_OUT_OF_MEMORY. So it seems like the decoder always thinks it needs more packets until it runs out of memory.

Please let me know if you want me to upload the Annex.B stream I used to test this decoder.
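Worth noting: a few empty surfaces at the start are normal, not a bug. With B-frames the decoder legitimately buffers several packets before emitting the first frame, and at end-of-stream the buffered tail only comes out when you flush. A toy pure-Python model of that feed/flush pattern (no VPF involved; the delay of 3 is an arbitrary illustration — the real reorder depth depends on the stream):

```python
from collections import deque

class ToyDecoder:
    """Models a decoder with a fixed reorder delay: it holds `delay`
    packets before the first frame comes out, so early feeds return None
    ("not ready yet") and the tail must be flushed at end-of-stream."""
    def __init__(self, delay=3):
        self.buf = deque()
        self.delay = delay

    def decode(self, pkt):
        self.buf.append(pkt)
        if len(self.buf) <= self.delay:
            return None            # decoder needs more input
        return self.buf.popleft()  # one frame out per packet in

    def flush(self):
        while self.buf:
            yield self.buf.popleft()

dec = ToyDecoder()
out = [f for p in range(6) if (f := dec.decode(p)) is not None]
out += list(dec.flush())   # drain the reorder buffer at end-of-stream
print(out)  # [0, 1, 2, 3, 4, 5]
```

If *every* packet yields an empty surface and memory keeps growing, the decoder never reaches a decodable picture at all — which usually points at the bitstream (e.g. missing parameter sets) rather than normal latency.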

riaqn commented 2 years ago

@rarzumanyan Would you be so kind as to take a look at my issue above when you have time? Thanks! :smile:

rarzumanyan commented 2 years ago

Hi @riaqn

I'm able to generate proper Annex.B packets that can be played with MPV

That doesn't mean your binary data is a valid Annex.B elementary bitstream. The easiest way to check for compliance is to decode the bitstream with the reference SW decoders: JM for H.264 and HM for H.265. Both are open source and easy to build.

Alternatively, you may inspect your output binary data with commercial bitstream analyzers, but these are crazy expensive for end users.

riaqn commented 2 years ago

@rarzumanyan thank you for the help. I ran HM on the H.265 packets and got the following output:

riaqn@elitebook ~> TAppDecoderStatic -b out.dump -o decoded.dump

HM software: Decoder Version [16.24] (including RExt)[Linux][GCC 11.1.0][64 bit]
Note: found NAL_UNIT_FILLER_DATA with 124437 bytes payload.
POC    0 TId: 0 ( I-SLICE, QP  2 ) [DT  0.382] [L0 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 94636 bytes payload.
POC    4 TId: 0 ( P-SLICE, QP  7 ) [DT  0.273] [L0 0 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 122768 bytes payload.
POC    2 TId: 0 ( B-SLICE, QP  8 ) [DT  0.264] [L0 0 ] [L1 4 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123762 bytes payload.
POC    1 TId: 0 ( B-SLICE, QP  9 ) [DT  0.265] [L0 0 ] [L1 2 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123890 bytes payload.
POC    3 TId: 0 ( B-SLICE, QP  9 ) [DT  0.263] [L0 2 ] [L1 4 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 91164 bytes payload.
POC    8 TId: 0 ( P-SLICE, QP  7 ) [DT  0.252] [L0 4 2 0 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123259 bytes payload.
POC    6 TId: 0 ( B-SLICE, QP  8 ) [DT  0.259] [L0 4 ] [L1 8 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123242 bytes payload.
POC    5 TId: 0 ( B-SLICE, QP  9 ) [DT  0.297] [L0 4 ] [L1 6 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 124349 bytes payload.
POC    7 TId: 0 ( B-SLICE, QP  9 ) [DT  0.262] [L0 6 ] [L1 8 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 90907 bytes payload.
POC   12 TId: 0 ( P-SLICE, QP  7 ) [DT  0.244] [L0 8 6 4 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123756 bytes payload.
POC   10 TId: 0 ( B-SLICE, QP  8 ) [DT  0.252] [L0 8 ] [L1 12 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 124331 bytes payload.
POC    9 TId: 0 ( B-SLICE, QP  9 ) [DT  0.260] [L0 8 ] [L1 10 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123212 bytes payload.
POC   11 TId: 0 ( B-SLICE, QP  9 ) [DT  0.269] [L0 10 ] [L1 12 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 90181 bytes payload.
POC   16 TId: 0 ( P-SLICE, QP  7 ) [DT  0.238] [L0 12 10 8 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123337 bytes payload.
POC   14 TId: 0 ( B-SLICE, QP  8 ) [DT  0.250] [L0 12 ] [L1 16 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123865 bytes payload.
POC   13 TId: 0 ( B-SLICE, QP  9 ) [DT  0.249] [L0 12 ] [L1 14 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123239 bytes payload.
POC   15 TId: 0 ( B-SLICE, QP  9 ) [DT  0.250] [L0 14 ] [L1 16 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 87563 bytes payload.
POC   20 TId: 0 ( P-SLICE, QP  7 ) [DT  0.274] [L0 16 14 12 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123207 bytes payload.
POC   18 TId: 0 ( B-SLICE, QP  8 ) [DT  0.252] [L0 16 ] [L1 20 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 122969 bytes payload.
POC   17 TId: 0 ( B-SLICE, QP  9 ) [DT  0.254] [L0 16 ] [L1 18 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 124387 bytes payload.
POC   19 TId: 0 ( B-SLICE, QP  9 ) [DT  0.251] [L0 18 ] [L1 20 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 88590 bytes payload.
POC   24 TId: 0 ( P-SLICE, QP  7 ) [DT  0.253] [L0 20 18 16 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123363 bytes payload.
POC   22 TId: 0 ( B-SLICE, QP  8 ) [DT  0.254] [L0 20 ] [L1 24 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123753 bytes payload.
POC   21 TId: 0 ( B-SLICE, QP  9 ) [DT  0.259] [L0 20 ] [L1 22 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 124320 bytes payload.
POC   23 TId: 0 ( B-SLICE, QP  9 ) [DT  0.255] [L0 22 ] [L1 24 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 87181 bytes payload.
POC   28 TId: 0 ( P-SLICE, QP  7 ) [DT  0.257] [L0 24 22 20 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123188 bytes payload.
POC   26 TId: 0 ( B-SLICE, QP  8 ) [DT  0.262] [L0 24 ] [L1 28 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 122714 bytes payload.
POC   25 TId: 0 ( B-SLICE, QP  9 ) [DT  0.264] [L0 24 ] [L1 26 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 124345 bytes payload.
POC   27 TId: 0 ( B-SLICE, QP  9 ) [DT  0.257] [L0 26 ] [L1 28 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 88511 bytes payload.
POC   32 TId: 0 ( P-SLICE, QP  7 ) [DT  0.252] [L0 28 26 24 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 124349 bytes payload.
POC   30 TId: 0 ( B-SLICE, QP  8 ) [DT  0.255] [L0 28 ] [L1 32 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 124434 bytes payload.
POC   29 TId: 0 ( B-SLICE, QP  9 ) [DT  0.257] [L0 28 ] [L1 30 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123163 bytes payload.
POC   31 TId: 0 ( B-SLICE, QP  9 ) [DT  0.259] [L0 30 ] [L1 32 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 86421 bytes payload.
POC   36 TId: 0 ( P-SLICE, QP  7 ) [DT  0.255] [L0 32 30 28 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123051 bytes payload.
POC   34 TId: 0 ( B-SLICE, QP  8 ) [DT  0.260] [L0 32 ] [L1 36 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123030 bytes payload.
POC   33 TId: 0 ( B-SLICE, QP  9 ) [DT  0.260] [L0 32 ] [L1 34 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123351 bytes payload.
POC   35 TId: 0 ( B-SLICE, QP  9 ) [DT  0.259] [L0 34 ] [L1 36 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 87457 bytes payload.
POC   40 TId: 0 ( P-SLICE, QP  7 ) [DT  0.254] [L0 36 34 32 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123096 bytes payload.
POC   38 TId: 0 ( B-SLICE, QP  8 ) [DT  0.257] [L0 36 ] [L1 40 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 124396 bytes payload.
POC   37 TId: 0 ( B-SLICE, QP  9 ) [DT  0.256] [L0 36 ] [L1 38 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 124365 bytes payload.
POC   39 TId: 0 ( B-SLICE, QP  9 ) [DT  0.253] [L0 38 ] [L1 40 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 87755 bytes payload.
POC   44 TId: 0 ( P-SLICE, QP  7 ) [DT  0.249] [L0 40 38 36 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123193 bytes payload.
POC   42 TId: 0 ( B-SLICE, QP  8 ) [DT  0.256] [L0 40 ] [L1 44 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 122917 bytes payload.
POC   41 TId: 0 ( B-SLICE, QP  9 ) [DT  0.260] [L0 40 ] [L1 42 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 124321 bytes payload.
POC   43 TId: 0 ( B-SLICE, QP  9 ) [DT  0.254] [L0 42 ] [L1 44 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 89668 bytes payload.
POC   48 TId: 0 ( P-SLICE, QP  7 ) [DT  0.255] [L0 44 42 40 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123383 bytes payload.
POC   46 TId: 0 ( B-SLICE, QP  8 ) [DT  0.258] [L0 44 ] [L1 48 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123414 bytes payload.
POC   45 TId: 0 ( B-SLICE, QP  9 ) [DT  0.260] [L0 44 ] [L1 46 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 124270 bytes payload.
POC   47 TId: 0 ( B-SLICE, QP  9 ) [DT  0.256] [L0 46 ] [L1 48 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 90676 bytes payload.
POC   52 TId: 0 ( P-SLICE, QP  7 ) [DT  0.254] [L0 48 46 44 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 124296 bytes payload.
POC   50 TId: 0 ( B-SLICE, QP  8 ) [DT  0.258] [L0 48 ] [L1 52 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123829 bytes payload.
POC   49 TId: 0 ( B-SLICE, QP  9 ) [DT  0.260] [L0 48 ] [L1 50 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123269 bytes payload.
POC   51 TId: 0 ( B-SLICE, QP  9 ) [DT  0.260] [L0 50 ] [L1 52 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 100054 bytes payload.
POC   56 TId: 0 ( P-SLICE, QP  7 ) [DT  0.272] [L0 52 50 48 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 122874 bytes payload.
POC   54 TId: 0 ( B-SLICE, QP  8 ) [DT  0.259] [L0 52 ] [L1 56 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 124219 bytes payload.
POC   53 TId: 0 ( B-SLICE, QP  9 ) [DT  0.256] [L0 52 ] [L1 54 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 123130 bytes payload.
POC   55 TId: 0 ( B-SLICE, QP  9 ) [DT  0.262] [L0 54 ] [L1 56 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 117388 bytes payload.
POC   59 TId: 0 ( P-SLICE, QP  7 ) [DT  0.254] [L0 56 54 52 ] [L1 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 122757 bytes payload.
POC   57 TId: 0 ( B-SLICE, QP  9 ) [DT  0.257] [L0 56 ] [L1 59 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 122471 bytes payload.
POC   58 TId: 0 ( B-SLICE, QP  9 ) [DT  0.255] [L0 56 ] [L1 59 ] [:,(unk)]
Note: found NAL_UNIT_FILLER_DATA with 117792 bytes payload.
POC    0 TId: 0 ( I-SLICE, QP  3 ) [DT  0.258] [L0 ] [L1 ] [:,(unk)]
POC    2 TId: 0 ( P-SLICE, QP  7 ) [DT  0.449] [L0 0 ] [L1 ] [:,(unk)]
POC    1 TId: 0 ( B-SLICE, QP  9 ) [DT  0.380] [L0 0 ] [L1 2 ] [:,(unk)]
POC    4 TId: 0 ( P-SLICE, QP  7 ) [DT  0.529] [L0 2 0 ] [L1 ] [:,(unk)]
POC    3 TId: 0 ( B-SLICE, QP  9 ) [DT  0.420] [L0 2 ] [L1 4 ] [:,(unk)]
POC    6 TId: 0 ( P-SLICE, QP 16 ) [DT  0.563] [L0 4 2 0 ] [L1 ] [:,(unk)]
POC    5 TId: 0 ( B-SLICE, QP 13 ) [DT  0.426] [L0 4 ] [L1 6 ] [:,(unk)]
POC    8 TId: 0 ( P-SLICE, QP 16 ) [DT  0.622] [L0 6 4 2 ] [L1 ] [:,(unk)]
POC    7 TId: 0 ( B-SLICE, QP 13 ) [DT  0.436] [L0 6 ] [L1 8 ] [:,(unk)]
POC   10 TId: 0 ( P-SLICE, QP 16 ) [DT  0.617] [L0 8 6 4 ] [L1 ] [:,(unk)]
POC    9 TId: 0 ( B-SLICE, QP 12 ) [DT  0.482] [L0 8 ] [L1 10 ] [:,(unk)]
POC   14 TId: 0 ( P-SLICE, QP 18 ) [DT  0.635] [L0 10 8 6 ] [L1 ] [:,(unk)]
POC   12 TId: 0 ( B-SLICE, QP 12 ) [DT  0.531] [L0 10 ] [L1 14 ] [:,(unk)]
POC   11 TId: 0 ( B-SLICE, QP 12 ) [DT  0.450] [L0 10 ] [L1 12 ] [:,(unk)]

 Total Time:       42.626 sec.

Does it look normal?

rarzumanyan commented 2 years ago

@riaqn

Does it look normal?

Does the output file look normal?

riaqn commented 2 years ago

@rarzumanyan Do you mean I should write a simple program that reads the YUV file? In that case, do you know if the YUV planes are separate (planar) or interleaved? Also, I assume it's yuv444? Thanks!

rarzumanyan commented 2 years ago

Hi @riaqn

Do you mean I should write a simple program that read the YUV file?

I'm not sure I get it.

HM is the reference H.265 decoder, hence it takes an Annex.B elementary bitstream and decodes it to a raw output file. Basically, the PyNvDecoder class is functionally equivalent to HM in terms of HEVC decoding, but it uses Nvdec instead of the CPU.

Reference decoders are often used to check whether an Annex.B elementary bitstream conforms to the standard. Nvdec is a HW implementation of the H.265 codec standard and VPF is a Python wrapper around it, hence both may potentially have bugs in their implementation. A good way to check for that is to compare PyNvDecoder output against the reference decoder output.

So the idea is to decode your Annex.B elementary bitstream file with HM and see if it can decode it to a raw YUV file. If it can't, chances are your video file is malformed. If HM can decode it but PyNvDecoder can't, chances are PyNvDecoder has bugs in its code.

Reference decoders such as HM and JM are thoroughly inspected and considered the "golden standard", so all other decoders shall be their complete equivalents in terms of supported features and behavior (but not in terms of performance or implementation details).
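Following that idea, once both decoders produce raw YUV files, comparing them is a straightforward frame-by-frame byte diff. A rough pure-Python sketch (the frame-size formula assumes yuv420p, and the paths/dimensions are placeholders you would substitute):

```python
def compare_yuv(path_a, path_b, width, height):
    """Compare two raw yuv420p files frame by frame; return the indices
    of frames whose bytes differ (an empty list means a bit-exact match)."""
    frame_size = width * height * 3 // 2  # Y plane + quarter-size U and V
    mismatches = []
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        idx = 0
        while True:
            a = fa.read(frame_size)
            b = fb.read(frame_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                mismatches.append(idx)  # differing (or missing) frame
            idx += 1
    return mismatches
```

Note that bit-exact output is only expected between two conformant decoders of the *same* bitstream; any mismatching frame index is worth inspecting visually.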

riaqn commented 2 years ago

@rarzumanyan Yes, I understand what you mean. I was referring to your reply "does the output file look normal". I thought, to be 100% sure, I should write a program that reads the YUV output file and shows it as a picture on my computer screen.

But roughly speaking, yes, the file size looks normal: it's 3273523200 bytes, and dividing that by 7680 (width), 3840 (height) and 1.5 (yuv420p bytes per pixel) I get a whole number, 74 (frames).
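That arithmetic can be sanity-checked in a couple of lines (numbers copied from the message above; yuv420p stores 12 bits, i.e. 1.5 bytes, per pixel):

```python
# Verify the raw-file frame count reported above.
file_size = 3273523200              # bytes, size of the HM output file
width, height = 7680, 3840          # frame dimensions
frame_size = width * height * 3 // 2  # yuv420p: 1.5 bytes per pixel

frames, remainder = divmod(file_size, frame_size)
print(frames, remainder)  # 74 0 -> exactly 74 whole frames, no leftover bytes
```

A zero remainder is the useful part of the check: a nonzero remainder would mean the file size is inconsistent with the assumed resolution or pixel format.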

rarzumanyan commented 2 years ago

@riaqn Thanks for the update.

yes the file size looks normal

I didn't see that in HM output ;)

I got a whole number 37.

If you're not sure about the pixel format, use ffprobe; it will tell you. Also, I recommend inspecting the decoded file visually just to make sure it looks sane and there are no artifacts.

I thought to be 100% sure, I should write a program that read the YUV output file and show it as picture on my computer screen.

No need to do that. There are plenty of raw video players for Windows and Linux.