rcode6 opened this issue 4 months ago
Hi @rcode6
For now VALI doesn’t support demuxing.
I understand its usefulness for advanced users, but unfortunately it leaves too many ways to e.g. break the decoder's internal state (by seeking to a particular packet) or to overflow VRAM (by sending packets with multiple frames in each while receiving frames one by one).
I’m not against the whole demuxing idea, I just don’t have a clear understanding of how to do it right. BTW, can it be done with something like PyAV? Demuxing isn’t computationally expensive and it doesn’t have HW support anyway, so any other alternative to VALI may be just as good.
Hi @RomanArzumanyan,
You're right that demuxing can be done with other libraries, but my goal is to keep the latter parts of the pipeline entirely on the GPU: decoding, pixel format conversions, and resizing. That's a bit harder to manage. PyAV does demux into packets, but decoding always ends up on the CPU. And PyNvVideoCodec, Nvidia's spinoff of VPF, doesn't do surface conversions or resizing.
I completely respect your decision to keep it simpler for users though! Would love it if you'd consider otherwise, since it's already happening behind the scenes.
I think, from reading PyNvVideoCodec's very sparse documentation, that they do support DLPack, so maybe I can use PyNvVideoCodec for demuxing and decoding, then use `from_dlpack` to jump into VALI for the rest of the work.
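For what it's worth, the handover itself is simple if both libraries speak DLPack. A minimal sketch below, using NumPy arrays as a stand-in for decoded surfaces (in the real pipeline this would be a GPU buffer, consumed with something like `torch.from_dlpack`; the names here are illustrative only):

```python
import numpy as np

# Stand-in for a decoded surface; in the real pipeline this would be a
# GPU buffer produced by the decoder and exposing __dlpack__.
decoded_surface = np.arange(12, dtype=np.uint8).reshape(3, 4)

# Any object implementing the DLPack protocol can be handed over
# zero-copy to another library's from_dlpack.
view = np.from_dlpack(decoded_surface)

# The handover shares memory rather than copying it.
assert np.shares_memory(decoded_surface, view)
assert view.shape == (3, 4)
```

The same pattern should apply between PyNvVideoCodec and VALI, as long as both sides agree on device placement.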
Hi @rcode6
Could you share your demux/decode use case? As far as I understand, the only difference between the builtin and standalone modes in `PyNvCodec` was the ability to extract the elementary stream and seek through the packets.
Hi @RomanArzumanyan,
My project processes multiple live camera stream inputs for a security system, so demuxing and timestamping the feeds quickly, with minimal latency, is important; packets are then placed into a queue for slower decoding/processing later.
The processing thread pulls packets off the queue, decodes them into frames, and does pixel format conversions and resizing before passing them on for further processing. All of that work is done entirely on the GPU, without ever downloading to the host. There's a combination of selective image processing, motion detection, and object detection going on there, followed by possibly recording back to disk. This queue will fall behind for periods of time and catch back up later.
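The demux/process split described above can be sketched with stdlib primitives; the PyAV and GPU-decode calls are indicated as comments, and the `Packet` class is a stand-in for `av.packet.Packet`, not a real API:

```python
import queue
import threading
from dataclasses import dataclass

@dataclass
class Packet:
    """Stand-in for a demuxed packet (av.packet.Packet in PyAV)."""
    pts: int
    data: bytes

pkt_queue: "queue.Queue" = queue.Queue(maxsize=256)

def demux_thread() -> None:
    # Real code: for packet in av.open(camera_url).demux(video=0): ...
    for pts in range(0, 90_000, 3_000):   # fake timestamps, 30 packets
        pkt_queue.put(Packet(pts=pts, data=b"\x00" * 100))
    pkt_queue.put(None)                   # sentinel: stream ended

def process_thread(out: list) -> None:
    while (pkt := pkt_queue.get()) is not None:
        # Real code: decode pkt on the GPU, convert pixel format, resize.
        out.append(pkt.pts)

decoded = []
t1 = threading.Thread(target=demux_thread)
t2 = threading.Thread(target=process_thread, args=(decoded,))
t1.start(); t2.start(); t1.join(); t2.join()
assert len(decoded) == 30
```

The bounded queue gives the fast demux side backpressure when the slow decode side falls behind, which matches the catch-up behavior described above.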
The primary gain I get from the demuxing process is that I can keep the unprocessed video as packets on the host, which are basically the compressed video feed. If I were to decode earlier instead of just demuxing, the unprocessed video would end up stored in the queue as decompressed surfaces in VRAM.
Another place I get the same type of space savings: if I want to keep the last 10 seconds of video footage in memory before a recording event, I can also just keep them as compressed packets on the host, instead of decompressed on the GPU. For instance, with a 30fps feed, that's 300 frames I can keep compressed to decode later, vs using up VRAM for 300 uncompressed surfaces in whatever resolution the feeds were in. So far, I've found the extra load of periodically decoding packets twice to be minimal, while the VRAM savings have been huge.
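The pre-roll buffer is just a fixed-size ring, and the arithmetic works out to roughly 890 MiB for 300 decoded 1080p surfaces (assuming NV12, i.e. 1.5 bytes per pixel; the numbers are illustrative, not measured from my pipeline):

```python
from collections import deque

FPS = 30
PREROLL_SECONDS = 10
# Ring buffer of the last 300 compressed packets; old ones fall off.
ring = deque(maxlen=FPS * PREROLL_SECONDS)

# Rough VRAM cost if those frames were held decoded instead:
# an NV12 1080p surface is width * height * 1.5 bytes.
nv12_frame_bytes = 1920 * 1080 * 3 // 2          # 3,110,400 bytes
uncompressed_mib = nv12_frame_bytes * FPS * PREROLL_SECONDS / 2**20
print(f"~{uncompressed_mib:.0f} MiB of VRAM for 300 decoded 1080p frames")
# versus a few tens of MiB of host RAM for the same 300 compressed packets.

for i in range(1000):
    ring.append(f"packet-{i}")                   # deque drops the oldest
assert len(ring) == 300 and ring[0] == "packet-700"
```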
In a nutshell, my demuxing use case just lets me save a lot of VRAM while avoiding too many downloads/uploads between device and host.
Thanks for the reply @rcode6
As far as I understand, one missing thing is the ability of `PyDecoder` to take some sort of file handle (or an `AVPacket` directly) as input. The rest can be done with e.g. PyAV, which will write demuxed packets to some queue. Am I missing something?
Hi @RomanArzumanyan,
Yes, if there were a way for `PyDecoder` to take in an `AVPacket` directly, that would work. Basically what `DecodeSingleSurfaceFromPacket` used to do.
I think the problem is what an `AVPacket` object in Python would look like for `PyDecoder`. From my understanding, an `AVPacket` isn't just the raw byte data for a single packet; it has also been parsed (pts, dts, flags, etc.). Demuxing with PyAV generates `av.packet.Packet` objects. The raw bytes look to be exposed along with the individual properties, but then dts, pts, etc. would need to be manually copied over (which sounds prone to a lot of user error, or possible struct changes), unless `PyDecoder` demuxes the raw bytes again (which negates the purpose of using another library).
I can't think of an efficient way other than the same library being used for the demux and decode steps, with the `AVPacket` objects being exposed in between for the user to hold onto. Otherwise, using `from_dlpack` seems like the best option at the moment as a handover step if a separate library needs to be used.
Upvoting for PyAV compatibility. It would be fantastic if we could directly decode AVPackets (and retrieve them from the encoder). At one point in VPF, what was lacking was bitstream filter support in PyAV, but it seems those have been incorporated now.
Thank you so much for keeping this project going as VALI!
I'm currently using VPF in my project and was working on swapping over to VALI, but it looks like you've removed demuxing into packets & decoding from packets. Is that something you'd consider adding back in at some point? It's currently a large part of my processing pipeline.