rcode6 opened this issue 4 months ago
Hi @rcode6
For now VALI doesn’t support demuxing.
I understand its usefulness for advanced users, but unfortunately it leaves too many ways to e.g. break the decoder's internal state (by seeking to a particular packet) or to overflow VRAM (by sending packets with multiple frames in each while receiving frames one by one).
I’m not against the whole demuxing idea, I just don’t have a clear understanding of how to do it right. BTW, can it be done with something like PyAV? Demuxing isn’t computationally expensive and it doesn’t have HW support anyway, so any other alternative to VALI may be just as good.
Hi @RomanArzumanyan,
You're right that demuxing can be done with other libraries, but my goal is to keep the latter parts of the pipeline entirely on the GPU: decoding, pixel format conversions, and resizing. That's a bit harder to manage. PyAV does demux into packets, but decoding always ends up on the CPU. And PyNvVideoCodec, Nvidia's spinoff of VPF, doesn't do surface conversions or resizing.
I completely respect your decision to keep it simpler for users though! Would love it if you'd consider otherwise, since it's already happening behind the scenes.
I think, from reading PyNvVideoCodec's very sparse documentation, that they do support DLPack, so maybe I can use PyNvVideoCodec for demuxing and decoding, then use `from_dlpack` to jump into VALI for the rest of the work.
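For what it's worth, the handover itself is simple if both libraries speak DLPack. A minimal sketch below, using NumPy arrays as a stand-in for decoded surfaces (in the real pipeline this would be a GPU buffer, consumed with something like `torch.from_dlpack`; the names here are illustrative only):

```python
import numpy as np

# Stand-in for a decoded surface; in the real pipeline this would be a
# GPU buffer produced by the decoder and exposing __dlpack__.
decoded_surface = np.arange(12, dtype=np.uint8).reshape(3, 4)

# Any object implementing the DLPack protocol can be handed over
# zero-copy to another library's from_dlpack.
view = np.from_dlpack(decoded_surface)

# The handover shares memory rather than copying it.
assert np.shares_memory(decoded_surface, view)
assert view.shape == (3, 4)
```

The same pattern should apply between PyNvVideoCodec and VALI, as long as both sides agree on device placement.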
Hi @rcode6
Could you share your demux/decode use case? As far as I understand, the only difference between the builtin and standalone modes in `PyNvCodec` was the ability to extract the elementary stream and seek through the packets.
Hi @RomanArzumanyan,
My project processes multiple live camera stream inputs for a security system, so demuxing and timestamping the feeds quickly, with minimal latency, is important; packets are then placed into a queue for slower decoding/processing later.
The processing thread pulls packets off the queue, decodes them into frames, and does pixel format conversions and resizing before passing them on for further processing. All of that work is done entirely on the GPU, without ever downloading to the host. There's a combination of selective image processing, motion detection, and object detection going on there, followed by possibly recording back to disk. This queue will fall behind for periods of time and catch back up later.
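The demux/process split described above can be sketched with stdlib primitives; the PyAV and GPU-decode calls are indicated as comments, and the `Packet` class is a stand-in for `av.packet.Packet`, not a real API:

```python
import queue
import threading
from dataclasses import dataclass

@dataclass
class Packet:
    """Stand-in for a demuxed packet (av.packet.Packet in PyAV)."""
    pts: int
    data: bytes

pkt_queue: "queue.Queue" = queue.Queue(maxsize=256)

def demux_thread() -> None:
    # Real code: for packet in av.open(camera_url).demux(video=0): ...
    for pts in range(0, 90_000, 3_000):   # fake timestamps, 30 packets
        pkt_queue.put(Packet(pts=pts, data=b"\x00" * 100))
    pkt_queue.put(None)                   # sentinel: stream ended

def process_thread(out: list) -> None:
    while (pkt := pkt_queue.get()) is not None:
        # Real code: decode pkt on the GPU, convert pixel format, resize.
        out.append(pkt.pts)

decoded = []
t1 = threading.Thread(target=demux_thread)
t2 = threading.Thread(target=process_thread, args=(decoded,))
t1.start(); t2.start(); t1.join(); t2.join()
assert len(decoded) == 30
```

The bounded queue gives the fast demux side backpressure when the slow decode side falls behind, which matches the catch-up behavior described above.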
The primary gain I get from the demuxing process is that I can keep the unprocessed video as packets on the host, which are basically the compressed video feed. If I were to decode earlier instead of just demuxing, the unprocessed video would end up stored in the queue as decompressed surfaces in VRAM.
Another place I get the same type of space savings: if I want to keep the last 10 seconds of video footage in memory before a recording event, I can also just keep them as compressed packets on the host, instead of decompressed on the GPU. For instance, with a 30fps feed, that's 300 frames I can keep compressed to decode later, vs using up VRAM for 300 uncompressed surfaces in whatever resolution the feeds were in. So far, I've found the extra load of periodically decoding packets twice to be minimal, while the VRAM savings have been huge.
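The pre-roll buffer is just a fixed-size ring, and the arithmetic works out to roughly 890 MiB for 300 decoded 1080p surfaces (assuming NV12, i.e. 1.5 bytes per pixel; the numbers are illustrative, not measured from my pipeline):

```python
from collections import deque

FPS = 30
PREROLL_SECONDS = 10
# Ring buffer of the last 300 compressed packets; old ones fall off.
ring = deque(maxlen=FPS * PREROLL_SECONDS)

# Rough VRAM cost if those frames were held decoded instead:
# an NV12 1080p surface is width * height * 1.5 bytes.
nv12_frame_bytes = 1920 * 1080 * 3 // 2          # 3,110,400 bytes
uncompressed_mib = nv12_frame_bytes * FPS * PREROLL_SECONDS / 2**20
print(f"~{uncompressed_mib:.0f} MiB of VRAM for 300 decoded 1080p frames")
# versus a few tens of MiB of host RAM for the same 300 compressed packets.

for i in range(1000):
    ring.append(f"packet-{i}")                   # deque drops the oldest
assert len(ring) == 300 and ring[0] == "packet-700"
```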
In a nutshell, my demuxing use case just lets me save a lot of VRAM while avoiding too many downloads/uploads between device and host.
Thanks for the reply @rcode6
As far as I understand, one missing thing is the ability of `PyDecoder` to take some sort of file handle (or an `AVPacket` directly) as input. The rest can be done with e.g. PyAV, which will write demuxed packets to some queue. Am I missing something?
Hi @RomanArzumanyan,
Yes, if there were a way for `PyDecoder` to take in an `AVPacket` directly, that would work. Basically what `DecodeSingleSurfaceFromPacket` used to do.
I think the problem is what an `AVPacket` object in Python would look like for `PyDecoder`. From my understanding, an `AVPacket` isn't just the raw byte data for a single packet; it has also been parsed (pts, dts, flags, etc.). Demuxing with PyAV generates `av.packet.Packet` objects. The raw bytes look to be exposed along with the individual properties, but then dts, pts, etc. would need to be manually copied over (which sounds prone to a lot of user error, or possible struct changes), unless `PyDecoder` demuxes the raw bytes again (which negates the purpose of using another library).
I can't think of an efficient way other than the same library being used for the demux and decode steps, with the `AVPacket` objects being exposed in between for the user to hold onto. Otherwise, using `from_dlpack` seems like the best option at the moment as a handover step if a separate library needs to be used.
Upvoting for PyAV compatibility. It would be fantastic if we could directly decode AVPackets (and retrieve them from the encoder). At one point in VPF, what was lacking was bitstream filter support in PyAV, but it seems those have been incorporated now.
Thank you so much for keeping this project going as VALI!
I'm currently using VPF in my project and was working on swapping over to VALI, but it looks like you've removed demuxing into packets & decoding from packets. Is that something you'd consider adding back in at some point? It's currently a large part of my processing pipeline.