Create fast path for returning YUVPlanes to numpy arrays

hmaarrfk commented 3 months ago

Thank you for considering.

WyattBlue commented 3 months ago

The problem with this patch is that when the fast path is used, it disrupts the structure of the underlying pixel format. Any code that relies on that behavior is going to break. This was discussed in https://github.com/PyAV-Org/PyAV/pull/788

hmaarrfk commented 3 months ago

Yes. I remember now. Ultimately this function doesn't give that guarantee. It al4eady changes between a view and a copy.

Would you consider the addition of a keyword argument to specify the users intention?

WyattBlue commented 3 months ago

Any libraries that uses on to_ndarray() for VideoFrames probably without realizing it, reply on the underlying data format to be preserved. auto-editor, for example, depends on frame.to_ndarray().to_bytes() to return the underlying bytes so the ffmpeg cli can treat them as raw frames. This allows for efficient rendering videos.

What is the point of to_ndarray() being fast if it returns data in an undefined layout? Do you have a particular use case?

hmaarrfk commented 3 months ago

Any libraries that uses on to_ndarray() for VideoFrames probably without realizing it, reply on the underlying data format to be preserved.

The data "format" hasn't changed.

What has changed is the fact that VideoFrame and the returned numpy array share the same memory. If you think this is an issue, i'm happy to add different function, but the in its current form on main to_ndarray does the following

yuv420p yuvj420p -- data copy
yuyv422 -- data copy
gbrp10be", "gbrp12be", "gbrp14be", "gbrp16be", "gbrp10le", "gbrp12le", "gbrp14le", "gbrp16le" -- data copy
"gbrpf32be", "gbrpf32le" -- data copy
"rgb24", "bgr24" -- data view
"gray", "gray8", "rgb8", "bgr8" -- data view

so rgb24 and gray formats are returning views. I presume it was because it was easier to guarantee correctness then.

I'm asking here that yuv420p and yuvj420p return "maybe views" if the underlying image data is well organized (according to the checks in the if statement).

auto-editor, for example, depends on frame.to_ndarray().to_bytes() to return the underlying bytes so the ffmpeg cli can treat them as raw frames.

This should still work. I can add a test for this usecase.

What is the point of to_ndarray() being fast if it returns data in an undefined layout?

I utilize the well defined layout of the YUV420 format in FFMPEG to avoid the memory copy.

For a 8x4 array of pixels, YUV420 and YUV420J would store:

yyyyyyyy
yyyyyyyy
yyyyyyyy
yyyyyyyy
uuuuuuuu
vvvvvvvv

From wikipedia: https://en.wikipedia.org/wiki/Chroma_subsampling

So the "fastpath" checks that the memory is all contiguous, and simply returns a pointer to the underlying data from the FFMPEG VideoFrame object.

Avoiding this copy is pretty important since for 4k (or 8k! which I am interested in) video decoding:

3840 x 2160 x 1.5 = 12MB

per frame.

Copying this around unnecessarily creates REAL problems with your CPU cache and just slows things down alot.

As for my usecase, it revolves around 4k and 8k video encoding / decoding. And as such benefits from what appears to be micro optimizations.

I can add a test if you want, but generally speaking, I remember that without my check, something already failed. I recall it being due to some odd image shape in the testing suite which was interesting to learn about.

WyattBlue commented 3 months ago

It applying the fastpath does break auto-editor though: mpv-shot0001

It wouldn't be an issue as long as the fastpath is not applied by default. I'm sympathetic to your use-case. I do think having a new default argument is a little janky, with the **kwargs already there.

I think the way forward is make a new method to_ndarray2(). It would handle the yuv420p yuvj420p like your code does now. Making it a new method has the benefit of side-stepping any backwards compatibility problems.

hmaarrfk commented 3 months ago

It applying the fastpath does break auto-editor though:

hmaarrfk commented 3 months ago

It applying the fastpath does break auto-editor though:

This is really strange....

I find on place where you use the "naked" to_ndarray https://github.com/WyattBlue/auto-editor/blob/8f11ab54fade4fa5ab4cd7cb43ae54a4212dbff3/auto_editor/analyze.py#L392

but nothing jumps to me as getting mutated.

(In fact you could save yourself some time by doing current_frame = frame.to_ndarray().astype('int16') but that is beside the point)

I would love to recreate that screenshot to see what might be going on?

I have a feeling that maybe I'm not adding enough safeguards on the fastpath.

Can you teach me how you got to your screenshot?

hmaarrfk commented 3 months ago

The other place is https://github.com/WyattBlue/auto-editor/blob/8f11ab54fade4fa5ab4cd7cb43ae54a4212dbff3/auto_editor/render/video.py#L370 but also seems "safe"

hmaarrfk commented 3 months ago

I suspect the reason that I don't see this bug it is because you are making use of elaborate filters and using filter graphs which is not what workflow i go through.

I added two checks for data continuity:

                y_plane._buffer_ptr + y_plane.buffer_size * bytes_per_pixel == u_plane._buffer_ptr and
                u_plane._buffer_ptr + u_plane.buffer_size * bytes_per_pixel == v_plane._buffer_ptr

I think they should resolve things, but I need to check that it is still "correct" in the sense that the fast path is still triggered.

hmaarrfk commented 3 months ago

I need to recheck my buffer continuity check, it doesn't seem to be triggering "true" on what I thought it should.

hmaarrfk commented 3 months ago

Hmmm, it seems that the array isn't guaranteed to be contiguous, even in the "nice" cases

ipdb> u_plane.buffer_ptr - (y_plane.buffer_ptr + y_plane.buffer_size)
4864
ipdb> v_plane.buffer_ptr - (u_plane.buffer_ptr + u_plane.buffer_size)
1728

how is any of my code working.....

hmaarrfk commented 3 months ago

it seems that maybe the function i'm looking for is: get_buffer2 that would allow me to create the buffer that I need to ensure that memory isn't copied. https://ffmpeg.org/doxygen/6.0/structAVCodecContext.html#aef79333a4c6abf1628c55d75ec82bede

hmaarrfk commented 3 months ago

Thank you for your thoroughness in testing!

PyAV-Org / PyAV

Create fast path for returning YUVPlanes to numpy arrays #1393