Transcoding != returning Julia-formatted images

galenlynch commented 3 years ago

I've been digging into the codebase for both encoding and decoding, in an effort to add support for 10 bit videos. The API for reading videos seems to conflate 'transcoding' with transferring frame buffers into Julia arrays with popular Julia element types such as RGB{N0f8}. In addition to populating arrays with Julia element types, 'transcoding' in VideoIO also seems to mean color conversion and scaling, as performed by libswscale.

As it stands, if you set transcode=false when creating a VideoReader object, you get neither behavior and subsequent calls to read or retrieve will yield byte buffers of each image, instead of a Julia matrix with a RGB or Gray element type. Similarly, if transcode=true, it is possible to run into problems if you either request a pixel type that does not directly map onto Julia element types, or instead if you have a video that does not require colorspace conversion before transferring into a Julia array. In the first example, if you call retrieve with a VideoReader{TRANSCODE} object and a pixel type that is not supported, then VideoIO prompts users to allocate their own image buffers and call retrieve!, even though all that would happen in that situation (if it worked at all) would be simple transferring of the bytes into the user-supplied buffer, and in reality only a portion of that image since the current implementation doesn't seem to handle planar image formats. It is very unlikely that the memory layout of the user-supplied buffer would match the pixel format of the transcoded frame. I think this was the confusion behind #210.

It seems to me that we should divorce giving the user a raw byte buffers from whether VideoIO does color-space conversion with sws-scale. A power user might want to use VideoIO to read a video and do colorspace conversion (which requires 'transcoding' with sws-scale), and can then use their own methods to make sense of the byte buffer. Conversely, setting transcode=true and requesting a pixel format that VideoIO does not know how to convert into a RGB or Gray element type currently seems broken, even if the user supplies their own array.

Perhaps we could make separate functions retrieve_raw, and retrieve_raw! that returns the bytes of either the original frame, or instead the scaled frame, depending on whether transcode=true. The retrieve and retrieve! functions should probably just error if the image buffer does not line up with a supported pixel type, whether that pixel type is the result of sws_scale colorspace conversion or instead is the pixel type of the original video. transcode=true could just determine whether colorspace / pixel format conversion with sws-scale is performed.

timholy commented 3 years ago

An alternative, of course, is to add support for whatever colorspace you need to ColorTypes. Note that FixedPointNumbers has an N6f10 type, so as long as it gets expanded to 16 bits per color channel, the support is already there. If it's a "packed 10 bits" then more work needs to be done, but it really shouldn't be that bad.

galenlynch commented 3 years ago

Right, but often the memory layout of the frame format isn't easily mapped onto the memory layout of a Julia matrix. For example, to add RGB{N6f10} support, the simplest solution (at least that I could think of) is to use the AV_PIX_FMT_GBRP10LE format, which is a planar 10 bit rgb format. There weren't any 'packed' (meaning interleaved, not how you used packed) 10 bit RGB formats afaik. Transferring the frame data into a RGB{N6f10} matrix therefore requires reading from the same position of each plane of the ffmpeg frame, and assembling them into a RGB element for the Julia array. I think only in a few (already supported) cases such as GRAY8 and RGB24 is there a simple correspondence between the memory layout of the target Julia array and the memory layout of the ffmpeg frame, which allows for simple transferring of bytes from the ffmpeg frame into the output.

galenlynch commented 3 years ago

Also looking at the code a little bit more closely, it seems like calling read with NO_TRANSCODE in fact does nothing at the moment:

https://github.com/JuliaIO/VideoIO.jl/blob/600f60cb33fbb5836138524ceb5c71d3b2e116cf/src/avio.jl#L469-L479

timholy commented 3 years ago

In principle, planar formats can be handled with ImageCore.colorview, but I don't know the specifics of AV_PIX_FMT_GBRP10LE so I can't comment further.

galenlynch commented 3 years ago

Hmm that's interesting, I will explore that. But the point I'm trying to make is that in general we probably can't expect the memory layout of ffmpeg frames to match memory layout of Julia arrays, unless if we perhaps make a colorview for each ffmpeg pixel format we want to support.

galenlynch commented 3 years ago

In which case we would still need to throw an error if read is called with a pixel format with no corresponding colorview.

kmsquire commented 3 years ago

FWIW, I wrote the original code, and wholeheartedly agree that the logic for reading and transcoding should be separated, along with the suggestion to throw errors for unsupported pixel formats.

It’s also quite possible that the NO_TRANSCODE code path is non functional. It’s been a long time, but I vaguely remember a sticking point being how to handle multiple planes with different sizes with some formats.

I haven’t had a need for this coffee in a long while, so others have taken up maintenance. Any improvements you could offer would be greatly appreciated.

galenlynch commented 3 years ago

@ianshmean I would also vote for this to be addressed prior to v1. I think that sws_scale does not always need to be used when returning a Julia formatted array to the user. I also think calling color space conversion with sws_scale "transcoding" is not entirely accurate.

JuliaIO / VideoIO.jl

Transcoding != returning Julia-formatted images #269