JuliaIO / VideoIO.jl

Reading and writing of video files in Julia via ffmpeg
https://juliaio.github.io/VideoIO.jl/stable

Fix dropped frames when reading, encoding, add 10 bit support, and more #279

Closed galenlynch closed 3 years ago

galenlynch commented 3 years ago

I apologize in advance for the vague title of this PR. While this PR started as a simple change to add support for encoding and decoding 10 bit gray videos, it expanded in scope as I ran into other bugs lurking in the code base that prevented forward progress on adding 10 bit support. This is still a work in progress: while the functionality is complete, the documentation and testing definitely are not. Help would be appreciated.

I will attempt to summarize the changes made in this PR, and where possible motivate each change with a bug that it fixes or code that it simplifies. I will no doubt need to expand upon this list, as many parts of the encoding and decoding pipeline have been touched.

Possible conflict with master

In rebasing onto master, there was a merge conflict with the changes made in PR #277. I believe that the PR here also fixes the same problem, but in a slightly different way, as this PR refactors some of the underlying code. I am not 100% positive that my changes will also work for @dawbarton, so I would appreciate his attention/review. While some of the difference is just a question of refactoring, I believe that there is also a semantic change between this PR and #277: in #277 I believe only the first image plane is transferred, while in this PR all image planes would be transferred. This difference would be important for many multi-planar formats, e.g. YUV formats, that are popular for webcams. I am not sure that #277 would work for such multi-planar formats. One nuance that should probably be discussed is that returning the raw image planes from FFMPEG may not be very helpful without the alignment that is used for the buffer. This is made more complicated by the fact that the original code used a deprecated method that was not aware of this buffer alignment.

Prevent dropped frames by not creating a damaged MPEG-2 stream

Frames were being dropped because the intermediate MPEG-2 elementary stream file was missing a start code, as documented in #271. I have fixed this bug for the video encoding workflow that uses VideoEncoder, but it is not relevant for the new VideoWriter type and workflow that is part of this PR.

Prevent final encoded frame from being dropped due to it having zero duration

The final frame of an encoded stream is often present in the resulting video container, but not shown due to an edit list. This has the effect of dropping the final frame of an encode, even though it is present in the video stream. After much head scratching, it turned out that this was because the packet duration being fed into the muxer was zero, and libav would occasionally hide the packet as a result. This problem, and the fix, are documented in a Stack Overflow question that I posted.

Come to think of it, this may not have been a bug in the original implementation, and may instead fix a bug that plagued my PR for a while. However, the files being produced by VideoIO had strange timestamp durations that ffmpeg would complain about at high verbosity levels, and that some documentation I read online suggests may become unsupported at some point.
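As a rough illustration of the fix (a sketch only; the binding names follow libav's C API, and the exact code in this PR may differ):

```julia
# Sketch: give each packet an explicit nonzero duration before muxing, so the
# muxer never sees a zero-duration final packet. `codec_ctx`, `stream`,
# `packet`, and `format_ctx` stand in for VideoIO's libav wrappers.
ticks_per_frame = 1  # one frame in the encoder time base (1/framerate)
packet.duration = av_rescale_q(ticks_per_frame,
                               codec_ctx.time_base,  # encoder time base
                               stream.time_base)     # muxer/stream time base
av_interleaved_write_frame(format_ctx, packet)
```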

Start encoding frames at the PTS expected by libav: 0

The function encodevideo feeds frames into the encoder starting at timestamp 1. This produces videos that have strange edit lists, and may also result in surprising video-audio synchronization when adding audio tracks to video streams produced by VideoIO. I have NOT changed the behavior of encodevideo in this PR, but in encode_mux_video, the equivalent function in the new encoding workflow added by this PR, frames start at timestamp 0, which seems to be what libav expects. This produces videos that do not have strange edit lists, since the video stream does not need to be delayed by some number of samples. I think this is also consistent with what users would expect when adding audio tracks: video and audio would start at the same time, instead of the video being delayed by one frame period.

Prevent dropped frames while reading by flushing decoder

The decoding loop for VideoReader/AVInput objects would not flush the decoder once all packets were read from a video container. In my hands with x264 streams in mp4 files, this would result in the final two frames from a video not being produced by VideoIO. This PR properly flushes the decoder at the end of a file to drain any frames buffered by the decoder.
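The drain step can be sketched like this (binding names follow libav's C API; the helper name and error handling are illustrative, not the PR's exact code):

```julia
# Sketch: once all packets have been read from the container, put the decoder
# in draining mode and collect any frames it has buffered.
function drain_decoder!(codec_ctx, frame, on_frame)
    avcodec_send_packet(codec_ctx, C_NULL)  # a NULL packet starts draining
    while true
        ret = avcodec_receive_frame(codec_ctx, frame)
        ret == AVERROR_EOF && break         # decoder fully drained
        ret < 0 && error("decoder error: $ret")
        on_frame(frame)                     # hand over the buffered frame
    end
end
```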

Use libav functions to set private and non-private encoder options

For the new encoding workflow that I have added as part of this PR, I use libav's interface to set both encoder settings and private encoder settings. Currently, VideoIO sets encoder settings by modifying the struct fields listed in the AVCodecContextProperties keyword argument. This change means that the names of encoder settings match the names used in the command-line documentation instead of the names of fields in the libav C struct. I think it is perhaps easier for users to access the ffmpeg command-line documentation than the developer documentation. It also means the error checking on non-private setting names is done by libav, whereas currently no error checking is done.

Encoder options are set with the keyword arguments encoder_settings and encoder_private_settings, instead of keyword argument AVCodecContextProperties. I also use named tuples rather than arrays of pairs, which I feel is more ergonomic. So instead of doing encodevideo(..., AVCodecContextProperties = [:priv_data => ("crf"=>"22", "preset"=>"medium")]), users would instead do encode_mux_video(..., encoder_private_settings = (crf = 22, preset = "medium")).
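Conceptually, the settings are forwarded to libav's option API, roughly like this (a sketch; the helper function and flag values are illustrative, not the PR's code):

```julia
# Sketch: stringify each NamedTuple entry and hand it to av_opt_set, which
# validates the option name and value for us.
function apply_encoder_settings(codec_ctx, settings::NamedTuple; private = false)
    # private (codec-specific) options live on the codec's priv_data object
    target = private ? codec_ctx.priv_data : codec_ctx
    for (name, val) in pairs(settings)
        ret = av_opt_set(target, string(name), string(val), 0)
        ret < 0 && error("could not set encoder option $name")
    end
end

# e.g. apply_encoder_settings(ctx, (crf = 22, preset = "medium"); private = true)
```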

Make seeking consistent

Seeking currently conflates PTS and DTS, as described in #275, and uses a strange offset when seeking into a file. In this PR PTS is used instead of DTS. This also makes seek and gettime have the same notion of time.

Change encoding workflow

Currently, the encoding workflow in VideoIO encodes frames, writes the encoded frames to an MPEG-2 elementary stream, and then uses the FFMPEG binary to mux that stream into an mp4 container. This workflow uses the VideoEncoder type to store all of the necessary pieces to encode a video file. I have added a new workflow that uses libav's AVFormat API, which muxes and writes video packets as they are produced, instead of creating an intermediate file. This also has the benefit of being more general down the road, and of allowing the encoder to be properly configured for the video container format (which is not currently being done). By relying on the AVFormat API, I think it would be quite easy to add support for other container formats, while still checking that the encoder chosen by the user is compatible with the chosen format, etc.

This new workflow has a simple correspondence to the existing workflow in VideoIO, and does not require a separate mux step.

| Old workflow | New workflow |
| --- | --- |
| encodevideo | encode_mux_video |
| prepareencoder | open_video_out! |
| appendencode! | append_encode_mux! |
| finishencode! | close_video_out! |
| close(io) | N/A |
| mux | N/A |

I have not removed the old workflow, and indeed have fixed a few bugs with it. I personally think the new workflow is better than the old, and should at least be the default. However, whether the existing workflow should be removed is a good question for discussion.
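Using the names from the table above, the frame-by-frame version of the new workflow looks roughly like this (signatures are my best reading of the PR and may differ from the final API):

```julia
# Sketch: encode and mux incrementally, with no intermediate stream file.
writer = open_video_out!("out.mp4", first_frame;
                         framerate = 30,
                         encoder_private_settings = (crf = 22, preset = "medium"))
for frame in frames
    append_encode_mux!(writer, frame)  # packets are muxed as they are produced
end
close_video_out!(writer)  # flush the encoder and write the container trailer
```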

Add support for 10 bit gray and RGB decoding

You can now decode videos that use 10 bit YUV encoding, and receive them as either RGB or Gray buffers.

Add support for 10 bit gray and RGB encoding

You can now encode videos that use 10 bit YUV encoding, writing them from either Gray or RGB buffers.

Add support for encoding frames that are "scanline major"

Most video packages expect pixels that are adjacent on the same scanline (neighboring x positions at the same y position) to be adjacent in memory. Encoding frames with this memory layout previously required the use of PermutedDimsArray with VideoIO. I have added a keyword argument, scanline_major, that allows this pervasive memory format to be used directly when encoding videos. Indeed, libav itself uses this format. As such, encoding video frames that are scanline major requires less work on VideoIO's end, and can be as simple as copying memory.
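To make the layout question concrete, a hedged sketch (the keyword comes from this PR; the exact signature of encode_mux_video is assumed):

```julia
using VideoIO, ColorTypes, FixedPointNumbers

# A scanline-major frame: the horizontal (x) index varies fastest in memory,
# i.e. the array is indexed [x, y] rather than Julia's usual [y, x].
img_xy = rand(Gray{N0f8}, 1920, 1080)

# Previously the buffer had to be permuted to present the expected layout:
encode_mux_video("permuted.mp4", [PermutedDimsArray(img_xy, (2, 1))])

# With this PR the layout can be declared, avoiding the permutation; since
# libav also stores frames scanline major this can be a plain memory copy:
encode_mux_video("direct.mp4", [img_xy], scanline_major = true)
```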

Eliminate use of deprecated functions in libav

I have removed calls to deprecated libav functions and replaced them with the recommended alternatives.

| Removed deprecated function | Replacement |
| --- | --- |
| avpicture_get_size | av_image_get_buffer_size |
| avcodec_decode_video2 | avcodec_send_packet and avcodec_receive_frame |

Replacing avcodec_decode_video2 required changing the logic of the decoding loop, in order to handle EAGAIN errors as per the libav documentation.
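The resulting loop follows the send/receive protocol from the libav documentation; roughly (a sketch, with binding names from libav's C API and illustrative error constants, not the PR's exact code):

```julia
# Sketch: feed one packet to the decoder, then pull out as many frames as it
# will give us. EAGAIN is not an error: it means the decoder needs more input
# before it can emit another frame.
function decode_packet!(codec_ctx, packet, frame, on_frame)
    avcodec_send_packet(codec_ctx, packet)
    while true
        ret = avcodec_receive_frame(codec_ctx, frame)
        (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF) && break
        ret < 0 && error("decoding error: $ret")
        on_frame(frame)
    end
end
```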

There is one more deprecated function that I replaced, but I am having a hard time finding which function that was.

Make implicit memory layout requirement of decoding pipeline explicit

The video decoding pipeline seemed to require that the frame buffer supplied by users be scanline-major, as described above, but did not require that the array be indexed one way or another. I have made this requirement explicit, and disabled some code that I think would previously have errored because it violated this assumption.

Use method dispatch in favor of if-else statements where possible

Instead of big if-else statements that match the user-supplied buffer type with a pixel format, and which scale poorly with the number of buffer and pixel types supported, I have moved much of this logic into method dispatch. The type of the user-supplied buffer is now handled by method dispatch, instead of if-else statements, when encoding and decoding video.
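As a toy illustration of the pattern (the function name and type-to-format mapping below are hypothetical, not VideoIO's actual tables):

```julia
using ColorTypes, FixedPointNumbers

# Each supported buffer element type gets its own method; Julia selects the
# right one from the buffer's type, with no runtime if-else chain.
target_pixel_format(::AbstractArray{<:Gray{N0f8}})  = :gray8
target_pixel_format(::AbstractArray{<:Gray{N6f10}}) = :gray10le
target_pixel_format(::AbstractArray{<:RGB{N0f8}})   = :rgb24

# Supporting a new buffer type is one new method rather than a new branch:
target_pixel_format(::AbstractArray{UInt8}) = :gray8
```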

Refactor repeated code snippets into functions

Some code snippets, such as "pumping" a video container for frame data, were repeated many times across many different functions. I have tried to refactor these snippets into a single function where I noticed them.

Make decoding image buffer compatible with multi-planar image formats

The image buffer used by the decoding logic to handle containers with multiple video streams did not account for the use of multi-planar image formats. I have used libav functions to transfer the image buffers to the VideoIO-maintained image buffer, which allows it to work with multi-planar image formats. I am unclear if this image buffer is ever used, but at least it's now not wrong.

New interface to interact with nested C structs in libav

One large change in this PR is that it introduces a new type, NestedCStruct, and associated methods, to make interacting with nested C structs require less code in Julia; it additionally fixes some potential memory bugs in the existing code. This type and related functions are in the new file avptr.jl. Interacting with libav (FFMPEG) requires receiving, modifying, and passing structs that are part of the API of that library, which in turn requires modifying their fields and generating pointers to them. The existing codebase would often store these structs as length 1 arrays of Julia structs that match the memory layout of the C struct. However, accessing this memory and modifying individual fields would often boil down to generating pointers to the array, and then modifying fields of the Julia struct directly with pointer logic, requiring many unsafe_store! and unsafe_load statements throughout the codebase, without any calls to GC.@preserve etc. to ensure that the underlying memory was not garbage collected before the pointer was used. Additionally, even if the GC.@preserve macros were added in the right places, my reading of Julia's semantics is that taking pointers to fields of Julia structs is not supported, so while the existing code might work at the moment, there is no guarantee that future compiler or garbage collector optimizations will not break it.

Beyond the potential memory bugs, interacting with these C structs required a fair amount of boilerplate code, which I suspect led to some C structs of interest (e.g. AVStream) being cached as fields of multiple mutable structs in VideoIO. This meant that there was no single authoritative source for a piece of information. For example, the contents of StreamInfo in the current code base are entirely redundant with the VideoReader struct. If accessing and manipulating the contents of nested C structs were easier, I suspect that there would be less redundant information in the current types of VideoIO.

My solution, NestedCStruct, simply acts like a typed double pointer to a C struct. It does not "own" the underlying memory, though it can be used with finalizers to automatically manage memory that is dynamically allocated by libav. Reading fields from the C struct, and modifying them, is accomplished automatically using Julia's reflection capabilities, in a way that (I believe) is memory-safe and protects the memory from being prematurely garbage collected. This means that functions in VideoIO do not need to invoke unsafe functions (unsafe_store!, unsafe_convert, etc.) to manipulate the C struct. Interaction with C functions is relatively painless due to unsafe_convert methods that provide double or single pointers to the wrapped C struct, as required by the C function's type signature. Importantly, because the memory is internally represented as a pointer, and not a Julia struct, this design no longer relies on Julia behavior that works for now but is not actually supported.

This type, and the functions that go along with it, is likely a subset of Blobs.jl. Whether VideoIO wants to maintain its own code to access nested C structs, or instead wants to add a dependency, is a good question for discussion. Since I was able to make this solution work, and IMO be very ergonomic to work with, in ~82 LOC (with lots of spacing) I think it might not be worth it to add a new dependency.

I have additionally added a macro, @avptr, which makes it easy to assign an alias to a NestedCStruct that wraps a particular C struct, and additionally makes constructors that call allocating C functions and attach finalizers to the resulting object that call the appropriate destructor. This allows types in VideoIO to have fields with types whose names are similar to the struct name in libav, for example AVFramePtr, which is just an alias for NestedCStruct{AVFrame}. It also reduces the amount of boilerplate required to use libav's functions to allocate the C struct, and to free it when Julia no longer uses it.

The consequence of this new type is eliminating a lot of boilerplate across VideoIO necessary to interact with C structs, eliminating redundant fields in many of VideoIO's types, and eliminating some types altogether (e.g. StreamInfo). Additionally, accessing C structs through multiple layers of nesting in Julia looks much more like accessing these structs in C, except that the -> operator is replaced by a . in Julia code. In contrast, doing the same in the existing code base would require (and often did) using unsafe_load at each layer of nesting, and would further require the use of GC.@preserve at multiple stages to avoid memory bugs (although that was absent in the code).
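For example (the field chain here, AVFormatContext through iformat to name, is just an illustration of the access pattern, not a specific line from the PR):

```julia
# Existing style: every level of nesting needs an unsafe_load, and each load
# should really be wrapped in GC.@preserve to keep the backing memory alive
fmt     = unsafe_load(format_context_ptr)  # load the AVFormatContext struct
iformat = unsafe_load(fmt.iformat)         # load the AVInputFormat struct
name    = unsafe_string(iformat.name)

# NestedCStruct style: `.` walks the pointers the way `->` does in C
name = unsafe_string(format_context.iformat.name)
```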

Use properly initialized libav structs

AVFrames in VideoIO were not initialized by libav, but instead were initialized by Julia. This resulted in many of the fields being initialized to invalid or non-default values from the perspective of libav. While it is unclear whether passing these improperly configured structs to libav was causing any problems, some behavior of VideoIO relied on the default values assigned by Julia rather than the default values assigned by libav.

Reduce type instability

Some of VideoIO's types had abstract field types that did not need to be abstract. For example, VideoReader's field avin had the abstract type AVInput. I have parameterized (and made concrete) the types of these fields where possible, which reduces type instability in the generated code.

Use locks around functions that are not thread safe

Some libav functions, such as avcodec_open2, are not thread-safe according to the libav documentation. I have added a constant global ReentrantLock, VIDEOIO_LOCK, to VideoIO and used it to prevent multiple Julia threads from using non-thread-safe functions.
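The pattern is a single process-wide lock around the unsafe calls; a sketch (VIDEOIO_LOCK is the constant from this PR, the wrapper function is illustrative):

```julia
# Every call to a function that libav documents as not thread safe goes
# through this lock, so only one Julia thread can be inside at a time.
const VIDEOIO_LOCK = ReentrantLock()

function open_codec_context(codec_ctx, codec, options)
    lock(VIDEOIO_LOCK) do
        # avcodec_open2 is documented as not thread safe
        avcodec_open2(codec_ctx, codec, options)
    end
end
```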

closes #270, closes #271, closes #274, closes #275, closes #242, closes #283, closes #284, closes #285

galenlynch commented 3 years ago

Also the tests currently fail in part because the tests do not expect all frames to survive a round trip through VideoIO. I will work on writing tests for the new code, and updating tests to no longer expect the wrong result. However, I would definitely appreciate some help, if you have any time.

dawbarton commented 3 years ago

Thanks for the heads-up. I'll try to take a look at this later today (maybe later in the week). An initial comment is that the solution I implemented in #277 is pretty basic; it should transfer all the image planes (if I understand the terminology correctly - I might not, as I'm quite new to this) but it does so in a very simplistic way, in that it just dumps the entire frame (including all image planes) into a single Vector{UInt8}. The user then has to manually separate out the different image planes.

It looks like your solution is a lot more comprehensive than mine and so I'd be happy if my changes were undone assuming that I can still get hold of the untranscoded frames somehow.

IanButterworth commented 3 years ago

I haven't reviewed the code yet, but the intentions and fixes sound excellent. Thank you.

I'm not against API changes in favor of more logical designs. Perhaps we should start a roadmap to version 1

galenlynch commented 3 years ago

It might work in practice for multi-planar image formats, but I'm not sure it will work in general. An AVPicture in ffmpeg is described by four or more pointers, each of which points to a different plane of an image. How many image planes are used to store an AVPicture depends on the pixel format. Your code, as far as I can tell, uses only the first pointer, and copies data from that pointer into a buffer owned by Julia that is returned to the user. While this semantically ignores the other image planes in the trailing pointers, you are also copying more data than is in just the first image plane, because you use avpicture_get_size to get the image size in bytes, which should also include the other image planes. I suspect that the data pointers in an AVPicture often point to a contiguous piece of memory that contains all the image planes (hopefully in the right order), so doing this may be successful even if it doesn't explicitly consider the other image planes. At least semantically, you run the risk of reading more data than is stored in the first pointer (possible segfault?), and of not copying the entire AVPicture data. I couldn't find documentation that explicitly describes how AVPicture data is internally organized, or whether all the data has to be contiguous. Instead of trying to figure out the internal arrangement of AVPicture data buffers, I just used the functions provided by libav to copy image data to buffers.

dawbarton commented 3 years ago

That definitely sounds like a better way of doing things. I have to admit that I was relying on the fact that the code path I was working on didn't work in the first place and so something is better than nothing. I started working my way through the FFMPEG API yesterday and was rapidly coming to the conclusion that things were not done in an ideal way.

galenlynch commented 3 years ago

@ianshmean starting a roadmap to v1 sounds like a good idea. I have some other suggestions for breaking changes that I think would be an improvement to VideoIO's API, but will wait for such a road map to propose them.

galenlynch commented 3 years ago

OK, the tests are now passing, although they do not cover the new functionality. Unfortunately the old encoding workflow is still dropping two input frames, although that is an improvement over the four dropped frames before this PR. I think at one point I was managing to get every frame into the output file with the old workflow. Under the new workflow, all frames make it into the output. A brief look at the problem suggests that this is once again a problem with the edit lists: all 100 packets seem to be in the output, but two of them are hidden. FFMPEG is also complaining that the timestamps are not set:

[mp4 @ 0x5600eded9080] Timestamps are unset in a packet for stream 0. This is deprecated and will stop working in the future. Fix your code to set the timestamps properly
[mp4 @ 0x5600eded9080] pts has no value

I don't know if it's easy to work around this with the intermediate MPEG-2 elementary stream in the old workflow, since the packet contents are just dumped into a file.

I'm in favor of replacing the old workflow with the new one that uses the AVFormat API to encode and write files. It doesn't have these sorts of bugs. I don't really know how to proceed to try to fix these remaining two dropped frames with the old workflow, and honestly I don't have the time to chase it down, judging from my last attempts to fix similar edit-list problems. Currently in the PR the encoding code is nearly duplicated to support both workflows.

galenlynch commented 3 years ago

I have added some simple tests for the new encoding workflow. I noticed in the course of writing these tests that I am unable to achieve lossless encoding with 10 bit monochrome videos, although they are close to the lossless target. I will try to figure out the cause of this bug soon.

I have also marked the tests that check the number of frames in the old encoding workflow as broken, instead of allowing them to drop two frames and still pass.

IanButterworth commented 3 years ago

unable to achieve lossless encoding with 10 bit monochrome videos

It may be worth checking that the correct flags are set in the FFMPEG builder https://github.com/JuliaPackaging/Yggdrasil/blob/master/F/FFMPEG/common.jl

galenlynch commented 3 years ago

Still not able to get lossless 10 bit encoding to work as expected. The output seems mostly lossless, with occasional runs of off-by-one errors. I wonder if this is due to color space conversion being done by FFMPEG? Currently in this PR, encoding gray frames uses the GRAY10LE pixel format. However, x264 stores the result in a YUV format. It's possible that whatever color-space conversion that FFMPEG uses to make the gray input into a YUV output results in rounding errors. This could explain the off-by-one errors that I'm seeing. I'll try encoding 10 bit gray videos with a YUV input pixel format and see if that fixes it.

galenlynch commented 3 years ago

When I encode directly with YUV420P10LE I see the same pattern of off-by-one errors 🤔. Maybe I have some encoder settings wrong?

@ianshmean AFAIK h264 codec now supports both 8 and 10 bit encoding without requiring different build flags, but I might be wrong.

galenlynch commented 3 years ago

I see similar problems if I attempt to do the same with my system ffmpeg binary. I'm beginning to think the "problem" isn't with VideoIO or this PR, and is more likely due to my inability to properly configure the encoder.

IanButterworth commented 3 years ago

This is wonderful progress @galenlynch!

In my experience, windows CI doesn't like it if files that it was using are rm-ed too quickly. A way to fix the windows tests might be to use mktemp and let the system tidy up later, move all the rm's to the end, or just not rm at all and overwrite.

galenlynch commented 3 years ago

Weird, I don't think I changed the tests that are failing on windows. I'll see what I can do about the windows errors.

galenlynch commented 3 years ago

I'm marking this ready for review, even though there are a few elements still missing: I still need to add documentation and see what else I can test. I personally think we should deprecate VideoEncoder and the old encoding workflow in favor of the new workflow that muxes packets as they are generated. I'll add those deprecations to this PR unless there are objections?

IanButterworth commented 3 years ago

I think deprecation is the right strategy, but I think we need to do a performance comparison to check there isn't a regression.

I wonder if this new encode approach slows down the frame loop, but saves the total time due to the removal of the final mux step?

galenlynch commented 3 years ago

Maybe, I can do some quick checks. Just from my experience with it, it doesn't feel noticeably slower.


galenlynch commented 3 years ago

Also, the old workflow drops frames, unlike the new one, so speed isn't the only criterion.


IanButterworth commented 3 years ago

Absolutely. It's the way we should go. It'd just be good to know whether there is a regression

galenlynch commented 3 years ago

For VideoEncoder, here is the test script that I ran on both master and this PR.

using FixedPointNumbers, ColorTypes, VideoIO, BenchmarkTools, Random
vidname = "testvideo.mp4"; nf = 100; nw = 100; nh = 100; el = RGB{N0f8};
Random.seed!(0);
img_stack = collect((rand(el, nh, nw) for i in  1:nf));
codec_settings = [:priv_data => ("crf"=>"22","preset"=>"medium")];
@benchmark encodevideo(vidname, img_stack, AVCodecContextProperties = codec_settings)

VideoEncoder on master:

BenchmarkTools.Trial: 
  memory estimate:  37.90 MiB
  allocs estimate:  3494
  --------------
  minimum time:     200.146 ms (0.00% GC)
  median time:      208.536 ms (0.30% GC)
  mean time:        209.727 ms (0.27% GC)
  maximum time:     225.657 ms (0.29% GC)
  --------------
  samples:          24
  evals/sample:     1

VideoEncoder on this PR (which uses sws_scale):

BenchmarkTools.Trial: 
  memory estimate:  175.63 KiB
  allocs estimate:  820
  --------------
  minimum time:     299.289 ms (0.00% GC)
  median time:      321.307 ms (0.00% GC)
  mean time:        321.601 ms (0.00% GC)
  maximum time:     355.430 ms (0.00% GC)
  --------------
  samples:          16
  evals/sample:     1

script to benchmark VideoWriter:

using FixedPointNumbers, ColorTypes, VideoIO, BenchmarkTools, Random
vidname = "testvideo.mp4"; nf = 100; nw = 100; nh = 100; el = RGB{N0f8};
Random.seed!(0);
img_stack = collect((rand(el, nh, nw) for i in  1:nf));
codec_settings = (crf = 23, preset = "medium");
@benchmark encode_mux_video(vidname, img_stack, encoder_private_settings = codec_settings)

VideoWriter on this PR (also uses sws_scale):

BenchmarkTools.Trial: 
  memory estimate:  1.59 KiB
  allocs estimate:  55
  --------------
  minimum time:     278.220 ms (0.00% GC)
  median time:      289.572 ms (0.00% GC)
  mean time:        291.087 ms (0.00% GC)
  maximum time:     309.156 ms (0.00% GC)
  --------------
  samples:          18
  evals/sample:     1

So it is slower, most likely due to sws_scale being used to handle color space transformations. However, this makes the encoding pixel format more flexible (e.g. you can now do yuv444p encoding), seems to be more accurate (see #284), and allows the target and source color spaces to be specified. Moreover, VideoWriter produces a video with all 100 input frames, while VideoEncoder retains only 98 (two dropped).

EDIT: I just noticed that the output of VideoWriter and VideoEncoder for this PR is yuv444p, making the tests above not directly comparable. Here is the updated script and result for VideoWriter and VideoEncoder when generating yuv420p videos on this PR (as VideoEncoder on master normally does):

... # other code the same
@benchmark encode_mux_video(vidname, img_stack, encoder_private_settings = codec_settings, target_pix_fmt = VideoIO.AV_PIX_FMT_YUV420P) # forced pix_fmt
BenchmarkTools.Trial: 
  memory estimate:  1.70 KiB
  allocs estimate:  58
  --------------
  minimum time:     163.300 ms (0.00% GC)
  median time:      181.872 ms (0.00% GC)
  mean time:        182.234 ms (0.00% GC)
  maximum time:     196.174 ms (0.00% GC)
  --------------
  samples:          28
  evals/sample:     1

And again for VideoEncoder on this PR:

@benchmark encodevideo(vidname, img_stack, AVCodecContextProperties = codec_settings, target_pix_fmt = VideoIO.AV_PIX_FMT_YUV420P)
BenchmarkTools.Trial: 
  memory estimate:  303.33 KiB
  allocs estimate:  809
  --------------
  minimum time:     184.247 ms (0.00% GC)
  median time:      208.980 ms (0.00% GC)
  mean time:        208.665 ms (0.00% GC)
  maximum time:     225.981 ms (0.00% GC)
  --------------
  samples:          24
  evals/sample:     1

The takeaway for me is that simultaneously muxing and encoding is faster than encoding separately from muxing. There is no performance regression in this PR compared to master, and VideoWriter seems to be faster. I could look at "per-frame" encoding and writing time, but I'm not sure how informative that would be for real-world workflows. Happy to do so if you're interested, though.

galenlynch commented 3 years ago

If you instead use the "scanline_major" interface that I've added in this PR, it's basically the same run time:

 @benchmark encode_mux_video(vidname, img_stack, encoder_private_settings = codec_settings, target_pix_fmt = VideoIO.AV_PIX_FMT_YUV420P, scanline_major = true)
BenchmarkTools.Trial: 
  memory estimate:  1.70 KiB
  allocs estimate:  58
  --------------
  minimum time:     167.531 ms (0.00% GC)
  median time:      180.941 ms (0.00% GC)
  mean time:        182.309 ms (0.00% GC)
  maximum time:     198.730 ms (0.00% GC)
  --------------
  samples:          28
  evals/sample:     1
IanButterworth commented 3 years ago

Great. So I'm ignoring the 2nd and 3rd results above, as those are yuv444p, right? So indeed, seems like a speed up on like-for-like output format. Great news

Would the thing to do be to start another branch off this branch that removes all the old redundant functionality, and merge that to here if everyone agrees that we're in good-enough shape to do so? That way we don't muddy up this PR if we end up having to keep the old functions

galenlynch commented 3 years ago

Yeah that's right.

Starting a new branch to remove the old functionality seems like a good idea. I just realized that one hurdle to the long-term integration of this PR is that the new encoding workflow uses different function names, e.g. encode_mux_video instead of encodevideo. The only reason I did this was to have access to the new workflow without removing the old workflow. If the old workflow will be replaced by the new one, then maybe we should decide whether we want to rename the functions or instead keep the names but change what the functions do. It seems like the path of least breakage would be to maintain the function names and change what they do. However, some of the new function names are more apt for the new functionality, e.g. prepareencoder is replaced by open_video_out! in the new workflow, as it does more than just prepare the encoder.

galenlynch commented 3 years ago

I don't understand what's causing the Windows build to fail. I've tried to make sure that all files are closed by the time the test video file is removed, and I've moved removing the test file to the very end of the test script.

I'll try to get my windows partition set up to run tests so I don't have to push a new commit and wait for CI every time I want to try a new fix.

galenlynch commented 3 years ago

... unfortunately all tests pass when I try to run them locally on my Windows partition. The "problem" seems to be specific to Github's Windows CI. Though perhaps there is some difference between my Windows Julia configuration and Github's Windows Julia configuration that I am not accounting for, e.g. in the number of threads being used.

galenlynch commented 3 years ago

Never mind, didn't check out the right branch 😅

Getting the error locally now, which means I can begin to debug it.

IanButterworth commented 3 years ago

I noticed the other day that in Base, the temp file/dir creator functions specifically call GC.gc() on Windows before tidying up the files. That might be the issue?

So either move to only using the mktemp functions, or add a GC.gc() before any rm on Windows.
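A minimal sketch of both options, assuming a test file path; `remove_test_file` is a hypothetical helper for illustration, not part of VideoIO:

```julia
# Option 1: force a collection before deleting, so handles that are only
# released by finalizers let go of the file on Windows first.
function remove_test_file(path::AbstractString)
    Sys.iswindows() && GC.gc()
    rm(path; force = true)
end

# Option 2: scope the file with mktemp, which handles cleanup itself.
mktemp() do path, io
    write(io, "test data")
    close(io)
    # ... run tests against `path` here ...
end  # file removed automatically when the block exits
```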

galenlynch commented 3 years ago

Yeah, the call to GC.gc() was the issue, though I only saw your comment after I realized that. I was relying on finalizers to close the AVFormatContext used by AVInput, even when the user called close on it. I have just changed the close method so that it closes the AVFormatContext without waiting for the associated finalizer.
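A minimal sketch of that change, with illustrative types rather than VideoIO's actual AVInput: close becomes eager and idempotent, so the finalizer is only a safety net if the user forgets to close.

```julia
# Illustrative stand-in for a wrapper around an FFmpeg context pointer.
mutable struct StreamHandle
    ptr::Ptr{Cvoid}   # stands in for the wrapped AVFormatContext pointer
    isopen::Bool
    function StreamHandle(ptr)
        h = new(ptr, true)
        # Safety net only: close runs here if the user never called it.
        finalizer(close, h)
        return h
    end
end

function Base.close(h::StreamHandle)
    h.isopen || return nothing   # idempotent: second call (or finalizer) is a no-op
    h.isopen = false
    # The real code would call avformat_close_input on h.ptr here.
    h.ptr = C_NULL
    return nothing
end
```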

galenlynch commented 3 years ago

Looks like appveyor and drone tests are failing, but all of the github actions CI tests are now passing.

galenlynch commented 3 years ago

Reading from IOStreams seems broken on this PR. I will see if this is also the case on master. It doesn't seem like this functionality was ever covered by a test.

galenlynch commented 3 years ago

It seems like this PR and master have the same pattern of failure when trying to use VideoReader on an IO object: if the IO object points to a .ogg file it works, if it instead points to a .mp4 then it errors. Since this is not breakage introduced by this PR I'm going to ignore it.

galenlynch commented 3 years ago

I think this is basically complete, except for changing package level documentation. Some questions, such as whether the new workflow will use the existing function names, or instead add new names as is done in this PR to avoid shadowing the existing workflow, need to be resolved before this is ready to be merged. However, I think that's more a question for @ianshmean and other package maintainers, than for me.

I have added some documentation to the new, and old, functionality.

IanButterworth commented 3 years ago

Thanks for the continued work on this, @galenlynch. We should make a plan to draw the line in the sand: merge this, start a NEWS.md to give upgrade guidance for breaking changes, then get a new breaking release out. Then, after a little time in the wild, release 1.0.

Before we merge:

galenlynch commented 3 years ago

No problem. At this point I'm just trying to remove any bugs that I find as I use this PR, which I've been using for all of my decoding and encoding needs.

We still need to decide whether we will change the names of functions used to decode and encode videos, or instead change how they are used.

  • Are there any changes you think we should make to master before this PR, to release as a patch update etc. ?

I could take a look at splitting off non-breaking patches, but it would probably take a bit of effort given how much of the plumbing has changed.

  • Is there anything else you wanted to get in here?

Not on my end. I would be happy to throw some stuff in if you have suggestions, or at least try to. It might not be too hard to do while everything's still "fresh" in my mind. However I'm not sure if I could devote a lot of time to it at the moment.

  • We should delete the Travis and perhaps the AppVeyor CI YAMLs, given we cover Windows on GitHub now

Sounds good

  • Drone CI: The test failures may be real? Could be updated to 1.5?

I haven't really been paying attention to this since it doesn't seem to be correlated with whether tests pass or fail for me locally, and I don't understand it. Is this just testing on ARM? I'm confused why method errors would be present on one platform and not the others. I could look into it though.

IanButterworth commented 3 years ago

Great

I'm confused why method errors would be present on one platform and not the others

First thing I'd suggest is bumping drone to 1.5. If issues remain, I can help look into it.

We still need to decide whether we will change the names of functions used to decode and encode videos, or instead change how they are used.

I'd say go with your new names and deprecate the old ones.
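For reference, a minimal sketch of that route using Base's `@deprecate` macro; the body of `encode_mux_video` here is a stand-in, not VideoIO's real signature:

```julia
# Stand-in for the new-workflow function (illustrative only).
encode_mux_video(filename, imgstack) = "encoded $(filename)"

# The old name forwards to the new one and emits a deprecation
# warning when called (visibility controlled by --depwarn).
Base.@deprecate encodevideo(filename, imgstack) encode_mux_video(filename, imgstack)
```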

I could take a look at splitting off non-breaking patches, but it would probably take a bit of effort giving how much of the plumbing has changed.

Let's not worry. Onward and upward

How's this for a plan?

  1. We fix CI, then merge this.
  2. We prepare a PR to deprecate the old functions (with proper deprecation warnings/helpers) and update the docs.
  3. Release a breaking version (v0.9.0) and see how people get on.
  4. Aim for v1.0 within a month or so.

IanButterworth commented 3 years ago

@galenlynch It'd be good to move this forward.

Can you review and merge this into your branch, to tidy up the CI failures?

galenlynch commented 3 years ago

Yeah sorry that I haven't made much forward progress recently: had a few looming deadlines. I should be more free after next week, though I might also be able to do some work this weekend.

IanButterworth commented 3 years ago

@galenlynch how are things looking?

Shall we merge and move on to the deprecations and updating docs?

galenlynch commented 3 years ago

Sounds good to me! Was just about to check in to see what the status of this was.

IanButterworth commented 3 years ago

I was just about to merge, but wondered what kind of merge we should do here. I usually squash, but is there reason to keep the commits separate?

@galenlynch if I don't hear back in a few hours I'll squash merge

IanButterworth commented 3 years ago

Exciting! So next steps:

IanButterworth commented 3 years ago

Perhaps we could also work out FileIO integration for the 0.9 release

yakir12 commented 3 years ago

Super exciting!!! Thanks for this!!!

IanButterworth commented 3 years ago

@galenlynch I don't think the changelog does everything in this PR justice. Could you add a few brief notes to the changelog, perhaps linking to the appropriate PR (most of which will be this one)

galenlynch commented 3 years ago

Sure

IanButterworth commented 3 years ago

@galenlynch sorry to pester, but when do you think you could update the changelog? I'm hoping to not have v0.9 linger any longer

galenlynch commented 3 years ago

Yeah sorry I can do it this weekend?

IanButterworth commented 3 years ago

That'd be great, thanks