Allow rendering on multiple outputs with <output> = *

mstoeckl commented 1 year ago

Example use: run the following on either a single or multi-monitor setup; the video should show on every display:

mpvpaper '*' /path/to/video

Notes:

It looks like no extra changes will be needed for the --auto-stop feature, because auto_stop seems to be triggered only if no output has displayed a frame (that is, set halt_info.frame_ready = 1) in the last two seconds; thus as long as one monitor is visible, mpvpaper will continue running.
I measured the efficiency improvement with intel_gpu_top and a video drawn simultaneously on three small virtual outputs: running mpvpaper '*' video.mkv used only 20% of the hardware video decoding capacity, and total GPU power draw was 3 W, while showing the same video with three independent copies of mpvpaper used 40% of the hardware video decoding capacity and increased power draw to 4.5 W total.

GhostNaN commented 1 year ago

Holy crap!!! I can't believe it, shows you how much I know about OpenGL (clueless).

You're going to have to give me a bit to sort through what you did. But what I can say right now is that it does work.

GhostNaN commented 1 year ago

Alright, I've looked it over...

It doesn't affect current operation for a single display. I didn't see any real difference in RAM usage and only a little in CPU. Also no increase in GPU usage? (More on that later) And the code looks reasonable. But....

video drawn simultaneously

was not the case for my system.

Yes, it showed the video on all 3 monitors, but ran terribly. I looked into it and found that the frame rate per output was roughly VIDEO_FPS / MONITOR_COUNT. So if the video was running at 60 FPS (~16.6ms per frame), each monitor was displaying at more like 20 FPS (~49.8ms per frame).
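The arithmetic behind that observation can be sketched as follows. This is only an illustration, assuming that the outputs take turns on one thread and that each render call blocks for one video frame period; the helper names are hypothetical and do not come from mpvpaper.

```c
/* Hypothetical helpers, not part of mpvpaper.
 * Assumption: outputs render one after another on a single thread, and
 * each render call blocks for one video frame period (1 / video_fps). */

/* Effective frame rate seen by each output. */
static double per_output_fps(double video_fps, int output_count) {
    return video_fps / output_count;
}

/* Time between two frames on the same output, in milliseconds. */
static double per_output_frame_ms(double video_fps, int output_count) {
    double frame_ms = 1000.0 / video_fps; /* one blocking render call */
    return frame_ms * output_count;       /* wait for every output's turn */
}
```

For a 60 FPS video on 3 outputs this gives 20 FPS and 50 ms per output; the ~49.8 ms figure quoted above comes from rounding the frame period to 16.6 ms before multiplying by 3.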

It seemed like the outputs were taking turns rendering then displaying the video. I verified this effect by measuring the frame callback time and render time with this in frame_handle_done():

double old_tv;

static void frame_handle_done(void *data, struct wl_callback *callback, uint32_t frame_time) {
    wl_callback_destroy(callback);

    struct display_output *output = data;
    if (strcmp(output->name, "DP-2") == 0) {
        struct timeval tv;
        gettimeofday(&tv, NULL);
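        // Microseconds since the epoch; include tv_sec so the delta doesn't wrap every second.
        double curr_tv = tv.tv_sec * 1e6 + tv.tv_usec;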
        printf("Frame Time: %fms  Output: %s\n", (curr_tv - old_tv) / 1000, output->name);
        old_tv  = curr_tv;
    }
...

One curiosity was that I didn't see any rise in hardware video decoding or overall usage on my RX 6700XT like you saw with yours. Nor did I see an increase in power usage from my GPU.

This was the issue I was trying to convey with monitors of varying resolutions, as the video is rendered with these params:

mpv_render_param render_params[] = {
        {MPV_RENDER_PARAM_OPENGL_FBO, &(mpv_opengl_fbo){
            .fbo = 0,
            .w = output->width * output->scale,
            .h = output->height  * output->scale,
        }},
        // Flip rendering (needed due to flipped GL coordinate system).
        {MPV_RENDER_PARAM_FLIP_Y, &(int){1}},
    };

and can't be shared easily due to differences in width and height when rendered.

Perhaps there is a way around this issue as well. I'll try some workarounds myself and see if I find anything. But I applaud you for getting this far. It is as if we are 90% of the way there, just falling short.

Again, excellent work otherwise.

mstoeckl commented 1 year ago

video drawn simultaneously was not the case for my system.

Yes, it showed the video on all 3 monitors, but ran terribly. I looked into it and found that the frame rate per output was roughly VIDEO_FPS / MONITOR_COUNT. So if the video was running at 60 FPS (~16.6ms per frame), each monitor was displaying at more like 20 FPS (~49.8ms per frame).

I can reproduce this; the problem seems to be that mpv_render_context_render always blocks until a new frame is available (which generally takes 1/VIDEO_FPS seconds). There appears to be an option (MPV_RENDER_PARAM_BLOCK_FOR_TARGET_TIME) to disable that delay (in exchange for worse synchronization between audio and video), but when ~I do that I get an as-yet unexplained crash in mpv~ Update: the following patch on top of this PR seems to work:

diff --git a/src/main.c b/src/main.c
index 05c61c0..11648ab 100644
--- a/src/main.c
+++ b/src/main.c
@@ -128,6 +128,8 @@ static void render(struct display_output *output) {
         }},
         // Flip rendering (needed due to flipped GL coordinate system).
         {MPV_RENDER_PARAM_FLIP_Y, &(int){1}},
+        {MPV_RENDER_PARAM_BLOCK_FOR_TARGET_TIME, &(int){0}},
+        {MPV_RENDER_PARAM_INVALID, NULL},
     };

     if (!eglMakeCurrent(egl_display, output->egl_surface, output->egl_surface, egl_context)) {
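Besides the new option, the patch also appends an MPV_RENDER_PARAM_INVALID entry. libmpv's render API appears to treat parameter arrays as terminated by an entry of type 0, so the sketch below shows how a consumer of such a sentinel-terminated array finds its end. It is a stand-alone illustration: the demo_* types and functions are hypothetical stand-ins so the code compiles without mpv headers, and nothing here claims to explain the crash mentioned above.

```c
#include <stddef.h>

/* Hypothetical stand-ins for libmpv's parameter types. Only the idea
 * "an entry of type 0 ends the array" mirrors the patch. */
enum demo_param_type {
    DEMO_PARAM_INVALID = 0, /* sentinel: end of array */
    DEMO_PARAM_FLIP_Y = 1,
    DEMO_PARAM_BLOCK_FOR_TARGET_TIME = 2,
};

struct demo_param {
    enum demo_param_type type;
    void *data;
};

/* Walk the array the way a consumer would: stop at the sentinel.
 * Without the sentinel this loop would read past the end of the array. */
static size_t demo_count_params(const struct demo_param *params) {
    size_t n = 0;
    while (params[n].type != DEMO_PARAM_INVALID)
        n++;
    return n;
}

/* Build an array shaped like the patched render_params and count it. */
static size_t demo_example(void) {
    int flip = 1, block = 0;
    struct demo_param params[] = {
        {DEMO_PARAM_FLIP_Y, &flip},
        {DEMO_PARAM_BLOCK_FOR_TARGET_TIME, &block},
        {DEMO_PARAM_INVALID, NULL},
    };
    return demo_count_params(params);
}
```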

GhostNaN commented 1 year ago

You amaze me again! I thought I knew the problem. I was wrong.

I was about to go on a full-on rant about... How, if the mpv context is shared, could they all play the same frame? And how can we just share a context between all the outputs? But none of that seemed necessary!

CPU usage was a bit more brutal and unusual.
CPU usage scaled worse with VAAPI than with 3 processes of mpvpaper, but scaled better with software decode than with 3 processes of mpvpaper.

At least RAM usage is A LOT better with just the "*" option, consistently consuming only about 1 mpvpaper process worth of RAM.

GPU usage and power was just more brutal for the most part.
Overall GPU usage seemed worse, but it is harder to nail down here so I'll consider it a wash. GPU power scaling was unfortunately considerably worse compared to 3 processes of mpvpaper. One positive, though, was that there was no change in HW decode usage.

Overall, a mixed bag as far as resource usage savings go. As it turns out, having NO block is also not great, because the frame callback then fires every time the monitor refreshes. So on a 120 Hz panel, regardless of whether the video is 30 FPS, 60 FPS or whatever, the output will always redraw at effectively 120 FPS (if it can render fast enough), wasting resources re-rendering the same frame multiple times.

I have good news though: I believe this can also be fixed. I'm probably going to be humbled again, but I'll share my thoughts and ideas. With the surface frame callback, the outputs share the SAME thread. So if 1 output blocks, none of the other outputs get the chance to call back and render. This was the issue with mpv_render_context_render().

The simplest solution for this is to somehow limit/delay the output surface frame callback to the VIDEO_FPS. If that's not possible, then it would have to be delayed by smartly using some form of usleep(). The last option is to somehow leverage mpv_render_context_set_update_callback() or something similar to notify when to render the next frame for all outputs.

Sorry for the essay of information, I didn't want to leave anything out. Lost about half my day to this already, so I'll get back to this later.

mstoeckl commented 1 year ago

The simplest solution for this is to somehow limit/delay the output surface frame callback to the VIDEO_FPS. If that's not possible, then it would have to be delayed by smartly using some form of usleep(). The last option is to somehow leverage mpv_render_context_set_update_callback() or something similar to notify when to render the next frame for all outputs.

The last option, mpv_render_context_set_update_callback, seems to be the recommended way to do it -- some of the mpv examples use it, see e.g. SDL demo. Using the callback with Wayland is definitely doable, although it will require maybe 50 extra lines of standard boilerplate code to get a main loop that can wait for events from both Wayland and mpv. I'll try to implement this when I next have time, possibly next weekend.
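A minimal sketch of such a main loop, assuming the poll() and pipe() approach that is mentioned later in this thread. It is not the code from the PR: the function names, the EV_* flags, and the idea of passing the pipe's write end as the callback context are all assumptions for illustration. The callback is shaped like a libmpv render update callback (a function taking a single void pointer), and any readable file descriptor can stand in for the Wayland display descriptor.

```c
#include <poll.h>
#include <unistd.h>

/* Hypothetical event flags returned by wait_for_events(). */
enum { EV_WAYLAND = 1, EV_MPV = 2 };

/* Shaped like a render update callback: void fn(void *ctx).
 * Assumption: ctx points at the write end of a pipe. The callback only
 * writes one byte, so it is cheap and never touches rendering state. */
static void on_mpv_update(void *ctx) {
    int write_fd = *(int *)ctx;
    char byte = 0;
    ssize_t r = write(write_fd, &byte, 1);
    (void)r; /* a failed wakeup is ignored in this sketch */
}

/* Block until the Wayland fd or the wakeup pipe is readable, or until
 * timeout_ms elapses. Returns a bitmask of EV_* flags (0 on timeout or
 * error). */
static int wait_for_events(int wayland_fd, int wakeup_read_fd, int timeout_ms) {
    struct pollfd fds[2] = {
        {.fd = wayland_fd, .events = POLLIN},
        {.fd = wakeup_read_fd, .events = POLLIN},
    };
    int events = 0;

    if (poll(fds, 2, timeout_ms) <= 0)
        return 0;
    if (fds[0].revents & POLLIN)
        events |= EV_WAYLAND;
    if (fds[1].revents & POLLIN) {
        /* Drain pending wakeup bytes so several updates collapse into one. */
        char buf[64];
        ssize_t r = read(wakeup_read_fd, buf, sizeof buf);
        (void)r;
        events |= EV_MPV;
    }
    return events;
}
```

In a real client the first descriptor would come from the Wayland display connection and EV_WAYLAND would lead to dispatching Wayland events, while EV_MPV would mark the outputs as needing a redraw.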

mstoeckl commented 1 year ago

I've updated the PR so that frame drawing is rate limited both by the wl_surface::frame callbacks and by frame update callbacks from mpv. Let me know if you find any other problems.
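The double rate limit described here can be sketched as a pair of per-output flags: an output draws only when the compositor has signalled that it is ready for another frame and mpv has announced a new video frame. This is an illustrative model only, not the PR's implementation; the struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-output state, not taken from mpvpaper. */
struct demo_output {
    bool frame_callback_done; /* compositor is ready for another frame */
    bool redraw_needed;       /* mpv announced a new video frame */
    int frames_drawn;
};

/* Called when mpv reports a new frame: every output needs a redraw. */
static void demo_mpv_update(struct demo_output *outs, int count) {
    for (int i = 0; i < count; i++)
        outs[i].redraw_needed = true;
}

/* Called when the compositor's frame callback fires for one output. */
static void demo_frame_done(struct demo_output *out) {
    out->frame_callback_done = true;
}

/* Draw only if both conditions hold; returns true if a frame was drawn. */
static bool demo_try_draw(struct demo_output *out) {
    if (!out->frame_callback_done || !out->redraw_needed)
        return false;
    out->frame_callback_done = false;
    out->redraw_needed = false;
    out->frames_drawn++; /* the real render call would go here */
    return true;
}
```

With this gating, a 120 Hz output showing a 30 FPS video draws about 30 times per second instead of 120, and no output has to block while waiting for the next video frame.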

GhostNaN commented 1 year ago

After that last commit, I believe we are ready to rock. I wouldn't have even thought about using poll() and pipe(), good thinking.

CPU and GPU usage/power is now as good as running multiple instances of mpvpaper (mpvpaper³). (Although CPU usage with software decode is still better compared to mpvpaper³.)

GPU HW decode, VRAM, and RAM usage are without a doubt still better than with mpvpaper³.

Just letting you know I plan on doing a bit of code cleanup and TLC after this pull. But nothing you added will change functionally. I'm just not going to bog you down any further with nitpicks.

Thank you for such an awesome contribution! mpvpaper 1.3 is looking to be another great release!

GhostNaN / mpvpaper

Allow rendering on multiple outputs with <output> = * #26