dec05eba / gpu-screen-recorder-issues

GPU Screen Recorder issue tracker

[BUG] Audio desyncs from video #7

Closed Stoppedpuma closed 3 months ago

Stoppedpuma commented 3 months ago

Describe the bug: Audio will eventually desync from the video after a while.

To Reproduce: gpu-screen-recorder -w screen -f 120 -a "$(pactl get-default-sink).monitor" -cr full -fm cfr -ac opus -k hevc -q very_high -o ~/videos/video.mkv

Expected behavior: Video and audio remain in sync.

Videos:

Start of the recording:

https://github.com/dec05eba/gpu-screen-recorder-issues/assets/58333920/f3d1af9e-aecd-42e6-bfb4-96aa953eb1b0

~2 hours into the recording:

https://github.com/dec05eba/gpu-screen-recorder-issues/assets/58333920/219ac9a7-fd75-4133-b1d5-a965df1c6376

Desktop (please complete the following information):

I can provide more examples than the ones posted if required. It helps to watch the input button on the bottom right to see the difference in latency; using a waveform makes this extremely obvious.

I'm not running any audio processing software. I can't reproduce this behaviour with OBS. I can reproduce it in every game I've tried; the game in the videos is osu!

dec05eba commented 3 months ago

I have an idea of what causes audio/video desync that isn't too far out of sync, like this. A very old version of gpu screen recorder likely doesn't have this issue, but reverting to that behavior unfortunately breaks opus/flac, so I'll have to find another way to fix it.

YozoraWolf commented 3 months ago

Adding my 25 cents here.

I've discovered the desync problem worsens for me while using my Bluetooth speaker (Echo Dot) (tons of desync and speed up)

https://github.com/dec05eba/gpu-screen-recorder-issues/assets/5296711/f4083737-a1f9-4d2c-a053-72aa6f07043e


But then when using just the Built-in speakers: (Almost no desync, except at the end)

https://github.com/dec05eba/gpu-screen-recorder-issues/assets/5296711/0932cec0-acd6-4347-9469-05ab7d056215

dec05eba commented 3 months ago

You are recording bluetooth in the second one as well. Do you mean that you record from it but don't play audio to it? What command do you use in the first one? It's also a different issue if you are recording from multiple audio sources at once.

dec05eba commented 3 months ago

@Stoppedpuma I'm not sure if this is a case of audio desync or audio latency. I personally see the same issues in all screen recorders (tested with gpu screen recorder, simple screen recorder and obs studio) when using a bluetooth output device. But I have a test fix for this (it's in the git repository and on the AUR as well). Can you do a git pull & install and retry? Thanks

@YozoraWolf Can you try using a test version to see if there is a difference? Run:

flatpak uninstall com.dec05eba.gpu_screen_recorder
flatpak install --system https://dl.flathub.org/build-repo/96540/com.dec05eba.gpu_screen_recorder.flatpakref

and then try recording again. Thanks

YozoraWolf commented 3 months ago

https://github.com/dec05eba/gpu-screen-recorder-issues/assets/5296711/ad5541fd-19a6-4fe1-857b-80b18c92bf95

Negative. It seems to be in sync at the end, so maybe it's skipping certain segments? (It's still choppy.) However, mpv did output something rather interesting:

wolf@wolf-mint:~/Videos$ mpv --no-config Replay_2024-04-11_11-29-15.mp4 
 (+) Video --vid=1 (*) (hevc 1920x1080 59.999fps)
 (+) Audio --aid=1 (*) (aac 2ch 48000Hz)
AO: [pulse] 48000Hz stereo 2ch float
VO: [gpu] 1920x1080 yuv420p
Invalid audio PTS: 1.022333 -> 1.504208
Invalid audio PTS: 1.525542 -> 1.991375
AV: 00:00:16 / 00:00:56 (30%) A-V:  0.000
Invalid audio PTS: 16.878625 -> 17.354333
Invalid audio PTS: 17.375667 -> 17.480250
AV: 00:00:34 / 00:00:56 (60%) A-V:  0.000
Invalid audio PTS: 33.896500 -> 34.378417
Invalid audio PTS: 34.399750 -> 34.877708
Invalid audio PTS: 34.899042 -> 35.376687
Invalid audio PTS: 35.398021 -> 35.879833
Invalid audio PTS: 35.901167 -> 36.378687
Invalid audio PTS: 36.400021 -> 36.877562
Invalid audio PTS: 36.898896 -> 37.380396
Invalid audio PTS: 37.401729 -> 37.880042
Invalid audio PTS: 37.901375 -> 38.383229
Invalid audio PTS: 38.404562 -> 38.882125
AV: 00:00:34 / 00:00:56 (60%) A-V:  0.000
Invalid audio PTS: 33.896500 -> 34.378417
Invalid audio PTS: 34.399750 -> 34.877708
Invalid audio PTS: 34.899042 -> 35.376687
Invalid audio PTS: 35.398021 -> 35.879833
AV: 00:00:34 / 00:00:56 (60%) A-V:  0.000
Invalid audio PTS: 35.901167 -> 36.378687
Invalid audio PTS: 36.400021 -> 36.877562
Invalid audio PTS: 36.898896 -> 37.380396
Invalid audio PTS: 37.401729 -> 37.880042
Invalid audio PTS: 37.901375 -> 38.383229
Invalid audio PTS: 38.404562 -> 38.882125
AV: 00:00:34 / 00:00:56 (61%) A-V:  0.000
Invalid audio PTS: 33.896500 -> 34.378417
Invalid audio PTS: 34.399750 -> 34.877708
Invalid audio PTS: 34.899042 -> 35.376687
Invalid audio PTS: 35.398021 -> 35.879833
Invalid audio PTS: 35.901167 -> 36.378687
Invalid audio PTS: 36.400021 -> 36.877562
Invalid audio PTS: 36.898896 -> 37.380396
Invalid audio PTS: 37.401729 -> 37.880042
Invalid audio PTS: 37.901375 -> 38.383229
Invalid audio PTS: 38.404562 -> 38.882125
AV: 00:00:41 / 00:00:56 (74%) A-V:  0.000
Invalid audio PTS: 39.905417 -> 40.382979
Invalid audio PTS: 40.404312 -> 40.881979
Invalid audio PTS: 40.903312 -> 41.384292
Invalid audio PTS: 41.405625 -> 41.883146
Invalid audio PTS: 41.904479 -> 42.138646
AV: 00:00:52 / 00:00:56 (92%) A-V:  0.000
Invalid audio PTS: 51.660687 -> 52.137021
Invalid audio PTS: 52.158354 -> 52.636250
Invalid audio PTS: 52.657583 -> 53.135813
Invalid audio PTS: 53.157146 -> 53.639458
Invalid audio PTS: 53.660792 -> 54.138354
Invalid audio PTS: 54.159687 -> 54.637313
Invalid audio PTS: 54.658646 -> 55.140792
Invalid audio PTS: 55.162125 -> 55.639667
Invalid audio PTS: 55.661000 -> 56.139708
Invalid audio PTS: 56.161042 -> 56.643896
AV: 00:00:53 / 00:00:56 (94%) A-V:  0.000
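To put numbers on those warnings, here is a minimal Python sketch that pulls the "Invalid audio PTS" lines out of an mpv log and prints the size of each timestamp jump. The function name `pts_jumps` and the regular expression are illustrative assumptions, not part of mpv or gpu screen recorder. For context, an AAC frame normally holds 1024 samples, which is about 21.3 ms at 48000 Hz, so consecutive audio timestamps would be expected to advance by roughly that much.

```python
import re

# Matches mpv warnings of the form "Invalid audio PTS: <previous> -> <next>".
PTS_LINE = re.compile(r"Invalid audio PTS:\s*([0-9.]+)\s*->\s*([0-9.]+)")


def pts_jumps(log_text):
    """Return (previous_pts, next_pts, jump_seconds) for each warning line."""
    jumps = []
    for match in PTS_LINE.finditer(log_text):
        previous_pts = float(match.group(1))
        next_pts = float(match.group(2))
        jumps.append((previous_pts, next_pts, next_pts - previous_pts))
    return jumps


# A few lines copied verbatim from the log above.
sample = """\
Invalid audio PTS: 1.022333 -> 1.504208
Invalid audio PTS: 1.525542 -> 1.991375
AV: 00:00:16 / 00:00:56 (30%) A-V:  0.000
Invalid audio PTS: 16.878625 -> 17.354333
"""

for previous_pts, next_pts, jump in pts_jumps(sample):
    print(f"{previous_pts:.6f} -> {next_pts:.6f}: jump of {jump * 1000:.1f} ms")
```

In this sample every flagged jump is a little under half a second, far more than one audio frame. That would fit the choppy playback, though the sketch only measures the jumps and says nothing about their cause.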

Ran using a custom script I made: (I haven't updated the audio codec that you disabled, but it's using aac anyways)

wolf@wolf-mint:~/.gsr$ ./gsr-start-replay.sh 
Merged audio sources: alsa_output.pci-0000_06_00.6.HiFi__hw_Generic__sink.monitor|bluez_sink.C0_91_B9_DE_7C_AE.a2dp_sink.monitor
Starting recording with the following arguments: -w screen -fm vfr -f 60 -a alsa_output.pci-0000_06_00.6.HiFi__hw_Generic__sink.monitor|bluez_sink.C0_91_B9_DE_7C_AE.a2dp_sink.monitor -ac opus -k hevc -c mp4 -r 60 -o /home/wolf/Videos
Warning: opus and flac audio codecs has been temporary disabled, using aac audio codec instead
[hevc_nvenc @ 0x55ae58891780] ignoring invalid SAR: 0/0
[hevc_nvenc @ 0x55ae58891780] ignoring invalid SAR: 0/0
update fps: 246
update fps: 246
update fps: 239
update fps: 243

dec05eba commented 3 months ago

I can't see those outputs when running with mpv. Did you re-encode the file? That makes it harder to debug. Can you upload the raw video somewhere? Also, can you try running gpu screen recorder with only one audio input, not merging audio?

YozoraWolf commented 3 months ago

Yeah this one is re-encoded. I'll upload a raw one soon.

I will make sure to include just one audio sink.

YozoraWolf commented 3 months ago

First test ran with Bluetooth Speaker:

wolf@wolf-mint:~/Videos/wolf@wolf-mint:\~_.gsr$ flatpak run --command=gpu-screen-recorder com.dec05eba.gpu_screen_recorder -w screen -fm vfr -f 60 -a bluez_sink.C0_91_B9_DE_7C_AE.a2dp_sink.monitor -ac aac -k hevc -c mp4 -r 60 -o /home/wolf/Videos

(Had to upload to discord since it exceeds github's 10MB limit)

Then there's the second test, run with the Built-in Speakers (no audio for some reason, although I set Built-in Speakers as default):

flatpak run --command=gpu-screen-recorder com.dec05eba.gpu_screen_recorder -w screen -fm cfr -f 60 -a alsa_output.pci-0000_06_00.6.HiFi__hw_Generic__sink.monitor -ac aac -k hevc -c mp4 -r 60 -o /home/wolf/Videos

https://github.com/dec05eba/gpu-screen-recorder-issues/assets/5296711/fb866950-14ac-4b50-8629-9788cfac708c

dec05eba commented 3 months ago

You used replay in this example, but does the same issue happen if you record a regular video? Also, this might be an issue that happens on pulseaudio but not pipewire. Linux Mint 21.3 might still use pulseaudio, unlike many other distros, so I'll check if the issue is pulseaudio-specific.

dec05eba commented 3 months ago

@YozoraWolf Ok I was able to reproduce the issue you have on linux mint but not arch linux. I'll investigate why it doesn't work on mint.

YozoraWolf commented 3 months ago

I once installed pipewire, but it spawned more issues than it solved. But that's a very interesting note, perhaps it might be that, I've never tried with pulseaudio before though.

YozoraWolf commented 3 months ago

Maybe we're on to something then with pulseaudio.

Stoppedpuma commented 3 months ago

I ran a test that lasted 2 hours and 36 minutes, and the desync seems to have been fixed. I might be completely wrong, but the audio latency does seem higher. In the video I recorded to report this issue, I used a waveform in kdenlive to manually verify that I wasn't just hearing things incorrectly. In the old video it took 4 moves on the waveform to hear the hitsound after the key indicator lit; after pulling the latest commit it now takes 10 moves. I can also tell when watching the video in mpv.

Screenshot of the waveform, in case you have no idea what I'm talking about when I say "waveform":

[Screenshot]

dec05eba commented 3 months ago

That might be a random thing; audio latency can change by itself. But it would be better to create a separate issue for it, as it's a different category of issue. I pushed another change that might or might not affect that (it does on Linux Mint, I don't know about Debian).

@YozoraWolf I fixed that particular issue on Linux Mint now, but flathub is being very slow. I'll give a link to the flatpak build once it's done.

I also noticed that pulseaudio has much lower latency than pipewire (at least as configured on linux mint vs arch linux pipewire) when using bluetooth output device (latency for any recording program, not just gpu screen recorder).

YozoraWolf commented 3 months ago

Thanks, then you were right on pulseaudio. It has been said on some articles that pipewire is supposed to be a successor to pulse, though I'm not sure why Mint still keeps it. It was a pain for me to install it so I had to revert back to pulse on mint.

Thanks, I'll be on the look out.

Stoppedpuma commented 3 months ago

I'm going to run one last test overnight to see if I can reproduce this again, if I can't I'll close the issue.

dec05eba commented 3 months ago

Thanks. Make sure you have the latest version as I pushed a change around 2 hours ago

dec05eba commented 3 months ago

@YozoraWolf The flatpak test build is done, can you run this:

flatpak uninstall com.dec05eba.gpu_screen_recorder
flatpak install --system https://dl.flathub.org/build-repo/96733/com.dec05eba.gpu_screen_recorder.flatpakref

and test if the audio is fine now? thanks

YozoraWolf commented 3 months ago

That's it! It works both on the Built-in speakers and Bluetooth one. You can check the first frame of every video and see the output too, but I've just alternated between sinks.

Bluetooth Speaker:

https://github.com/dec05eba/gpu-screen-recorder-issues/assets/5296711/de20c942-16f7-4d2e-bc83-9c8617d968e1

Built-in Speakers:

https://github.com/dec05eba/gpu-screen-recorder-issues/assets/5296711/866168e3-f544-47b6-a3c0-9cf255c94944

Also DO excuse the lag spikes (or choppiness) in the first video. I'm streaming using Moonlight (over WiFi 6), so there are times when the connection might chop a little. It has nothing to do with gsr; it's my connection.

There is no desync so far nor does it sound choppy.

Thank you for taking a look at it @dec05eba . The issue from my side seems to be solved using this latest flatpak you uploaded.

Note: RAW uploads cannot be viewed directly on github (at least not for me) perhaps due to the audio encoding being aac but I believe it might work under opus (disabled currently? don't know if flatpak does too)

dec05eba commented 3 months ago

Ok, thanks for testing :). The update should be on flathub after a few hours when flathub has published the update.

Stoppedpuma commented 3 months ago

I've gone through the video I recorded, which was around 9 hours long, and the desync seems to be fixed!

Regarding audio quality:

The audio quality of AAC, while not horrible, is still a bit lacking and has very small occasional artifacts. I believe bumping the AAC bitrate to around 160kbps would be very beneficial; this would still leave Opus as the most efficient codec even if it were bumped from 96kbps to 128kbps. 128kbps for Opus seems to be the sweet spot between file size and audio transparency. An option like -aq (audio quality) or -ab (audio bitrate) would be nice if you ever decide to start accepting new features.

Thanks for the quick fixes!

dec05eba commented 3 months ago

Thanks for testing! As for the bitrate, it's not something I notice myself, so I didn't know that. But I added an -ab option now to set the bitrate. I haven't updated it in the GUI yet. I don't know if I'll keep it or if I'll just default to 160kbps. But that would be something I do after I make opus work again.

Stoppedpuma commented 3 months ago

@dec05eba Sorry to bother you about a feature you probably never wanted to add but what is the reason you would rather remove the option for the user to set the audio bitrate? In my case I would highly prefer to be able to set the bitrate using the -ab option to be higher / lower depending on what I'm recording.

dec05eba commented 3 months ago

@Stoppedpuma I meant that if there aren't any downsides to keeping it at 160kbps, then I don't see any reason to intentionally make the quality low, provided the audio file size is still relatively small compared to the video file size. But that's not something I have looked at. If the file size is noticeably larger because of the higher quality, then keeping the -ab option would make sense. Or if you know any other reason to intentionally make it worse.

Stoppedpuma commented 3 months ago

[Screenshot] (Values come from a 13-minute, 275MB wave file.) In terms of file size there isn't a big difference at all for the much better quality. If you wanted to get as close to transparency as possible while staying reasonable on file size, then probably 320kbps AAC and 224kbps Opus, but these values are overkill the majority of the time (especially in software tailored to gamers? I don't quite know the target audience).
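As a rough cross-check of the file-size argument, the following Python sketch computes the nominal size of a 13-minute audio stream at the bitrates mentioned in this thread. It assumes a constant bitrate and ignores container overhead, so these are not the measured values from the screenshot. The helper `audio_size_mb` is a hypothetical name used only for this illustration.

```python
def audio_size_mb(bitrate_kbps, duration_seconds):
    """Nominal size in megabytes (10**6 bytes) of a constant-bitrate stream."""
    return bitrate_kbps * 1000 * duration_seconds / 8 / 1_000_000


DURATION_SECONDS = 13 * 60  # the 13-minute test file mentioned above

# Bitrates discussed in this thread; real encoders may deviate (e.g. VBR).
for codec, kbps in [("opus", 96), ("opus", 128), ("aac", 160),
                    ("opus", 224), ("aac", 320)]:
    size = audio_size_mb(kbps, DURATION_SECONDS)
    print(f"{codec} @ {kbps} kbps: {size:.2f} MB")
```

Under these assumptions, going from 96 kbps to 160 kbps adds about 6 MB over 13 minutes, which is small next to the video stream in most recordings.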

I agree with your reasoning, there shouldn't be a downside to increasing these values besides the small difference in size. It doesn't really make sense to keep the -ab option around unless you find that any of the reasons I list below seems like a concern, they are small but just a list of things I've encountered before:

dec05eba commented 2 months ago

I re-enabled opus and made it default. I dont know if it will re-introduce this issue but if it does re-open this.

Stoppedpuma commented 2 months ago

Sorry for the late response, I'll start a test now and get back to you in 2-3 hours with results.

Tests will be run on commit 16d273e (latest).

Stoppedpuma commented 2 months ago

Seems to work fine, no desync in a 3 hour video.

dec05eba commented 2 months ago

Thanks a lot for testing!

Stoppedpuma commented 2 months ago

If you think your upcoming fix for flac might cause this issue again as well then let me know and I'll test it as well.

I do have a question that may help you with getting region / window capture working on wayland, have you thought of using something like slurp? It would require implementation of screen coordinates but might be something worth looking at if you haven't already?

dec05eba commented 2 months ago

If you think your upcoming fix for flac might cause this issue again as well then let me know and I'll test it as well.

Thanks!

I do have a question that may help you with getting region / window capture working on wayland, have you thought of using something like slurp? It would require implementation of screen coordinates but might be something worth looking at if you haven't already?

slurp only works on wlroots based wayland compositors

dec05eba commented 2 months ago

But in the future i believe there will be an ugly way to do it in wayland. You can put a fullscreen window with transparent background on every monitor in which case you can capture mouse events on wayland to make a slurp alternative that works on every wayland compositor.

Stoppedpuma commented 2 months ago

Just an additional heads-up: I looked again at the video I recorded to test this and noticed that the audio is very slightly ahead (1 tick at 120fps on the kdenlive waveform). It's like this through the whole video and is not caused by desync. I'm assuming it is caused by this change?

- const double audio_startup_time_seconds = 0.080833;
+ const double audio_startup_time_seconds = std::max(0.0, 0.089166 - target_fps);
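A quick arithmetic check of the two constants in that diff, as a Python sketch. It rests on an assumption that the thread does not confirm: that `target_fps` in the added line holds the frame interval in seconds (1 / fps) rather than the frame rate itself. The maintainer also notes below that this snippet is outdated, so treat this purely as an illustration of the numbers.

```python
OLD_STARTUP_SECONDS = 0.080833  # constant on the removed line of the diff
NEW_BASE_SECONDS = 0.089166     # constant on the added line of the diff

difference = NEW_BASE_SECONDS - OLD_STARTUP_SECONDS
frame_at_120fps = 1 / 120

print(f"difference between constants: {difference * 1000:.3f} ms")
print(f"one frame at 120 fps:         {frame_at_120fps * 1000:.3f} ms")


def startup_seconds(frame_interval_seconds):
    """Mirror std::max(0.0, 0.089166 - target_fps).

    ASSUMPTION: target_fps is a frame interval in seconds, not a frame rate.
    """
    return max(0.0, NEW_BASE_SECONDS - frame_interval_seconds)


for fps in (60, 120, 165):
    print(f"{fps} fps -> startup time {startup_seconds(1 / fps) * 1000:.3f} ms")
```

The two constants differ by almost exactly one frame at 120 fps. Under the stated assumption the new expression reproduces the old value at 120 fps and shifts by a fraction of a frame at other frame rates.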

It's so small that it might just be a flaw in my testing:

[Screenshot]

dec05eba commented 2 months ago

it's possible yes, but also that code you posted there is outdated

Stoppedpuma commented 2 months ago

After some more testing I can confirm the audio does seem to be ahead. The weird part is that it varies. I was thinking this might be because of frame duplication or something, but I'm not seeing any in either MPV or Kdenlive. I'm measuring anywhere between 1 and 5 ticks ahead of the video at 165 fps; 5 seems to be pretty uncommon, while 1 seems to be the most common. This behaviour shows up in MPV as well.
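For scale, the tick counts above can be converted to milliseconds with a one-line helper. This is a small Python sketch; `ticks_to_ms` is a hypothetical name, and one "tick" is assumed to mean one frame step at the project frame rate.

```python
def ticks_to_ms(ticks, fps):
    """Duration of `ticks` frame steps at `fps` frames per second, in ms."""
    return ticks * 1000 / fps


print(f"1 tick at 120 fps: {ticks_to_ms(1, 120):.2f} ms")
for ticks in range(1, 6):
    print(f"{ticks} tick(s) at 165 fps: {ticks_to_ms(ticks, 165):.2f} ms")
```

So the reported range of 1 to 5 ticks at 165 fps corresponds to roughly 6 to 30 ms of audio lead.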

dec05eba commented 2 months ago

What command are you using to run gpu screen recorder and what type of audio device is it?

dec05eba commented 2 months ago

Hmm, mpv doesn't play audio properly in sync with video when stepping frame by frame, and kdenlive's audio frames don't seem to be fully accurate either. Which song are you using with osu? Maybe it's more accurate and has consistent frames. Also, I pushed a change to git, but I don't know if it's accurate now, given that the video players I test with are not accurate either.

Stoppedpuma commented 2 months ago

Yeah MPV seems to have very mixed results when going frame by frame, I noticed that around half way through my testing videos it would go from late to early on frame by frame but would always be early at 0.26 speed.

Below is some of what I've gathered from my testing on commit 16d273e. I'll test the latest commit soon and leave another reply about it.

Which song are you using with osu

I've tested with a set of around 20 popular maps as well as 5 different offset wizard style maps including the one made by the creator of osu! as well as some I've created. I've possibly ruled out osu! as a way of testing for reasons stated below

I've also tested with other methods, such as aligning colour frames with audio in videos, re-recording a video's playback multiple times, and using applications with different audio frameworks (SDL, BASS, pure ALSA, etc.).

Possible reasons of why this might happen:

As you mentioned, Kdenlive has issues aligning audio frames correctly and MPV can't be trusted in frame-by-frame mode. This issue may have started because I was using Kdenlive to determine how early the audio was. I originally noticed it in MPV, when early became late in frame-by-frame mode, but I was measuring with Kdenlive's waveform.

Audio driver buffer configs: At some point I took basically everything from my setup off the table and ran a stock Fedora install to see whether my setup was the problem. What points to this as a possible cause is how I usually run osu!. osu! uses an audio framework called BASS, which does not support any Linux audio server besides ALSA. The only way to run osu! (at least lazer) with audio latency below 10ms is to configure either ALSA itself or the ALSA part of your sound server. In my case that means running: PIPEWIRE_ALSA='{ alsa.buffer-bytes=512 alsa.period-bytes=32 }' /path/to/osu!. When I run tests I usually stick to stock configurations to make sure the issue isn't on my end, and I noticed while testing this that I got different numbers in Kdenlive's waveform. This would explain why you originally weren't able to reproduce the latency, since we are using different quantums on pipewire. (The video you mentioned using could also have been the reason: if it's that YouTube one, then yes, that one is very off.)
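To give a sense of what those byte counts mean in time, here is a Python sketch that converts a buffer size in bytes to milliseconds. The stream format is an assumption: the thread does not state the sample rate, channel count or sample width osu! uses, so 48000 Hz stereo is assumed and both 16-bit and 32-bit samples are shown. How pipewire-alsa actually applies these properties is outside the scope of this sketch, and `buffer_latency_ms` is a hypothetical helper.

```python
def buffer_latency_ms(buffer_bytes, sample_rate_hz, channels, bytes_per_sample):
    """Time covered by `buffer_bytes` of interleaved PCM audio, in ms."""
    frames = buffer_bytes / (channels * bytes_per_sample)
    return frames / sample_rate_hz * 1000


# ASSUMED stream format: 48000 Hz, 2 channels; sample width unknown.
SAMPLE_RATE_HZ = 48000
CHANNELS = 2

for label, bytes_per_sample in [("16-bit", 2), ("32-bit", 4)]:
    buffer_ms = buffer_latency_ms(512, SAMPLE_RATE_HZ, CHANNELS, bytes_per_sample)
    period_ms = buffer_latency_ms(32, SAMPLE_RATE_HZ, CHANNELS, bytes_per_sample)
    print(f"{label}: buffer 512 bytes = {buffer_ms:.3f} ms, "
          f"period 32 bytes = {period_ms:.3f} ms")
```

Under either assumed sample width, 512 bytes covers only a few milliseconds, which is in line with the goal of getting below 10ms.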

Because of the above, it raises the question: is it possible to ignore the audio server's latency configuration?

Another issue I've encountered while running all these tests is that there's a small chance (around 1 in 50) that gsr will just hang when ^C-ing the program and killing it is the only way to get it to close.

What command are you using to run gpu screen recorder and what type of audio device is it?

gpu-screen-recorder -w screen -f 165 -a "$(pactl get-default-sink).monitor" -cr full -fm cfr -ac opus -ab 192000 -k hevc -q ultra -o ~/videos/video.mkv

During this testing I used headphones connected to the aux connector on my motherboard. I can reproduce this behaviour with speakers connected to my motherboard's optical port as well.

Stoppedpuma commented 2 months ago

Test on 0288b94 (latest):

It matches 1:1 audio and video in osu! when using PIPEWIRE_ALSA='{ alsa.buffer-bytes=512 alsa.period-bytes=32 }' /path/to/osu!

When it isn't using any audio configurations it's still ahead of the video

[Screenshot]

I'm going to start expanding the software I use to analyse these videos towards more professional tools where these issues might not occur. The most notable ones will be Reaper running on flatpak, Resolve 19 running on wine, and Premiere running on wine (if I can manage it, since it has a bunch of issues; otherwise qemu). I'll start with the most obvious things, like manually comparing audio times and frame times.

Update: Kdenlive seems to be the incorrect one, at least for the timeline (or I'm just dumb). 00:00:22:664 is where the audio plays in Reaper, which matches the video in MPV, but in Kdenlive it plays at 00:00:22:652 at 1000fps. This is impossible because there is no audio in either Reaper or Audacity at this time.
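The gap between those two timecodes can be worked out with a short Python sketch. It assumes the last timecode field is a frame index at the stated 1000 fps, so each frame is one millisecond. `timecode_to_seconds` is a hypothetical helper, and 165 fps is taken from the recording command quoted earlier.

```python
def timecode_to_seconds(timecode, fps):
    """Convert HH:MM:SS:FF (FF = frame index at `fps`) to seconds."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    return hours * 3600 + minutes * 60 + seconds + frames / fps


reaper = timecode_to_seconds("00:00:22:664", 1000)
kdenlive = timecode_to_seconds("00:00:22:652", 1000)

offset_ms = (reaper - kdenlive) * 1000
frames_at_165fps = offset_ms / (1000 / 165)

print(f"Kdenlive plays the audio {offset_ms:.0f} ms earlier than Reaper")
print(f"which is about {frames_at_165fps:.1f} frames at 165 fps")
```

A 12 ms discrepancy is about two frames at 165 fps, the same order of magnitude as the tick counts measured on the Kdenlive waveform earlier.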

[Screenshot]

I couldn't tell you if it plays the audio at the exact same time in MPV since the frame by frame issue but it seems in time at 0.26 speed.

dec05eba commented 2 months ago

Ok, thanks. I'll keep it as it is now. It may not be 100% perfect, but it seems like no software is 100% perfect in this regard either lol. I don't actually know if it can be done 100% perfectly, and I guess it's good enough for now.

dec05eba commented 2 months ago

Another issue I've encountered while running all these tests is that there's a small chance (around 1 in 50) that gsr will just hang when ^C-ing the program and killing it is the only way to get it to close.

It might be an AMD bug, as I think I have only seen it on AMD. Nvidia had a bug like that in CUDA that I managed to work around. On Nvidia it only happened on some cards with certain driver versions, which might be the same on AMD. But the only way to fix it on AMD would be to run gdb while it happens and then press ctrl+c to see where it freezes; since it doesn't happen that often, that's annoying to do. Alternatively, run gpu screen recorder with debug symbols enabled in a release build and, when it freezes, use kill to send it a signal that causes a core dump (such as SIGSEGV).

Stoppedpuma commented 2 months ago

Sent to your email to keep this issue at least somewhat on topic.