fzwoch / obs-teleport

An OBS Studio plugin for an open NDI-like replacement. Pretty simple, straightforward. No NDI compatibility in any form.
GNU General Public License v2.0

Teleport lags out and never recovers #64

Closed Stevie-O closed 1 year ago

Stevie-O commented 1 year ago

Something happened over the holidays and my Teleport just... stopped working. I'm not sure whether it's caused by a rogue wireless device, or the latest Windows 11 update, or what, but I started using Teleport back in September and everything was fine until mid-December.

Unfortunately, I've since tried updating everything (Windows, OBS, Teleport itself, etc.) without success.

What I initially observed was this: Enabling Teleport would cause the screen to be transmitted, but then it would freeze. Disabling, then re-enabling would re-synchronize the current screen, but again, it would immediately freeze.

Eventually I noticed that the screen (which includes a timer) would actually update a few seconds later, with the new timer value, but then it still froze.

At present, I had to drop the quality settings from 90 to 50 and drop OBS's framerate from 60 to 10, and it still can't keep up. I ran a game for an hour and 45 minutes. When it was all over, the receiving side was showing frames from the 1:09 mark!

17:11:33.717: Output 'Teleport': stopping
17:11:33.717: Output 'Teleport': Total frames output: 564
17:11:33.717: Output 'Teleport': Total drawn frames: 565
17:11:59.478: 1 views remain at shutdown
17:11:59.478: ---------------------------------
17:11:59.478: video settings reset:
17:11:59.478:   base resolution:   642x601
17:11:59.478:   output resolution: 640x600
17:11:59.478:   downscale filter:  Bicubic
17:11:59.478:   fps:               10/1
17:11:59.478:   format:            NV12
17:11:59.478:   YUV mode:          Rec. 709/Partial
17:11:59.478: NV12 texture support not available
17:11:59.478: P010 texture support not available
17:11:59.495: Settings changed (video)
17:11:59.495: ------------------------------------------------
20:18:01.315: Output 'Teleport': stopping
20:18:01.315: Output 'Teleport': Total frames output: 111568
20:18:01.315: Output 'Teleport': Total drawn frames: 111526 (111569 attempted)
20:18:01.315: Output 'Teleport': Number of lagged frames due to rendering lag/stalls: 43 (0.0%)
21:14:47.044: ==== Shutting down ==================================================

Looks like I got things set up at 5, then stopped for dinner. The logs above are from the streaming side.

Some observations here:

  1. "Number of lagged frames" is handled by OBS passing (or not) the frames into Teleport. https://github.com/obsproject/obs-studio/blob/9145e3063d741c87206da21831865e4ea00a5c85/libobs/obs-output.c#L319
  2. "Total frames output: 111568" with no "attempted" means that obs_output_get_frames_dropped returned 0. https://github.com/obsproject/obs-studio/blob/9145e3063d741c87206da21831865e4ea00a5c85/libobs/obs-output.c#L333
  3. For Teleport, obs_output_get_frames_dropped returns h.laggedFrames (note the difference: OBS calls them "dropped", the Teleport source calls them "lagged". It does not actually drop anything.)
  4. which is only incremented when the end of the queue is more than 1 second in the future: https://github.com/fzwoch/obs-teleport/blob/6b2f3426e1a685166836c11d61728c374d678917/output.go#L142
  5. which suggests that when line 142 executes, nothing in this queue is more than 1 second in the future

I'm not familiar with Go. I believe I have the gist of most of it, however, and something seems suspicious here.

If the limiting factor is CPU to encode the frames, this seems like it will probably work fine. But if the limiting factor is network bandwidth -- that is, if SenderSend takes too long -- then won't the held lock prevent the output_raw_video handler from acquiring the lock and enqueuing the next frame entirely? I have no idea what happens inside OBS if that callback is blocked.

fzwoch commented 1 year ago

On the sender side, audio and video are handled differently. Audio is not threaded, so it should be a bit easier. Video and audio are sent over the same channel, but the sending should again happen in a goroutine, so it is non-blocking for the callbacks.

On the receiver side, audio and video get interleaved according to their timestamps before being given to OBS. This is done to achieve some kind of audio sync, which otherwise seems impossible with OBS. Here the queue can exceed its limit, but that means the audio and video timestamps have drifted more than 5 seconds apart. That really sounds like something has gone badly wrong.
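To make the interleaving idea concrete, here is a minimal Go sketch. It is not the plugin's receiver code: the `Packet` and `Interleaver` types are hypothetical, each stream is assumed to arrive in timestamp order, and the 5-second limit is not modeled. It only shows why a packet can be released solely while both queues hold data, and therefore why one stalled stream lets the other queue grow.

```go
package main

import "fmt"

// Packet is a hypothetical stand-in for one unit of audio or video.
type Packet struct {
	Kind      string // "audio" or "video"
	Timestamp int64  // arbitrary time units
}

// Interleaver keeps one queue per stream. Each stream is assumed to
// arrive already sorted by timestamp.
type Interleaver struct {
	audio, video []Packet
}

// Push appends a packet to the queue of its stream.
func (i *Interleaver) Push(p Packet) {
	if p.Kind == "audio" {
		i.audio = append(i.audio, p)
	} else {
		i.video = append(i.video, p)
	}
}

// Pop returns the packet with the smaller timestamp, but only while both
// queues are non-empty. With one queue empty we cannot know whether the
// missing stream will still deliver an earlier packet.
func (i *Interleaver) Pop() (Packet, bool) {
	if len(i.audio) == 0 || len(i.video) == 0 {
		return Packet{}, false
	}
	if i.audio[0].Timestamp <= i.video[0].Timestamp {
		p := i.audio[0]
		i.audio = i.audio[1:]
		return p, true
	}
	p := i.video[0]
	i.video = i.video[1:]
	return p, true
}

func main() {
	var in Interleaver
	in.Push(Packet{"video", 0})
	in.Push(Packet{"video", 16})
	in.Push(Packet{"audio", 10})
	in.Push(Packet{"audio", 20})
	for {
		p, ok := in.Pop()
		if !ok {
			break // one stream ran dry; the rest stays queued
		}
		fmt.Println(p.Kind, p.Timestamp)
	}
}
```

In this sketch the last audio packet stays queued until more video arrives, which is the situation in which a queue would keep growing if the two streams drift apart.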

Of course, if the sending cannot keep up, weird things may happen. This is not logged anywhere.

YorVeX commented 1 year ago

Just as a side-note, if you don't need audio on a source (but by default it's still processed by OBS and can cause buffering issues) this filter can be applied.

fzwoch commented 1 year ago

> Just as a side-note, if you don't need audio on a source (but by default it's still processed by OBS and can cause buffering issues) this filter can be applied.

I think when you use the output module you will get audio nonetheless. It may be silent, but the audio mixer in OBS is always active.

fzwoch commented 1 year ago

So I can kind of trigger it when I add a Sleep in the callback function. Seems like OBS queues these frames. But I do not see why this method would block, or why it would take a considerable amount of time, especially for the resolution and framerate in that code snippet.

Stevie-O commented 1 year ago

> So I can kind of trigger it when I add a Sleep in the callback function. Seems like OBS queues these frames. But I do not see why this method would block, or why it would take a considerable amount of time, especially for the resolution and framerate in that code snippet.

Well, one way to be sure would be to change lines 160-161 from:

            h.SenderSend(h.queue[0].Buffer)
            h.queue = h.queue[1:]

to

            nextPacket := h.queue[0].Buffer
            h.queue = h.queue[1:]
            h.Unlock()
            h.SenderSend(nextPacket)
            h.Lock()

That would definitely decouple the lock taken by SenderSend (on the Sender object) from the lock taken by output_raw_video (on the teleportOutput object), and confirm or rule out anything like that.
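As a self-contained illustration of that pattern (take the next item under the lock, release the lock, then do the slow send), here is a small Go sketch. The `output` type, `enqueue`, and `drainOne` are hypothetical names and not the plugin's code; the slow network write is simulated with a sleep.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// output is a hypothetical stand-in for the plugin's output object.
type output struct {
	sync.Mutex
	queue [][]byte
}

// enqueue mimics the raw-video callback: it only needs the lock briefly.
func (o *output) enqueue(frame []byte) {
	o.Lock()
	o.queue = append(o.queue, frame)
	o.Unlock()
}

// drainOne pops the head of the queue under the lock, then releases the
// lock before calling send, so a slow send cannot block enqueue.
func (o *output) drainOne(send func([]byte)) bool {
	o.Lock()
	if len(o.queue) == 0 {
		o.Unlock()
		return false
	}
	next := o.queue[0]
	o.queue = o.queue[1:]
	o.Unlock()

	send(next) // potentially slow; the lock is not held here
	return true
}

func main() {
	o := &output{}
	o.enqueue([]byte("frame 1"))

	sending := make(chan struct{})
	done := make(chan struct{})
	go func() {
		o.drainOne(func([]byte) {
			close(sending)
			time.Sleep(200 * time.Millisecond) // simulated slow network write
		})
		close(done)
	}()

	<-sending // the slow send is now in flight
	start := time.Now()
	o.enqueue([]byte("frame 2"))
	fmt.Println("enqueue took", time.Since(start), "while a send was in flight")
	<-done
}
```

If `send` were called while the lock was still held, the second `enqueue` would have to wait for the whole simulated write to finish.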

A review of SenderSend suggests that it might be some issue on the receiving end, but I can't really tell without adding a lot more debugging statements. If you could provide detailed instructions on how I can compile Teleport myself, I can do a lot more experimentation.

fzwoch commented 1 year ago

The TL;DR is this:

export CGO_CFLAGS="$(pkg-config --cflags libobs libjpeg)"
export CGO_LDFLAGS="$(pkg-config --libs libobs libjpeg)"

go build -buildmode=c-shared -o obs-teleport.so

And it actually will want the frontend library from OBS as well. Not sure tbh why it works without it for me atm.

On Windows you hopefully have a decent shell with pkg-config giving you the required options for cgo, or else you have to set these values by hand.

And of course you will need GCC (unsure if clang support for Windows is merged yet).

Stevie-O commented 1 year ago

Will it work if I build it from within WSL? What do you do to build the release DLLs?

fzwoch commented 1 year ago

Not really. WSL will create Linux binaries by default. Of course, if you have installed a cross compiler and link to Windows versions of libobs and libjpeg, many things are possible, but not necessarily trivial.

I have some customized scripts that do that for me, but they are not portable to other systems without a lot of effort. So I'm afraid this is a bit of an adventure if you want to do it.

https://pkg.go.dev/cmd/cgo covers a bit at the start, but tbh it is not very good if you are new to that. Basically you need to know which tools you need, what the -I, -l and -L options of the compiler do, and set them to the locations where you stored the libobs and libjpeg dependencies.

Stevie-O commented 1 year ago

GOOS=windows go build -buildmode=c-shared -o obs-teleport.dll
# obs-teleport
./discoverer.go:36:58: undefined: Peer
./discoverer.go:57:38: undefined: Peer

fzwoch commented 1 year ago

Cross compiling disables cgo by default. You will have to explicitly enable it and point your C compiler to the MinGW cross compiler.

Stevie-O commented 1 year ago

I'm trying real hard to be helpful here, because I very much want to get this issue resolved, but you're not giving me much to go on. I have absolutely no idea where to start to try getting this working.

You said you have scripts to do this, and that they are complex (and non-portable), but that doesn't really give me a starting point. The only clue I have is that you have a Bourne-shell-like environment to kick things off, and access to pkg-config.

fzwoch commented 1 year ago

The above code snippet is from Linux and should probably almost immediately compile (with the exception of the missing frontend library from obs) when the correct packages are installed. Linux is just the most convenient for development. You just need a Linux client to run it.

I don't have Windows, so I cross compile with MinGW. I get libobs and their frontend library from the OBS installer and the headers from their source code. libjpeg I cross compile myself. You can probably pick some precompiled .dll with headers from somewhere too (I just wanted to statically link it, so I went to the extra effort of building it). Whether MSVCRT or UCRT is used should not even matter, I think both work; at least I don't know which one is the default, as I never selected anything.

Then it is probably something like this

export CC=x86_64-w64-mingw32-gcc
export CGO_ENABLED=1
export CGO_CFLAGS="-I/path/to/libobs/headers -I/path/to/libjpeg/headers"
export CGO_LDFLAGS="-ljpeg -L/path/to/jpeg/lib -lobs -lobs-frontend-api -L/path/to/libobs/lib"
export GOOS=windows

And then your go build line. May need to copy some headers from the frontend library around. Or the lib. I vaguely remember something was up with it, but I cannot remember if it was macOS or Windows.

Stevie-O commented 1 year ago

I'm assuming that the libjpeg you're using is actually from libjpeg-turbo. Is that right?

> I get libobs and their frontend library from the OBS installer and the headers from their source code

Where do you pull these from the OBS installer? Which files are these?

I'm assuming the headers in question are the ones listed on this page: https://obsproject.com/docs/plugins.html (I'd just copy over all the .h files in the libobs directory)

fzwoch commented 1 year ago

The regular installer from OBS does have the DLL files. It takes a little effort under Linux to extract/install it, but on Windows you can just install it normally and pick them from the install directory.

I usually pick the install from the OBS github page. On the release page there is also the corresponding source zip that matches the OBS version. These include all the headers.

Yeah, I use libjpeg-turbo. It should be API compatible with the regular jpeg, but turbo is faster and I believe has even become the standard reference implementation.

Stevie-O commented 1 year ago

Okay, I've made significant progress! You weren't wrong -- that was an adventure.

For those who come after me, here's what I did:

  1. copied obs.dll and obs-frontend-api.dll to ../obs-teleport-deps/lib
  2. cross-compiled libjpeg-turbo according to the instructions in that project's BUILDING.md with the command line: cmake -G"Unix Makefiles" -DCMAKE_TOOLCHAIN_FILE=toolchain.cmake -DCMAKE_INSTALL_PREFIX=~/libjpeg-turbo-git/build -DCMAKE_SHARED_LINKER_FLAGS="-static-libgcc -static-libstdc++"
  3. copied libjpeg headers to ../obs-teleport-deps/include
  4. copied libjpeg .a files to ../obs-teleport-deps/lib
  5. Ran the following commands:
export CC=x86_64-w64-mingw32-gcc;
export CGO_ENABLED=1;
export CGO_CFLAGS="-I$(pwd)/../obs-teleport-deps/include -I$(pwd)/../obs-studio/libobs -I$(pwd)/../obs-studio/UI/obs-frontend-api";
#export CGO_LDFLAGS="-L$(pwd)/../obs-teleport-deps/lib -lobs -lobs-frontend-api -ljpeg";
export CGO_LDFLAGS="-L$(pwd)/../obs-teleport-deps/lib -lobs -lobs-frontend-api -l:libjpeg.a -static-libgcc";
export GOOS=windows;
go build -x -buildmode=c-shared -o obs-teleport.dll

I should note that something is still amiss, because the resulting DLL is nearly 2.5x the size of your official release (official build is 3.5MB, my DLL is 8.5MB)! But it works, and it's teleporting frames from my laptop to the desktop. I'm gonna start adding some trace statements so I can see what's going on...

fzwoch commented 1 year ago

Congrats. You can also hit me up on Discord for some discussion.

Do a x86_64-w64-mingw32-strip -s obs-teleport.dll to remove debug symbols from the DLL and the sizes should be closer.

Maybe jpeg-turbo needs -DCMAKE_BUILD_TYPE=Release, not sure if it defaults to that.

Stevie-O commented 1 year ago

(Compiling teleport is consistently taking 2m41s on my machine. Is that normal? This is making iteration painfully slow.)

Okay, the results of a few tests suggest that the changes you made in 5675d12e3f3bd9f9d86d2795ceb322a365da959d won't make a difference.

I modified output.go to print out some stats every 120 frames: https://github.com/Stevie-O/obs-teleport/tree/sender-side-stall-check

I set the framerate back to 60fps (so, every 2 seconds) and saw this on the transmit side:

16:21:10.270: [obs-teleport] teleport: enqueued frame 120 with timestamp 2166666580, 1 queued, 63 pending writes
16:21:12.402: [obs-teleport] teleport: enqueued frame 240 with timestamp 4166666500, 3 queued, 163 pending writes
16:21:14.236: [obs-teleport] teleport: enqueued frame 360 with timestamp 6166666420, 1 queued, 268 pending writes
16:21:16.236: [obs-teleport] teleport: enqueued frame 480 with timestamp 8166666340, 1 queued, 376 pending writes
(skip a few)
16:22:06.234: [obs-teleport] teleport: enqueued frame 3480 with timestamp 58166664340, 1 queued, 3106 pending writes
16:22:08.237: [obs-teleport] teleport: enqueued frame 3600 with timestamp 60166664260, 1 queued, 3210 pending writes
16:22:10.235: [obs-teleport] teleport: enqueued frame 3720 with timestamp 62166664180, 1 queued, 3312 pending writes

This reveals the following:

  1. output_raw_video isn't getting blocked; it's fairly reliably being called 60 times per second. (3600 frames in 59.965 seconds is less than 0.06% off from 60fps)
  2. The queue isn't building up, and is in fact nearly useless (I can't think of a scenario where it would actually be useful.)
  3. Rather, what's getting built up are the network Write calls at https://github.com/Stevie-O/obs-teleport/blob/5675d12e3f3bd9f9d86d2795ceb322a365da959d/sender.go#L78
  4. Which means that the (rather complex) changes made by 5675d12e3f3bd9f9d86d2795ceb322a365da959d don't actually solve any problems that can occur in reality; IMO you probably ought to revert that commit because it adds a ton of complexity for zero payoff. (Though keeping it in a branch might be useful, to serve as a basis for a proper fix.)
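The arithmetic behind these points can be checked from the first and last log lines quoted above. The Go sketch below rests on two assumptions that the log itself does not confirm: each enqueued frame results in exactly one network Write, and "pending writes" counts writes that were started but have not yet completed. The `sample` type and `rates` function are illustrative names only.

```go
package main

import "fmt"

// sample holds the fields of one debug line from the transmit-side log.
type sample struct {
	seconds float64 // wall-clock time in seconds after 16:21:00
	frame   int     // frame counter at that time
	pending int     // "pending writes" at that time
}

// rates returns, per second, how many frames were enqueued and how many
// writes completed between two samples. Completed writes are estimated as
// frames enqueued minus the growth of the pending backlog.
func rates(a, b sample) (enqueued, completed float64) {
	dt := b.seconds - a.seconds
	frames := b.frame - a.frame
	backlogGrowth := b.pending - a.pending
	return float64(frames) / dt, float64(frames-backlogGrowth) / dt
}

func main() {
	first := sample{seconds: 10.270, frame: 120, pending: 63}   // 16:21:10.270
	last := sample{seconds: 70.235, frame: 3720, pending: 3312} // 16:22:10.235
	in, out := rates(first, last)
	fmt.Printf("enqueued %.2f frames/s, completed %.2f writes/s\n", in, out)
}
```

Under those assumptions, frames go in at almost exactly 60 per second while fewer than 6 writes per second finish, which is why the backlog grows by roughly 54 entries every second.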

Therefore, the only conclusion I can draw is this: The sender side is trying to send frames faster than the receiver can receive them.

There are two possible causes for this: either the network can't keep up, or something is going wrong on the receiver end, and the receiver is not reading from the socket quickly enough. While that's a thing I need to investigate further (as soon as I figure out how), the problem remains:

The transmission code needs to be modified to drop frames if OBS is generating them faster than the receiver(s) can receive them.
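One possible shape for such a change is a bounded queue between the video callback and the network writer that discards the oldest frame when it is full. The Go sketch below is only an illustration under stated assumptions, not the plugin's implementation: `frameQueue`, `newFrameQueue`, and `offer` are hypothetical names, there is a single producer, and dropping any individual frame is assumed to be safe.

```go
package main

import "fmt"

// frameQueue is a bounded buffer between the video callback (producer)
// and the network writer (consumer). Assumes a single producer.
type frameQueue struct {
	ch      chan []byte
	dropped int // frames discarded because the consumer fell behind
}

func newFrameQueue(capacity int) *frameQueue {
	return &frameQueue{ch: make(chan []byte, capacity)}
}

// offer never blocks. If the queue is full, it discards the oldest queued
// frame to make room, so the receiver always gets the most recent frames.
func (q *frameQueue) offer(frame []byte) {
	for {
		select {
		case q.ch <- frame:
			return
		default:
		}
		// Queue is full: drop the oldest frame, then retry the send.
		select {
		case <-q.ch:
			q.dropped++
		default:
			// The consumer emptied a slot in the meantime; just retry.
		}
	}
}

func main() {
	q := newFrameQueue(2)
	for i := 1; i <= 5; i++ {
		q.offer([]byte(fmt.Sprintf("frame %d", i)))
	}
	close(q.ch)
	for f := range q.ch {
		fmt.Println(string(f)) // frames that survived
	}
	fmt.Println("dropped:", q.dropped)
}
```

With a capacity of 2 and no consumer running, only the two newest frames survive. Dropping the newest frame instead would be simpler, but it would leave stale frames at the head of the queue, which is the symptom described in this issue.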

(Regarding fixing it properly, I'll take up your offer for talking this out over Discord.)

YorVeX commented 1 year ago

Happy to see that you're going to take this over into a more direct engagement on Discord, will certainly speed things up. But since I've also been following this "thread" here with a lot of interest and curiosity I hope that you also note your findings here at some point, would be nice 😄

And of course let me know if you need to run any new builds through some tests with various OBS versions or over a GBit Ethernet.

fzwoch commented 1 year ago

The latest version has a bit more control over the queues and won't let them grow indefinitely. Also, some more logging should report when the queues get unusually long. If there is a general bandwidth/CPU limitation, though, there is little that can be done.