ajbouh / substrate

9 stars 3 forks source link

bridge2: session has many tracks without events #179

Closed ajbouh closed 6 months ago

ajbouh commented 7 months ago

Screenshot 2024-03-07 at 9 30 03 PM

I'm not sure why this is happening but it seems like something is not quite right. @mgood thoughts?

ajbouh commented 7 months ago

Screenshot 2024-03-08 at 11 53 36 AM

Here's a bit more information ... viewing this transcript means we are receiving and parsing over 600kB every update. This apparently includes scenarios where there are no events to show.

mgood commented 7 months ago

In the logs you showed with the WebRTC connection errors, it seems to be stuck in a loop reconnecting, and each time it's triggering the creation of a new track. So I think these are the same underlying issue: https://github.com/ajbouh/substrate/pull/167#issuecomment-1983431990

mgood commented 7 months ago

Adam mentioned this seems like it may be happening when multiple connections happening at the same time.

mgood commented 7 months ago

I tried making a page with a bunch of iframes to load the same session, but didn't see this error. I'll take a closer look through the loop to see if anything jumps out though.

ajbouh commented 6 months ago

I wonder if it's just my chrome that's in a weird state. I keep seeing this even though I don't have the iframe setup I previously had. The problem does not exist when I try to load bridge2 in Safari.

This is the patch I made for adding some better logging.

diff --git a/images/bridge2/webrtc/sfu/session.go b/images/bridge2/webrtc/sfu/session.go
index a3b6852db..0a4f92434 100644
--- a/images/bridge2/webrtc/sfu/session.go
+++ b/images/bridge2/webrtc/sfu/session.go
@@ -155,7 +155,8 @@ func (s *Session) Sync() {
        for i := range s.peers {
            if s.peers[i].ConnectionState() == webrtc.PeerConnectionStateClosed {
                s.peers = append(s.peers[:i], s.peers[i+1:]...)
-               return true // We modified the slice, start from the beginning
+               log.Println("sync: we removed a peer, try again")
+               return true // We modified the slice, try again
            }

            // map of sender we already are seanding, so we don't double send
@@ -171,6 +172,7 @@ func (s *Session) Sync() {
                // If we have a RTPSender that doesn't map to a existing track remove and signal
                if _, ok := s.tracks[sender.Track().ID()]; !ok {
                    if err := s.peers[i].RemoveTrack(sender); err != nil {
+                       log.Printf("sync: error removing a track, try again; %s", err)
                        return true
                    }
                }
@@ -190,6 +192,7 @@ func (s *Session) Sync() {
                if _, ok := existingSenders[trackID]; !ok {
                    log.Println("sync: adding track to peer:", i, trackID)
                    if _, err := s.peers[i].AddTrack(s.tracks[trackID]); err != nil {
+                       log.Printf("sync: error adding a track, try again; %s", err)
                        return true
                    }
                }
@@ -197,14 +200,17 @@ func (s *Session) Sync() {

            offer, err := s.peers[i].CreateOffer(nil)
            if err != nil {
+               log.Printf("sync: error creating an offer, try again; %s", err)
                return true
            }

            if err = s.peers[i].SetLocalDescription(offer); err != nil {
+               log.Printf("sync: error setting local description, try again; %s", err)
                return true
            }

            if err = s.peers[i].Signal("offer", offer); err != nil {
+               log.Printf("sync: error trying to signal offer, try again; %s", err)
                return true
            }
        }
@@ -212,7 +218,7 @@ func (s *Session) Sync() {
        return
    }

-   log.Println("sync start")
+   log.Println("sync: start")
    for syncAttempt := 0; ; syncAttempt++ {
        if syncAttempt == 25 {
            // Release the lock and attempt a sync in 1s. We might be blocking a RemoveTrack or AddTrack
@@ -220,11 +226,12 @@ func (s *Session) Sync() {
                time.Sleep(time.Second * 1)
                s.Sync()
            }()
+           log.Printf("sync: yielding early")
            return
        }
        if !attemptSync() {
            break
        }
    }
-   log.Println("sync finished")
+   log.Println("sync: finished")
 }
mgood commented 6 months ago

Ok, I did encounter this on Arc yesterday myself, so I'll try to reproduce again and see if I can gather more info