bluenviron / mediamtx

Ready-to-use SRT / WebRTC / RTSP / RTMP / LL-HLS media server and media proxy that can read, publish, proxy, record and play back video and audio streams.
MIT License

WHEP source ends up in deadlock #3108

Closed RouquinBlanc closed 4 months ago

RouquinBlanc commented 4 months ago

Which version are you using?

v1.5.1

Which operating system are you using?

Describe the issue

This is probably very similar to #3062, but I am filing it separately until there is more evidence that it is a duplicate.

The instance gets stuck and can no longer reconnect to a WHEP source. This happens more often on bad connections.

Describe how to replicate the issue

  1. Start a mediamtx instance with a WHEP source configured. For the remote part, we use another mediamtx with an RTSP stream.
  2. If the connection to the WHEP source goes down or the source is unreachable, mediamtx does not reconnect until it is restarted.

Logs, including a goroutine listing, were captured while the issue was occurring.

It looks similar to this issue, where blocking inside a WebRTC callback leads to a deadlock.

In the goroutine listing, this goroutine is blocked:

1 @ 0x1043ef218 0x1043b98a4 0x1043b9464 0x104ae1840 0x104b47464 0x104b4a7ec 0x104b5fc58 0x104b853d4 0x104b853c1 0x104bd7224 0x104be78f8 0x104429084
#   0x104ae183f github.com/pion/ice/v2.(*Agent).Close+0xdf                      /Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/ice/v2@v2.0.0-20231112223552-32d34dfcf3a1/agent.go:955
#   0x104b47463 github.com/pion/webrtc/v3.(*ICEGatherer).Close+0x63                 /Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/webrtc/v3@v3.0.0-20231112223655-e402ed2689c6/icegatherer.go:197
#   0x104b4a7eb github.com/pion/webrtc/v3.(*ICETransport).Stop+0xab                 /Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/webrtc/v3@v3.0.0-20231112223655-e402ed2689c6/icetransport.go:202
#   0x104b5fc57 github.com/pion/webrtc/v3.(*PeerConnection).Close+0x3f7                 /Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/webrtc/v3@v3.0.0-20231112223655-e402ed2689c6/peerconnection.go:2088
#   0x104b853d3 github.com/bluenviron/mediamtx/internal/protocols/webrtc.(*PeerConnection).Close+0x433  /Users/xxxxxxxx/workspace/go/src/mediamtx/internal/protocols/webrtc/peer_connection.go:142
#   0x104b853c0 github.com/bluenviron/mediamtx/internal/protocols/webrtc.(*WHIPClient).Read+0x420   /Users/xxxxxxxx/workspace/go/src/mediamtx/internal/protocols/webrtc/whip_client.go:146
#   0x104bd7223 github.com/bluenviron/mediamtx/internal/staticsources/webrtc.(*Source).Run+0x1f3    /Users/xxxxxxxx/workspace/go/src/mediamtx/internal/staticsources/webrtc/source.go:54
#   0x104be78f7 github.com/bluenviron/mediamtx/internal/core.(*staticSourceHandler).run.func1.1+0x47    /Users/xxxxxxxx/workspace/go/src/mediamtx/internal/core/static_source_handler.go:172

And this one is blocked at the same time:

1 @ 0x1043ef218 0x1044026b8 0x104b832a0 0x104b473a4 0x104ae4754 0x104add81c 0x104429084
#   0x104b8329f github.com/bluenviron/mediamtx/internal/protocols/webrtc.(*PeerConnection).Start.func3+0x17f    /Users/xxxxxxxx/workspace/go/src/mediamtx/internal/protocols/webrtc/peer_connection.go:127
#   0x104b473a3 github.com/pion/webrtc/v3.(*ICEGatherer).Gather.func1+0x273                 /Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/webrtc/v3@v3.0.0-20231112223655-e402ed2689c6/icegatherer.go:177
#   0x104ae4753 github.com/pion/ice/v2.(*Agent).onCandidate+0x83                        /Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/ice/v2@v2.0.0-20231112223552-32d34dfcf3a1/agent_handlers.go:34
#   0x104add81b github.com/pion/ice/v2.(*Agent).candidateRoutine+0x4b                       /Users/xxxxxxxx/workspace/go/pkg/mod/github.com/aler9/ice/v2@v2.0.0-20231112223552-32d34dfcf3a1/agent_handlers.go:58

The issue can be reproduced systematically by forcing a failure in WHIPClient.Read after c.pc.Start(). PostOffer is the call most likely to fail, but any call to c.pc.Close() before the for loop will deadlock.
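The pattern suggested by the two stack traces can be sketched in isolation. This is a minimal simulation, not the mediamtx or pion code: `agent`, `newAgent`, `closeWithin` and `run` are hypothetical names, and the sketch assumes that the candidate callback blocks on an unbuffered channel until either a reader shows up or a context is cancelled, while `Close` waits for the callback goroutine to exit.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// agent is a hypothetical stand-in for an ICE agent: it invokes a user
// callback on its own goroutine, and Close waits for that goroutine to exit.
type agent struct {
	events chan string
	done   chan struct{}
}

func newAgent(onCandidate func(string)) *agent {
	a := &agent{events: make(chan string, 1), done: make(chan struct{})}
	go func() {
		defer close(a.done)
		for c := range a.events {
			onCandidate(c) // if this blocks, the goroutine never exits
		}
	}()
	return a
}

// Close stops the callback goroutine and waits until it has returned.
func (a *agent) Close() {
	close(a.events)
	<-a.done
}

// closeWithin reports whether Close returns before the timeout expires.
func closeWithin(a *agent, d time.Duration) bool {
	ok := make(chan struct{})
	go func() { a.Close(); close(ok) }()
	select {
	case <-ok:
		return true
	case <-time.After(d):
		return false
	}
}

// run simulates a reader that fails before it starts draining candidates.
// cancelFirst selects whether the context is cancelled before Close is called.
func run(cancelFirst bool) bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // also releases the stuck goroutines in the failing case

	candidates := make(chan string) // unbuffered and never read in this scenario
	a := newAgent(func(c string) {
		select {
		case candidates <- c: // blocks: nobody is reading yet
		case <-ctx.Done(): // the only way out once the reader has failed
		}
	})
	a.events <- "candidate-1"

	if cancelFirst {
		cancel()
	}
	return closeWithin(a, 200*time.Millisecond)
}

func main() {
	fmt.Println("Close returned without prior cancel:", run(false))
	fmt.Println("Close returned after cancel:", run(true))
}
```

In this sketch, `run(false)` reports that `Close` did not return within the timeout, while `run(true)` reports that it did. If the real code follows the same shape, releasing the callback (for example by cancelling its context) before calling `Close` would be one way to avoid the hang. Whether that matches the actual fix is not confirmed here.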

Did you attach the server logs?

failure.log goroutines.txt

Logs and pprof goroutines list during issue

Did you attach a network dump?

no (does not look relevant)

github-actions[bot] commented 3 months ago

This issue is mentioned in release v1.7.0 🚀