bluenviron / mediamtx

Ready-to-use SRT / WebRTC / RTSP / RTMP / LL-HLS media server and media proxy that allows to read, publish, proxy, record and playback video and audio streams.
MIT License

proxy mode over the internet - connection does not resume after network congestion #335

Closed xdanik closed 3 years ago

xdanik commented 3 years ago

Which version are you using?

Tested with: v0.9.15, v0.15.2 and v0.15.3.

Which operating system are you using?

Describe the issue

I am running two instances of rtsp-simple-server, one on server A and one on server B. Servers A and B are interconnected using an OpenVPN tunnel over the internet (tap, tcp - I know tcp is not ideal, but Mikrotik does not support udp mode yet). Both servers are running Debian 10 on the x64 platform (both are virtual machines). Server A receives an RTSP stream from ffmpeg (ffmpeg publishes into rtsp-simple-server). Server B connects to server A and retrieves the stream (sourceOnDemand: no). All streams are running over TCP. Clients connect to the stream on server A and also on server B. After a while, clients connected to server B are unable to retrieve the stream, while clients connected directly to server A work without issues.

ffmpeg just hangs while connecting to server B:

```
ffprobe rtsp://172.31.150.254:8554/cam_grid -loglevel debug
ffprobe version 4.2.2 Copyright (c) 2007-2019 the FFmpeg developers
built with gcc 9.2.1 (GCC) 20200122
[tcp @ 000001ead035d740] No default whitelist set
[tcp @ 000001ead035d740] Original list of addresses:
[tcp @ 000001ead035d740] Address 172.31.150.254 port 8554
[tcp @ 000001ead035d740] Interleaved list of addresses:
[tcp @ 000001ead035d740] Address 172.31.150.254 port 8554
[tcp @ 000001ead035d740] Starting connection attempt to 172.31.150.254 port 8554
```

while on server A it works with no issues: ``` ffprobe rtsp://user:password@192.168.25.5:8554/cam_grid_low -loglevel debug ffprobe version 4.2.2 Copyright (c) 2007-2019 the FFmpeg developers built with gcc 9.2.1 (GCC) 20200122 [tcp @ 0000017254cfd7c0] No default whitelist set [tcp @ 0000017254cfd7c0] Original list of addresses: [tcp @ 0000017254cfd7c0] Address 192.168.25.5 port 8554 [tcp @ 0000017254cfd7c0] Interleaved list of addresses: [tcp @ 0000017254cfd7c0] Address 192.168.25.5 port 8554 [tcp @ 0000017254cfd7c0] Starting connection attempt to 192.168.25.5 port 8554 [tcp @ 0000017254cfd7c0] Successfully connected to 192.168.25.5 port 8554 [rtsp @ 0000017254cfce40] SDP: v=0 o=- 0 0 IN IP4 127.0.0.1 s=Stream c=IN IP4 0.0.0.0 t=0 0 m=video 0 RTP/AVP 96 b=AS:2000 a=rtpmap:96 H264/90000 a=fmtp:96 packetization-mode=1; sprop-parameter-sets=Z2QAKKwrQDwBE/LgLZAAAD6AAAdTDgAAB6EgAAehIbvLgoA=,aO48sA==; profile-level-id=640028 a=control:trackID=0 [rtsp @ 0000017254cfce40] video codec set to: h264 [rtsp @ 0000017254cfce40] RTP Packetization Mode: 1 [rtsp @ 0000017254cfce40] Extradata set to 0000017254d011c0 (size: 47) [rtsp @ 0000017254cfce40] RTP Profile IDC: 64 Profile IOP: 0 Level: 28 [rtp @ 0000017254cffc00] No default whitelist set [udp @ 0000017254d02480] No default whitelist set [udp @ 0000017254d02480] 'circular_buffer_size' option was set but it is not supported on this build (pthread support is required) [udp @ 0000017254d02480] end receive buffer size reported is 65536 [udp @ 0000017254d12740] No default whitelist set [udp @ 0000017254d12740] 'circular_buffer_size' option was set but it is not supported on this build (pthread support is required) [udp @ 0000017254d12740] end receive buffer size reported is 65536 [rtsp @ 0000017254cfce40] setting jitter buffer size to 500 [rtsp @ 0000017254cfce40] hello state=0 [h264 @ 0000017254d00c80] nal_unit_type: 7(SPS), nal_ref_idc: 3 [h264 @ 0000017254d00c80] nal_unit_type: 8(PPS), nal_ref_idc: 3 [h264 @ 0000017254d00c80] 
nal_unit_type: 7(SPS), nal_ref_idc: 3 [h264 @ 0000017254d00c80] nal_unit_type: 8(PPS), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 [h264 @ 0000017254d00c80] Format yuvj420p chosen by get_format(). [h264 @ 0000017254d00c80] Reinit context to 1920x1088, pix_fmt: yuvj420p [h264 @ 0000017254d00c80] Frame num gap 3 1 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 
times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 [rtsp @ 0000017254cfce40] max delay reached. 
need to consume packet [rtsp @ 0000017254cfce40] RTP: missed 117 packets [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 5(IDR), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 [h264 @ 0000017254d00c80] bytestream overread -6 [h264 @ 0000017254d00c80] error while decoding MB 70 18, bytestream -6 [h264 @ 0000017254d00c80] concealing 5979 DC, 5979 AC, 5979 MV errors in I frame [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 1 times [h264 @ 0000017254d00c80] nal_unit_type: 6(SEI), nal_ref_idc: 0 [h264 @ 0000017254d00c80] nal_unit_type: 1(Coded slice of a non-IDR picture), nal_ref_idc: 3 [h264 @ 0000017254d00c80] ct_type:0 pic_struct:0 Last message repeated 21 times 
[rtsp @ 0000017254cfce40] All info found Input #0, rtsp, from 'rtsp://user:password@192.168.25.5:8554/cam_grid_low': Metadata: title : Stream Duration: N/A, start: 1.637189, bitrate: N/A Stream #0:0, 40, 1/90000: Video: h264 (High), 1 reference frame, yuvj420p(pc, progressive, left), 1920x1080 (1920x1088) [SAR 1:1 DAR 16:9], 0/1, 15 fps, 15 tbr, 90k tbn, 30 tbc [h264 @ 00000172556d1b00] nal_unit_type: 7(SPS), nal_ref_idc: 3 [h264 @ 00000172556d1b00] nal_unit_type: 8(PPS), nal_ref_idc: 3 ```

Restarting the rtsp-simple-server on server B temporarily resolves the issue.

This is probably related to some kind of network interruption, as the connection goes over the internet - most likely network congestion (the bandwidth from server A is limited - it's our residential connection with 10 Mbps upload). Running a speedtest from the network where server A is running seems to trigger the issue quite reliably.

I would completely understand a brief stream dropout when the network connection is overwhelmed, but the stream should reconnect automatically without the need to restart the whole server.

Unfortunately there are no signs of connection issues in logs from any of the rtsp-simple-server instances. Also no errors in the OpenVPN daemon. Simply nothing.

Server A is running completely default config - just added username and password.

Server B is running nearly default config - just added the sources definition and configured lower timeouts: ``` ############################################### # General options # sets the verbosity of the program; available values are "warn", "info", "debug". logLevel: info # destinations of log messages; available values are "stdout", "file" and "syslog". logDestinations: [stdout] # if "file" is in logDestinations, this is the file which will receive the logs. logFile: rtsp-simple-server.log # listen IP. If provided, all listeners will listen on this specific IP. listenIP: # timeout of read operations. readTimeout: 2s # timeout of write operations. writeTimeout: 2s # number of read buffers. # a higher number allows a higher throughput, # a lower number allows to save RAM. readBufferCount: 512 # enable Prometheus-compatible metrics. metrics: yes # port of the metrics listener. metricsPort: 9998 # enable pprof-compatible endpoint to monitor performances. pprof: no # port of the pprof listener. pprofPort: 9999 # command to run when a client connects to the server. # this is terminated with SIGINT when a client disconnects from the server. # the server port is available in the RTSP_PORT variable. runOnConnect: # the restart parameter allows to restart the command if it exits suddenly. runOnConnectRestart: no ############################################### # RTSP options # disable support for the RTSP protocol. rtspDisable: no # supported RTSP stream protocols. # UDP is the most performant, but can cause problems if there's a NAT between # server and clients, and doesn't support encryption. # TCP is the most versatile, and does support encryption. # The handshake is always performed with TCP. protocols: [udp, tcp] # encrypt handshake and TCP streams with TLS (RTSPS). # available values are "no", "strict", "optional". encryption: no # port of the TCP/RTSP listener. This is used only if encryption is "no" or "optional". 
rtspPort: 8554 # port of the TCP/TLS/RTSPS listener. This is used only if encryption is "strict" or "optional". rtspsPort: 8555 # port of the UDP/RTP listener. This is used only if "udp" is in protocols. rtpPort: 8000 # port of the UDP/RTCP listener. This is used only if "udp" is in protocols. rtcpPort: 8001 # path to the server key. This is used only if encryption is "strict" or "optional". serverKey: server.key # path to the server certificate. This is used only if encryption is "strict" or "optional". serverCert: server.crt # authentication methods. authMethods: [basic, digest] # read buffer size. # this doesn't influence throughput and shouldn't be touched unless the server # reports errors about the buffer size. readBufferSize: 2048 ############################################### # RTMP options # disable support for the RTMP protocol. rtmpDisable: no # port of the RTMP listener. rtmpPort: 1935 ############################################### # Path options # these settings are path-dependent. # it's possible to use regular expressions by using a tilde as prefix. # for example, "~^(test1|test2)$" will match both "test1" and "test2". # for example, "~^prefix" will match all paths that start with "prefix". # the settings under the path "all" are applied to all paths that do not match # another entry. paths: all: # source of the stream - this can be: # * record -> the stream is published by a RTSP or RTMP client # * rtsp://existing-url -> the stream is pulled from another RTSP server # * rtsps://existing-url -> the stream is pulled from another RTSP server # * rtmp://existing-url -> the stream is pulled from a RTMP server # * redirect -> the stream is provided by another path or server source: record # if the source is an RTSP URL, this is the protocol that will be used to # pull the stream. available options are "automatic", "udp", "tcp". # the tcp protocol can help to overcome the error "no UDP packets received recently". 
sourceProtocol: automatic # if the source is an RTSP or RTMP URL, it will be pulled only when at least # one reader is connected, saving bandwidth. sourceOnDemand: no # if sourceOnDemand is "yes", readers will be put on hold until the source is # ready or until this amount of time has passed. sourceOnDemandStartTimeout: 10s # if sourceOnDemand is "yes", the source will be closed when there are no # readers connected and this amount of time has passed. sourceOnDemandCloseAfter: 10s # if the source is "redirect", this is the RTSP URL which clients will be # redirected to. sourceRedirect: # if the source is "record" and a client is publishing, do not allow another # client to disconnect the former and publish in its place. disablePublisherOverride: no # if the source is "record" and no one is publishing, redirect readers to this # path. It can be can be a relative path (i.e. /otherstream) or an absolute RTSP URL. fallback: # username required to publish. # sha256-hashed values can be inserted with the "sha256:" prefix. publishUser: # password required to publish. # sha256-hashed values can be inserted with the "sha256:" prefix. publishPass: # ips or networks (x.x.x.x/24) allowed to publish. publishIps: [] # username required to read. # sha256-hashed values can be inserted with the "sha256:" prefix. readUser: # password required to read. # sha256-hashed values can be inserted with the "sha256:" prefix. readPass: # ips or networks (x.x.x.x/24) allowed to read. readIps: [] # command to run when this path is initialized. # this can be used to publish a stream and keep it always opened. # this is terminated with SIGINT when the program closes. # the path name is available in the RTSP_PATH variable. # the server port is available in the RTSP_PORT variable. runOnInit: # the restart parameter allows to restart the command if it exits suddenly. runOnInitRestart: no # command to run when this path is requested. # this can be used to publish a stream on demand. 
# this is terminated with SIGINT when the path is not requested anymore. # the path name is available in the RTSP_PATH variable. # the server port is available in the RTSP_PORT variable. runOnDemand: # the restart parameter allows to restart the command if it exits suddenly. runOnDemandRestart: no # readers will be put on hold until the runOnDemand command starts publishing # or until this amount of time has passed. runOnDemandStartTimeout: 10s # the runOnDemand command will be closed when there are no # readers connected and this amount of time has passed. runOnDemandCloseAfter: 10s # command to run when a client starts publishing. # this is terminated with SIGINT when a client stops publishing. # the path name is available in the RTSP_PATH variable. # the server port is available in the RTSP_PORT variable. runOnPublish: # the restart parameter allows to restart the command if it exits suddenly. runOnPublishRestart: no # command to run when a clients starts reading. # this is terminated with SIGINT when a client stops reading. # the path name is available in the RTSP_PATH variable. # the server port is available in the RTSP_PORT variable. runOnRead: # the restart parameter allows to restart the command if it exits suddenly. runOnReadRestart: no cam1_1: sourceProtocol: tcp source: rtsp://user:password@192.168.25.5:8554/cam1_1 sourceOnDemand: yes # skipping few identical streams ... # this is the problematic one cam_grid: sourceProtocol: tcp source: rtsp://user:password@192.168.25.5:8554/cam_grid_low sourceOnDemand: no ```

Describe how to replicate the issue

  1. start two instances of rtsp-simple-server on two separate machines
  2. publish with ffmpeg to the first server
  3. configure the second server to always read the stream from the first server (sourceOnDemand: no)
  4. open a player and connect to the stream on the second server
  5. try to make the network between the servers unstable - run speedtest/iperf a few times...
  6. the video should freeze

I recorded a network dump using tcpdump (tcpdump -i any port 8554 -w tcpdump.cap) - it was recorded on server B and it contains:

  1. rtsp-simple-server started
  2. confirmed that stream is working
  3. started a speedtest on network with server A
  4. confirmed that stream is down

Endpoints are:

172.16.26.254, 172.31.150.254 = server B

192.168.25.5 = server A

172.31.150.1 = ffmpeg client trying to watch stream from server B

tcpdump.zip

I know that this is a rather vague description - it's all I currently have.

aler9 commented 3 years ago

Hello, thank you very much for reporting this issue. It has been fixed in main and will be available in the next release.

A timeout on TCP streams was not applied correctly, so TCP streams were kept open indefinitely. The unit test in charge of checking timeouts was, by coincidence, reporting a false negative... evidently humans can still beat machines.

Please try the attached nightly release and let me know if you have any more problems.

rtsp-simple-server_v0.15.3-11-gf208026_windows_amd64.zip

rtsp-simple-server_v0.15.3-11-gf208026_linux_arm64v8.tar.gz

rtsp-simple-server_v0.15.3-11-gf208026_linux_arm7.tar.gz

rtsp-simple-server_v0.15.3-11-gf208026_linux_arm6.tar.gz

rtsp-simple-server_v0.15.3-11-gf208026_linux_amd64.tar.gz

rtsp-simple-server_v0.15.3-11-gf208026_darwin_amd64.tar.gz

xdanik commented 3 years ago

Firstly, many thanks for such a quick response!

I have tested the proposed nightly release and I would like to report that the issue seems to have been resolved. 🎉

I ran speedtest multiple times in a row and also tried limiting upload bandwidth on server A. The stream was always able to recover itself. 👍

aler9 commented 3 years ago

added in v0.15.4

github-actions[bot] commented 1 year ago

This issue is being locked automatically because it has been closed for more than 6 months. Please open a new issue in case you encounter a similar problem.