koenkarsten closed this issue 1 year ago
NGINX config used in Dockerfile:
load_module '/usr/lib/nginx/modules/ngx_stream_module.so';
worker_processes auto;
events {
}
http {
    server {
        listen 10000;
        root /var/log/srt;
        location / {
        }
    }
}
stream {
    server {
        listen 8000 udp;
        proxy_pass localhost:9000;
    }
}
Besides this, I tried many proxy buffer, rate and keepalive settings, none of which helped. I also tuned workers and connections, with the same results whether running on a big VM or locally.
All I can see is that you are experiencing 50% loss on the UDP link, or possibly heavy reordering. A pcap file recorded on the receiver side might help to understand what is happening. I actually doubt that it is reordering, because in that case the number of belated packets would be much higher, close to half the number of retransmissions.
Note one thing: it is not SRT that is responsible for these high numbers, but your UDP link. SRT might behave better with slightly better adjusted parameters, but I can't see any unexpected behavior here. I'm even surprised that the number of dropped packets is so low.
Yes, the UDP link (NGINX) certainly is the issue here, so this inquiry is more a call for information from anyone who has set up source -> NGINX -> srt-live-transmit before. In a way, the relatively low number of dropped packets indeed shows the great performance of SRT 👍. So if anybody can supply information on this topic, or suggest alternative UDP proxy setups, I'm all ears!
Well, you misled us a bit by reporting a "BUG". Things would have gone differently if you had chosen "QUESTION".
For starters, I can advise setting a higher latency. Very low latency values are only good for slightly lossy links. With high losses a good recovery rate is usually unlikely, but here SRT behaves surprisingly well, so it might be that the link has large fluctuations in bandwidth capacity (that is, the capacity is often above the bitrate, otherwise you wouldn't see so many retransmissions). But to recover packets, especially if they need a second or third retransmission, you need to give SRT more time to deliver them. Note also that with extremely high latencies you might need to set a larger receiver buffer size.
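To make the latency advice concrete, here is a rough back-of-the-envelope sketch. It assumes, as a simplification, that each retransmission attempt costs about one round trip, and that the receiver has to hold roughly one latency window of stream data. The function names and the example RTT are hypothetical; this is not SRT's own buffer-sizing formula.

```python
def retransmission_rounds(latency_ms, rtt_ms):
    """Rough upper bound on retransmission attempts that fit in the latency window.

    Assumption: one attempt takes about one round trip (loss report to the
    sender, retransmitted packet back to the receiver).
    """
    return int(latency_ms // rtt_ms)


def bytes_in_latency_window(bitrate_bps, latency_ms):
    """Amount of stream data (in bytes) covered by one latency window."""
    return bitrate_bps / 8 * latency_ms / 1000


# Hypothetical numbers: a 7 Mbit/s stream with 1000 ms of latency.
window = bytes_in_latency_window(7_000_000, 1000)
rounds = retransmission_rounds(120, 40)
print(f"data in window: {window:.0f} bytes, retransmission rounds: {rounds}")
```

If the configured receiver buffer is smaller than the data covered by the latency window, raising the latency alone is unlikely to help, which is the reason for the note about the buffer size.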
Thanks, the mentioned tweaks have been applied and helped a bit, but not all the way. After a lot of debugging we pinpointed the issue, which lies on the NGINX side as expected: setting worker_processes auto; ensures one worker process per CPU, but having multiple workers causes a lot of out-of-order messages for SRT to process. Setting worker_processes 1; "fixes" the issue by forcing all messages to be processed by the same worker, at the cost of not utilising the full capacity of the server. So at least I know the root cause now and can see how to fix this properly from here. Thanks for the suggestions!
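The reordering effect described here can be illustrated with a toy model. This is only a sketch under stated assumptions: packets are handed to workers round-robin and each worker adds a fixed forwarding delay. It is not a description of how NGINX actually schedules datagrams, and the function name and delay values are hypothetical.

```python
def count_out_of_order(num_packets, worker_delays, interval=1.0):
    """Count packets that arrive after a packet with a higher sequence number.

    Assumptions (toy model only): packet `seq` is sent at `seq * interval`,
    handled by worker `seq % len(worker_delays)`, and delayed by that
    worker's fixed delay before it reaches the receiver.
    """
    arrivals = []
    for seq in range(num_packets):
        worker = seq % len(worker_delays)
        arrival_time = seq * interval + worker_delays[worker]
        arrivals.append((arrival_time, seq))

    arrivals.sort()  # order in which the receiver sees the packets

    highest_seen = -1
    late = 0
    for _, seq in arrivals:
        if seq < highest_seen:
            late += 1  # a newer packet already arrived before this one
        else:
            highest_seen = seq
    return late


# One worker: every packet gets the same delay, so the order is preserved.
print(count_out_of_order(100, [0.0]))
# Two workers with uneven delays: some packets overtake each other.
print(count_out_of_order(100, [0.0, 2.5]))
```

In this model a single worker can never reorder packets, which matches the observation that one worker process makes the out-of-order messages disappear.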
Thanks for the workaround, @koenkarsten. (I wish there were a different solution, in case the Nginx team works on it.)
We also found a big packet loss problem when sending SRT video streams over UDP. The problem lies in how UDP flows are identified: UDP is connectionless, so the receiver can match retransmitted packets to a stream only when they arrive from the same source port as the earlier packets.
Passing through Nginx, which by default spreads the traffic across multiple workers, a large number of these packets arrive at their destination with a different source port, and the video app treats them as independent flows, effectively losing the packets and creating video glitches. When worker_processes is set to 1, transmission occurs without changing the source port and packets arrive with almost 0% loss.
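A minimal sketch of why the source port matters: a receiver of connectionless traffic typically keys its sessions on the sender's address and port, so the same stream forwarded from two ports looks like two separate flows. The function name, addresses and port numbers below are hypothetical and only illustrate the idea; this is not SRT's actual session lookup.

```python
from collections import defaultdict


def group_by_flow(datagrams):
    """Group (src_ip, src_port, seq) tuples by the (src_ip, src_port) pair."""
    flows = defaultdict(list)
    for src_ip, src_port, seq in datagrams:
        flows[(src_ip, src_port)].append(seq)
    return dict(flows)


# All packets forwarded from one source port: a single flow.
single = [("127.0.0.1", 40000, seq) for seq in range(6)]

# The same packets alternating between two source ports, as if two
# proxy workers each forwarded part of the stream from their own socket.
split = [("127.0.0.1", 40000 + seq % 2, seq) for seq in range(6)]

print(len(group_by_flow(single)), "flow(s) with one source port")
print(len(group_by_flow(split)), "flow(s) with two source ports")
```

Each of the two flows in the second case has gaps in its sequence numbers, which the receiver would see as loss.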
Describe the bug When running srt-live-transmit behind an NGINX proxy, a high number of retransmits occurs, which ultimately leads to belated and dropped packets and distortion in the payload.
To Reproduce Please find the Dockerfile.txt to reproduce attached (remove .txt extension):
docker build -t srt-nginx .
docker run -d -p 8000:8000/udp -p 19000:19000/tcp -p 9000:9000/udp -p 10000:10000/tcp srt-nginx
srt://localhost:8000?passphrase=supersecretpassphrase leads to very high pktRcvDrop/pktRcvRetrans/pktRcvBelated (~30% retrans).
srt://localhost:9000?passphrase=supersecretpassphrase leads to very low pktRcvDrop/pktRcvRetrans/pktRcvBelated (0% retrans).
Running this Dockerfile gives the following running locally:
224.0.0.0:29999
Expected behavior When uploading SRT for a while the statistics can be accessed to show the issue:
For a healthy stream, these numbers should be near zero.
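As a sketch of how these counters could be turned into the percentages quoted in this report, the helper below divides each counter by the total number of received packets. It assumes the statistics are available as a dictionary and that a total counter named pktRecv is present alongside the three counters mentioned above; the sample values are made up for illustration.

```python
def receiver_rates(stats):
    """Express retransmitted, dropped and belated packets as % of received packets.

    Assumption: `stats` holds cumulative counters, with `pktRecv` as the
    total number of packets received.
    """
    names = ("pktRcvRetrans", "pktRcvDrop", "pktRcvBelated")
    total = stats["pktRecv"]
    if total == 0:
        return {name: 0.0 for name in names}
    return {name: 100.0 * stats[name] / total for name in names}


# Hypothetical sample resembling the behaviour seen behind the proxy.
sample = {"pktRecv": 10000, "pktRcvRetrans": 3000, "pktRcvDrop": 50, "pktRcvBelated": 20}
for name, pct in receiver_rates(sample).items():
    print(f"{name}: {pct:.1f}%")
```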
Additional context I've tried many different NGINX setups, all failing to fix the issue. I would love to learn more about where this comes from, as it seems to be a conceptual or config issue: it already occurs with streams from as low as 700 kbit/s all the way up to 7 Mbit/s. Across this entire range the retransmission rate is always around 30%, whether uploading over the internet to a VM/EC2 instance in AWS or testing locally on a laptop with Docker.
Dockerfile.txt (remove .txt extension)