CESNET / UltraGrid

UltraGrid low-latency audio and video network transmission system
http://www.ultragrid.cz

Nat-helper HolePunch Arguments #334

Closed (sogorman closed this issue 10 months ago)

sogorman commented 10 months ago

Apologies if this has already been covered, but I am looking for clarification on the nat-helper / hole-punch workflow, specifically the destination hostname / IP in the sender's command-line arguments.

I have the nat-helper running, along with a test sender and a test receiver on two different networks behind two different NATs. When I start the sender and receiver, the logs show that they see each other in the same "test" room, but no video is passed.

Do I need to specify the public IP of the receiver on the sender, or is it enough to place them both in the same room?

Here are my sender [BENCHWORKSTATION hostname] command-line arguments and log:

Command args: -d multiplier:vulkan_sdl2#preview:key=u48q6gfm -r dummy --param errors-fatal -P30104:11064:5016:5014 -Nholepunch:room=test:server=ec2-3-137-160-143.us-east-2.compute.amazonaws.com

UltraGrid 1.8+ (master rev e3026d3 built Sep 11 2023 13:57:15)

[HOLEPUNCH] Connection: Waiting for remote client...
[HOLEPUNCH] Remote client name: BENCHWORKSTATION
[HOLEPUNCH] Received candidate: a=candidate:1 1 UDP 2122317823 10.0.0.170 57957 typ host
[HOLEPUNCH] Local candidate port: 57957
[HOLEPUNCH] Received candidate: a=candidate:2 1 UDP 2122317567 192.168.176.1 57957 typ host
[HOLEPUNCH] Local candidate port: 57957

And here are my receiver [DMWS24 hostname] command-line arguments and log:

Command args: -t testcard -c libavcodec:codec=H.265:bitrate=2M -s testcard --audio-codec MP3:bitrate=256k -r dummy -f rs:200:220 --param errors-fatal 66.210.240.190 -P30104:11064:5016:5014 -Nholepunch:room=test:server=ec2-3-137-160-143.us-east-2.compute.amazonaws.com

UltraGrid 1.8+ (master rev e3026d3 built Sep 11 2023 13:57:15)

[HOLEPUNCH] Connection: Waiting for remote client...
[HOLEPUNCH] Remote client name: DMWS24
[HOLEPUNCH] Received candidate: a=candidate:1 1 UDP 2122317823 192.168.99.10 52646 typ host
[HOLEPUNCH] Local candidate port: 52646

Here is the console output of nat-helper:

root@ip-172-31-1-209:/home/ubuntu/UltraGrid-master/nat-helper/build# ./nat-helper
Running
Moving client BENCHWORKSTATION to room test_video
Creating room test_video
Moving client DMWS24 to room test_video
Client candidate recieved
Client candidate recieved
Client candidate recieved

TheSashmo commented 10 months ago

@sogorman give me a call, you already have my cell ;)

mpiatka commented 10 months ago

From the logs it seems like only local candidate addresses get sent. The clients need to connect to a STUN server to learn their public IPs. Do you have a STUN server running alongside the nat-helper on the ec2 instance (the holepunching module expects it to be on port 3478 if not specified explicitly)?
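
To make the point about candidate types concrete, here is a minimal sketch (not part of UltraGrid) that scans a client log for `a=candidate:` lines and reports whether anything other than `host` candidates was gathered. In ICE terms, `host` candidates are local interface addresses, while `srflx` (server reflexive) candidates are the public addresses learned from a STUN server. The function names `candidate_types` and `has_public_candidate` are made up for this example, and the regular expression only assumes the candidate layout visible in the logs above.

```python
import re

# Matches the candidate lines as they appear in the [HOLEPUNCH] log output.
CANDIDATE_RE = re.compile(
    r"a=candidate:(?P<foundation>\S+) (?P<component>\d+) (?P<transport>\S+) "
    r"(?P<priority>\d+) (?P<address>\S+) (?P<port>\d+) typ (?P<type>\S+)"
)


def candidate_types(log_text):
    """Return (type, address, port) for every candidate line found in the log."""
    return [
        (m["type"], m["address"], int(m["port"]))
        for m in CANDIDATE_RE.finditer(log_text)
    ]


def has_public_candidate(log_text):
    """True if at least one non-host (srflx, prflx or relay) candidate is present."""
    return any(kind != "host" for kind, _, _ in candidate_types(log_text))


if __name__ == "__main__":
    log = (
        "[HOLEPUNCH] Received candidate: a=candidate:1 1 UDP 2122317823 "
        "10.0.0.170 57957 typ host\n"
        "[HOLEPUNCH] Received candidate: a=candidate:2 1 UDP 2122317567 "
        "192.168.176.1 57957 typ host\n"
    )
    print(candidate_types(log))
    print("public candidate present:", has_public_candidate(log))
```

If this reports only `host` entries, the client most likely never got an answer from a STUN server.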

sogorman commented 10 months ago

@TheSashmo I don't think I follow.

@mpiatka thanks, I made the mistake of assuming that the nat-helper also ran its own STUN service. I now have a STUN server running on the same public host as the nat-helper, and the logs show the public IPs, but the connection is still not made. All the public and private IPs look correct, but after both the sender and receiver join the room, the nat-helper throws "Error reading candidate" and then removes them.
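
For anyone repeating this setup, one way to confirm that a STUN server answers on port 3478 is to send it a binding request and read back the address it reports. The following is a minimal sketch based on the STUN message layout in RFC 5389, not UltraGrid code. All function names are hypothetical, only IPv4 is handled, and the transaction ID of the reply is not verified.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(txn_id=None):
    """Return a 20-byte STUN binding request with no attributes."""
    txn_id = txn_id or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + txn_id


def parse_xor_mapped_address(response):
    """Extract (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    if len(response) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    pos, end = 20, min(20 + length, len(response))
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            # Port and address are XORed with the magic cookie on the wire.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None


def stun_public_address(server, port=3478, timeout=2.0):
    """Ask a STUN server which public address it sees for this socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_binding_request(), (server, port))
        response, _ = sock.recvfrom(2048)
    return parse_xor_mapped_address(response)
```

Calling `stun_public_address("stun.example.org")` (a placeholder hostname) should return the public IP and port if the server is reachable; a timeout exception suggests nothing is answering on that port.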

Sender arguments:

-t testcard -c libavcodec:codec=H.265:bitrate=2M -s testcard --audio-codec MP3:bitrate=256k -r dummy -f rs:200:220 --param errors-fatal 66.210.240.190 -Nholepunch:room=test:server=18.118.xxx.xxx

Receiver arguments:

-d multiplier:vulkan_sdl2#preview:key=u48q6gfm -r dummy --param errors-fatal  -Nholepunch:room=test:server=18.118.xxx.xxx

./nat-helper
Running
Moving client BENCHWORKSTATION to room test_video
Creating room test_video
Moving client DMWS24 to room test_video
Client candidate recieved
Client candidate recieved
Client candidate recieved
Client candidate recieved
Moving client DMWS24 to room test_audio
Creating room test_audio
Moving client BENCHWORKSTATION to room test_audio
Error reading candidate, removing client DMWS24
Error reading candidate, removing client BENCHWORKSTATION
Error reading candidate, removing client DMWS24
Error reading candidate, removing client BENCHWORKSTATION

UltraGrid 1.8+ (master rev e3026d3 built Sep 11 2023 13:57:15)

[HOLEPUNCH] Connection: Waiting for remote client...
[HOLEPUNCH] Remote client name: DMWS24
[HOLEPUNCH] Received candidate: a=candidate:1 1 UDP 2122317823 192.168.99.10 61642 typ host
[HOLEPUNCH] Local candidate port: 61642
[HOLEPUNCH] Received candidate: a=candidate:2 1 UDP 1686109951 70.xxx.235.141 61642 typ srflx raddr 0.0.0.0 rport 0
[HOLEPUNCH] Local candidate  : 70.xxx.235.141:61642
[HOLEPUNCH] Remote candidate : 66.xxx.240.190:50508
[HOLEPUNCH] Connection: Waiting for remote client...
[HOLEPUNCH] Remote client name: DMWS24

UltraGrid 1.8+ (master rev e3026d3 built Sep 11 2023 13:57:15)

[HOLEPUNCH] Connection: Waiting for remote client...
[HOLEPUNCH] Remote client name: BENCHWORKSTATION
[HOLEPUNCH] Received candidate: a=candidate:1 1 UDP 2122317823 10.0.0.170 50508 typ host
[HOLEPUNCH] Local candidate port: 50508
[HOLEPUNCH] Received candidate: a=candidate:2 1 UDP 1686109951 66.xxx.240.190 50508 typ srflx raddr 0.0.0.0 rport 0
[HOLEPUNCH] Local candidate  : 66.xxx.240.190:50508
[HOLEPUNCH] Remote candidate : 70.xxx.235.141:61642
[HOLEPUNCH] Connection: Waiting for remote client...
[HOLEPUNCH] Remote client name: BENCHWORKSTATION

mpiatka commented 10 months ago

Looks like it makes the connection successfully for video and then gets stuck on the audio connection for some reason. The first pair of error messages are just the clients disconnecting after the video connection is made. Today I made some changes so that the helper can differentiate between a disconnect and an actual error.
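
To illustrate the general idea of telling a disconnect apart from a real error, here is a small standalone sketch. It is not the nat-helper code, and `read_status` is a name invented for this example. On a stream socket, an orderly close by the peer shows up as an empty read, whereas a genuine failure raises an exception.

```python
import socket


def read_status(conn, nbytes=1024):
    """Classify one read as ("data", bytes), ("disconnect", None) or ("error", msg)."""
    try:
        data = conn.recv(nbytes)
    except OSError as exc:
        # A genuine failure, e.g. a reset connection or an invalid socket.
        return ("error", str(exc))
    if data == b"":
        # An empty read on a stream socket means the peer closed cleanly.
        return ("disconnect", None)
    return ("data", data)


if __name__ == "__main__":
    helper_side, client_side = socket.socketpair()
    client_side.sendall(b"candidate")
    print(read_status(helper_side))   # payload received
    client_side.close()
    print(read_status(helper_side))   # peer closed the connection
    helper_side.close()
    print(read_status(helper_side))   # reading a closed socket fails
```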

I also added some more logging to help us debug the issue. Could you build the helper from master and run it again with the latest UltraGrid nightly build? Also please add --verbose=7 to the sender and receiver parameters, so that we can see more clearly what it's hanging on.

sogorman commented 10 months ago

Thanks, @mpiatka. Attached are the full verbose logs for both the sender and the receiver, using the latest nightly builds: sender_log.txt, receiver_log.txt

mpiatka commented 10 months ago

Alright, I believe I've found the issue and committed a fix. The latest nightlies should work now.

sogorman commented 10 months ago

@mpiatka Looking better now, the hole punch binding issue is resolved in the new nightly build and traffic now flows. Super awesome, appreciate you.