TokTok / c-toxcore

The future of online communications.
https://tox.chat
GNU General Public License v3.0

Why is media traffic being relayed? #2356

Open akamaus opened 1 year ago

akamaus commented 1 year ago

During experiments with voice calls (qTox-1.17.3, toxcore 0.2.13, both parties behind full-cone NAT) I regularly found that UDP traffic was relayed through one or several nodes in remote parts of the globe. The behavior was different each time I tried: sometimes it's a single peer for both parties; sometimes two peers, one for egress and another for ingress traffic; sometimes the peers differ between the parties, so the chain is probably longer than one hop. Is this documented behavior? Calls are supposed to be direct, aren't they?

emdee-is commented 1 year ago

I don't think this should be happening, especially if the type of routing changes.

Are your conversations of a commercial or sensitive nature such that someone would want to MITM them?

Can you explain a little about how you are determining that the circuits are being relayed?

Can you see the IPs of where they are going to? Tox is a fairly small network and I should be able to give you an idea of if the IPs are part of the Tox network. IPv4 or IPv6?

What countries are your ends of the connection in?

zoff99 commented 1 year ago

@akamaus yes, this is normal behavior. tox works much like webrtc in this manner. if a direct connection can not be established, tcp relays will be used to relay traffic. the relays can not read any of the data they relay.

see: https://toktok.ltd/spec.html#introduction and https://toktok.ltd/spec.html#tcp-connections

nurupo commented 1 year ago

> if a direct connection can not be established

@zoff99 By mentioning the full cone nat they imply that the direct connection should be possible.

@akamaus Tox does direct connection only when both parties use UDP. If one of the parties disables UDP, forcing toxcore to be TCP-only, it will use a TCP relay node as toxcore doesn't support direct connections via TCP. Also note that using a SOCKS/HTTP proxy forces the TCP-only mode. I'd like to say that seeing toxcore logs might help figure out why that happens, but I'm not too sure if they are as detailed as to mention the connection type per friend? Perhaps someone else could clarify that.

emdee-is commented 1 year ago

@nurupo but if the connection was established UDP in a full-cone NAT, would Tox ever change to using TCP through relays dynamically? Am I right in saying there should be no UDP connections outside the cone, or would Tox do UDP connections outside for the DHT?

@akamaus can you determine the IP addresses - I should have a rough idea if they are Tox relays or perhaps even Tox DHT nodes. Calls are supposed to be direct, but calls aren't the only Tox traffic during a call - there's traffic for the DHT, which is usually UDP.

@zoff99 @nurupo am I right in expecting that the majority of DHT traffic would be using the DHTnodes.json(ish) IPs?

akamaus commented 1 year ago

@zoff99 Actually, I'm not 100% sure it's full-cone.

I executed stunclient and got this:

% stunclient stun.stunprotocol.org --mode full        
Binding test: success
Local address: 192.168.0.103:50587
Mapped address: 89.237.194.XXX:31272
Behavior test: success
Nat behavior: Endpoint Independent Mapping
Filtering test: success
Nat filtering: Address and Port Dependent Filtering

I'm a bit confused by the line "Address and Port Dependent Filtering". I believe the other side had the same NAT behavior. Is c-toxcore supposed to handle that case?

@emdee-is the peer changes frequently. Last time I experimented it was 167.88.125.118. Looks like it's the 7a:60:98:b5:90:bd:c7:3f:97:23:fc:59:f8:2b:3f:90:85:a6:4d:1b:21:3a:af:8e:61:0f:d3:51:93:0d:05:2d node. What bothers me is that it's on the opposite side of the globe, so sound (and especially video) quality was not good.

emdee-is commented 1 year ago

That IPv4 address is a known Tox server - abilinski.com, so if it's being routed there, it would make sense that it's because Tox decided to go there. What port was it on? So I assume your cone is not as full as you thought it was.

Depending on your client, you should have a JSON file called something like DHTnodes.json in the directory where your profiles are - look in there for the most common Tox servers and ports that Tox will likely use. I don't know the routing algo, but there may be capacity considerations that it bases its decisions on.

I assume you are in Russia, and there are not many well-known servers in Russia; I don't know the Tox routing algorithms, but Tox may have decided that a node on the opposite side of the globe was your nearest neighbor from its point of view. AFAIK there's nothing in Tox that allows the client to force a routing.

If you know anyone with well-connected servers close to you, it would be an idea to get them to run a bootstrap (BS) node on their servers, add their IP:port info yourself to the DHTnodes.json file, and trim the nodes file down to the ones you are OK with. That's the only control a client has that I know of, and it would only take 2 or 3 nodes to influence things.
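As a rough illustration of trimming the nodes file to an allow-list, here is a minimal Python sketch. It assumes the file is a JSON object with a top-level `nodes` array whose entries carry a `public_key` field; that layout is an assumption here and may differ between clients, so check your own file first. The function name `trim_nodes` and the example addresses (taken from documentation-only IP ranges) are hypothetical.

```python
import json


def trim_nodes(nodes_doc, allowed_keys):
    """Return a copy of the nodes document keeping only allow-listed nodes.

    nodes_doc    -- parsed JSON object, assumed to look like {"nodes": [...]}
    allowed_keys -- iterable of hex public keys you are OK with
    """
    # Compare keys case-insensitively, since hex may be upper or lower case.
    allowed = {key.upper() for key in allowed_keys}
    kept = [
        node
        for node in nodes_doc.get("nodes", [])
        if node.get("public_key", "").upper() in allowed
    ]
    trimmed = dict(nodes_doc)  # shallow copy so the input is left unchanged
    trimmed["nodes"] = kept
    return trimmed


if __name__ == "__main__":
    # Hypothetical example data; replace with json.load() on your own file.
    example = {
        "nodes": [
            {"ipv4": "198.51.100.7", "port": 33445, "public_key": "AA" * 32},
            {"ipv4": "203.0.113.9", "port": 33445, "public_key": "BB" * 32},
        ]
    }
    print(json.dumps(trim_nodes(example, ["aa" * 32]), indent=2))
```

Back up the original file before overwriting it. Whether a client keeps using a trimmed list or refreshes it from elsewhere depends on the client.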

@zoff99 @nurupo if 2 people ran BS servers that had only each other's nodes listed, and they had 2 clients with json files that listed only each other's node, would the 2 connect to each other only? Has anyone ever tried to run Tox over a p-to-p network link? A wiki page on routing would be great reading.

akamaus commented 1 year ago

@emdee-is the endpoint is 167.88.125.118:33445. Maybe the problem is not the NAT not being full-cone, but some filtering logic for incoming packets? I thought that with "address and port dependent filtering" packets should pass as soon as both parties have sent something to each other's endpoint. Am I wrong? Are sections 2.6 and 3.3.1 of https://www.ietf.org/rfc/rfc5128.txt relevant here?
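The expectation above can be checked against a toy model. The following Python sketch is an idealized simulation of two NATs with Endpoint-Independent Mapping and Address and Port-Dependent Filtering, as reported by stunclient. It is not toxcore's hole-punching code, and it ignores mapping timeouts, port reuse conflicts and packet loss. All class and function names and all addresses are made up for illustration.

```python
class Nat:
    """Toy NAT: Endpoint-Independent Mapping, Address and Port-Dependent Filtering."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.mapping = {}     # internal (ip, port) -> public port
        self.allowed = set()  # (public port, remote ip, remote port)
        self.next_port = 40000

    def outbound(self, src, dst):
        """Internal host src sends to dst; return the public source endpoint."""
        if src not in self.mapping:
            # Endpoint-independent: one public port per internal endpoint,
            # reused for every destination.
            self.mapping[src] = self.next_port
            self.next_port += 1
        port = self.mapping[src]
        # Only this exact remote ip:port may now send back to that port.
        self.allowed.add((port, dst[0], dst[1]))
        return (self.public_ip, port)

    def inbound(self, src, public_port):
        """Return True if a packet from src to public_port passes the filter."""
        return (public_port, src[0], src[1]) in self.allowed


def simulate_hole_punch():
    nat_a, nat_b = Nat("198.51.100.1"), Nat("203.0.113.1")
    host_a, host_b = ("192.168.0.103", 50587), ("10.0.0.5", 50000)
    rendezvous = ("192.0.2.10", 3478)  # any third party both sides can reach

    # Both sides learn their public endpoints via the third party.
    pub_a = nat_a.outbound(host_a, rendezvous)
    pub_b = nat_b.outbound(host_b, rendezvous)

    # A sends first: this opens A's filter for B, but B's NAT drops the packet.
    nat_a.outbound(host_a, pub_b)
    first_packet_passes = nat_b.inbound(pub_a, pub_b[1])

    # B sends to A: this opens B's filter, and A's filter is already open.
    nat_b.outbound(host_b, pub_a)
    a_accepts = nat_a.inbound(pub_b, pub_a[1])
    b_accepts = nat_b.inbound(pub_a, pub_b[1])
    return first_packet_passes, a_accepts, b_accepts


if __name__ == "__main__":
    print(simulate_hole_punch())
```

In this model the first packet is dropped, and traffic passes in both directions once each side has sent to the other's mapped endpoint. That agrees with the reading of "address and port dependent filtering" given above, but it says nothing about why a real call ended up on a relay.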

@nurupo both parties had udp switched on in qtox and nobody used any proxy. How to activate logging btw? Should I recompile toxcore with some flags?

What other experiments could I arrange to investigate the issue?

Btw, does the routing algo take into account anything about the endpoint IPs, like physical location or p2p latency?

emdee-is commented 1 year ago

Any port from 33445 upward is a known Tox port.

> How to activate logging btw? Should I recompile toxcore with some flags?

Look in the CMakeLists.txt and INSTALL.md: for trace logging you want to just rebuild with


cmake \
  -D MIN_LOGGER_LEVEL=TRACE \
..

and if you like:

  -D CMAKE_BUILD_TYPE=Debug \

I run Debug routinely but I don't know how it affects video performance.

You get lots of logging which should help.

To me, Tox is operating normally, and your cone is not as full as you thought it was, although I don't know the routing algos, so I can't say why it decided not to use a direct connection.

gjedeer commented 1 year ago

I've regularly run into this with tuntox, even with one party on an unfiltered public IP (VPS) and another behind NAT. That's a vicious problem to debug because it occurs so randomly: every time I build toxcore with debugging or some changes I made, it goes away.

One thing that often works for me is just waiting, and after a few minutes toxcore switches from relayed to direct UDP connection.

akamaus commented 1 year ago

@gjedeer yeah. It happens more or less randomly, and the worst thing is that there are no UI indicators to understand what's going on. But it has a tremendous influence on link quality for me. Either you have direct routing to your local peer and 5-10 ms latency, or some detour route halfway around the globe and much worse link quality.

Green-Sky commented 1 year ago

the api exposes the state of the connection. so besides it being wonky, it's a UI issue.
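A client could turn that state into a visible indicator. The Python sketch below shows only the mapping step. The `ToxConnection` enum is a stand-in that mirrors the order of toxcore's `TOX_CONNECTION_NONE`, `TOX_CONNECTION_TCP` and `TOX_CONNECTION_UDP` constants; the numeric values and the label text are assumptions. In a real client the status would come from `tox_friend_get_connection_status` or the friend connection status callback in the C API.

```python
from enum import IntEnum


class ToxConnection(IntEnum):
    # Stand-in for toxcore's TOX_CONNECTION_* constants (values assumed).
    NONE = 0
    TCP = 1
    UDP = 2


def connection_label(status):
    """Map a friend's connection status to a short label for the UI."""
    status = ToxConnection(status)
    if status is ToxConnection.UDP:
        return "direct (UDP)"
    if status is ToxConnection.TCP:
        return "relayed (TCP relay)"
    return "offline"


if __name__ == "__main__":
    for value in ToxConnection:
        print(value.name, "->", connection_label(value))
```

Showing this label next to a friend during a call would at least make it obvious when the path is relayed.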