Ysurac / openmptcprouter

OpenMPTCProuter is an open source solution to aggregate multiple internet connections using Multipath TCP (MPTCP) on OpenWrt
https://www.openmptcprouter.com/
GNU General Public License v3.0

Intermittent 10s Delay in Connection Setup Despite Full Bandwidth #3570

Open hubix2000 opened 1 month ago

hubix2000 commented 1 month ago

Expected Behavior

We expect connections to establish quickly without any delay, even when there are many simultaneous sessions. The connection setup time should be near-instantaneous and reliable.

Current Behavior

Sometimes, the connection setup takes exactly 10 seconds, and occasionally, no session is established at all. This delay seems to happen more often when there are many simultaneous sessions. Despite this, speed tests consistently show full bandwidth. We have checked the logs in Status -> System log but couldn't find a clear explanation for the behavior.
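
One way to quantify the delay from a LAN client or from the router itself is curl's timing breakdown (a diagnostic sketch, assuming curl is installed; example.com is only a placeholder target):

# Print DNS lookup, TCP connect, and total time for one request
curl -s -o /dev/null \
  -w 'dns: %{time_namelookup}s  connect: %{time_connect}s  total: %{time_total}s\n' \
  http://example.com/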

Specifications

Ysurac commented 1 month ago

Did you check whether it's a DNS issue, or is it the same when using an IP directly? Can you give me the result of uci show unbound.ub_main and uci -q get openmptcprouter.settings.disable_ipv6?
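
A quick way to separate the two, assuming curl is available on a LAN client (the hostname and IP below are placeholders):

# Time the same request by hostname and by literal IP
time curl -s -o /dev/null http://example.com/
time curl -s -o /dev/null http://192.0.2.1/

If only the hostname variant is slow, DNS is the likely culprit; if both are slow, it is the connection setup itself.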

hubix2000 commented 1 month ago

Hi Ysurac. It's not a DNS issue. The 10s delay always occurs before DNS resolution.

It happens from time to time, but the result is that everything feels very slow. We have a lot of users on the connection.

root@OpenMPTCProuter:~# uci show unbound.ub_main
unbound.ub_main=unbound
unbound.ub_main.dhcp_link='dnsmasq'
unbound.ub_main.dns64='0'
unbound.ub_main.domain='lan'
unbound.ub_main.edns_size='1232'
unbound.ub_main.extended_stats='0'
unbound.ub_main.hide_binddata='1'
unbound.ub_main.interface_auto='0'
unbound.ub_main.listen_port='5353'
unbound.ub_main.localservice='1'
unbound.ub_main.manual_conf='0'
unbound.ub_main.num_threads='1'
unbound.ub_main.protocol='ip4_only'
unbound.ub_main.rate_limit='0'
unbound.ub_main.rebind_localhost='0'
unbound.ub_main.rebind_protection='1'
unbound.ub_main.recursion='aggressive'
unbound.ub_main.resource='default'
unbound.ub_main.root_age='9'
unbound.ub_main.ttl_min='120'
unbound.ub_main.ttl_neg_max='1000'
unbound.ub_main.unbound_control='0'
unbound.ub_main.validator='1'
unbound.ub_main.validator_ntp='1'
unbound.ub_main.verbosity='1'
unbound.ub_main.iface_trig='lan' 'wan'
unbound.ub_main.enabled='1'
unbound.ub_main.interface='loopback'

root@OpenMPTCProuter:~# uci -q get openmptcprouter.settings.disable_ipv6
1

hubix2000 commented 1 month ago

[Screenshot attached: 2024-10-04 110010]

Attached is a screenshot with typical timings after the router has been running for a while. Only a restart helps. :-(

Ysurac commented 1 month ago

In Status -> Overview, how many Active Connections do you have? Which proxy is used? (The default is now Shadowsocks-Rust.)

hubix2000 commented 1 month ago

We use Shadowsocks-Rust. We have 9090 / 131072 (6%) active connections.

Ysurac commented 1 month ago

When the issue occurs, can you check on the VPS, via journalctl -u shadowsocks-go, what the log shows for the IP of the website you are trying to reach?
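
For example (a sketch; 203.0.113.10 stands in for the destination IP you were trying to reach):

# Show recent shadowsocks-go log entries mentioning the destination IP
journalctl -u shadowsocks-go --since "1 hour ago" | grep -F "203.0.113.10"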

hubix2000 commented 1 month ago

I've just attached the log file: vpsadmin@CoreVision-VPS02 ~.txt

hubix2000 commented 1 month ago

Hi Ysurac, did you forget to check my logs?

Ysurac commented 1 month ago

There are many "Failed to complete handshake with client" entries. What are the connection types? FTTH, mobile, satellite, ...? What is the result of ss --summary on the VPS?
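
If the problem is hard to catch in the moment, something like this on the VPS can record the socket summary around the failures (a rough sketch):

# Append a timestamped socket summary every 10 seconds
while true; do date; ss --summary; sleep 10; done >> /tmp/ss-summary.log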

hubix2000 commented 1 month ago

It's DSL + Starlink.

Ysurac commented 1 month ago

Which connection is set as master? In your case that should be DSL, I think.

hubix2000 commented 1 month ago

It is.

hubix2000 commented 1 month ago

Hi. The issue may be related to shadowsocks-rust timeout handling; the 10s delay could be tied to SYN handling. I added the keep_alive option to the shadowsocks-rust config and the behavior changed a bit. It's not fixed yet, but it's better.
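
For reference, the change was roughly the following in the shadowsocks-rust JSON config (a sketch with placeholder values; keep_alive is the TCP keep-alive interval in seconds, and the exact key name should be checked against your shadowsocks-rust version):

{
  "server": "0.0.0.0",
  "server_port": 8388,
  "method": "aes-256-gcm",
  "password": "***",
  "timeout": 300,
  "keep_alive": 15
}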

Any idea?

Ysurac commented 4 weeks ago

I will add keep_alive setting support in the next commit. What is the result of ss --summary on the VPS?

hubix2000 commented 4 weeks ago

Hi,

It appears that our issue has been resolved following the move to Ionos. While the exact cause remains unclear, the timeout seems to be related to the Azure VM. This requires further investigation.

We will submit a support request to Microsoft.