WheezyE / Winelink

Installation scripts for running Winlink (RMS Express/Trimode & VARA) on non-Windows computers. Wine & Box86 make this project possible.

Linux can't connect to ports for 60s after VARA is closed/re-opened (issue with 'TIME_WAIT state' in Linux & VARA's port code implementation) #52

Open WheezyE opened 2 years ago

WheezyE commented 2 years ago

This problem affects Linux/Wine, but does not occur on Windows. Fixing this for Linux would make VARA much more usable for users who do not have Windows.

WheezyE commented 2 years ago

Thank you to KM4ACK & WH6AZ (of iOS RadioMail) for finding the root cause of this issue!

Just consolidating some notes & tests here on this GitHub ticket so we can work in the open.

The VARA TCP re-connection problem


  1. On Linux, a TCP port enters a TIME_WAIT state after the last connection on it is terminated, apparently to prevent DoS attacks & also to prevent packet loss in some edge cases (references: 1, 2, 3).
  2. The state of each TCP port can be viewed with `netstat | grep tcp`. VARA's localhost:8300/localhost:8301 ports enter the ESTABLISHED state when an RMS Express VARA HF P2P session is first opened. Then, as soon as VARA HF is closed, the 8300/8301 ports enter the TIME_WAIT state on Linux. I timed the ports staying in TIME_WAIT for about 60 seconds. After that, the ports disappear from netstat (they are closed & can be re-used again).
  3. While ports 8300/8301 are in TIME_WAIT, VARA HF will currently not attempt to connect to them. We believe that this is the cause of our issue. Said another way: if VARA is run, then closed, then run again, VARA will not re-establish a connection to RMS Express (or any other controller program over TCP) within this 60-second window.

Linux port states while VARA HF is connected, directly after closing VARA HF, and 60s after that:

```console
pi@raspberrypi4:/etc/init.d $ netstat | grep tcp && date
tcp 0 0 localhost:51220 localhost:36759 ESTABLISHED
tcp 0 0 raspberrypi4:47374 165.254.191.199:https ESTABLISHED
tcp 0 0 localhost:36759 localhost:51220 ESTABLISHED
tcp 0 0 localhost:53971 localhost:8301 ESTABLISHED
tcp 0 0 raspberrypi4:45066 ec2-3-208-217-166:https ESTABLISHED
tcp 0 0 raspberrypi4:60810 165.254.191.199:https ESTABLISHED
tcp 0 0 localhost:56953 localhost:59036 ESTABLISHED
tcp 0 0 localhost:59036 localhost:56953 ESTABLISHED
tcp 0 0 localhost:8300 localhost:46719 ESTABLISHED
tcp 0 0 localhost:8301 localhost:53971 ESTABLISHED
tcp 0 0 localhost:46719 localhost:8300 ESTABLISHED
Thu 13 Oct 14:34:41 MDT 2022
pi@raspberrypi4:/etc/init.d $ netstat | grep tcp && date
tcp 0 0 localhost:51220 localhost:36759 ESTABLISHED
tcp 0 0 raspberrypi4:47374 165.254.191.199:https ESTABLISHED
tcp 0 0 localhost:36759 localhost:51220 ESTABLISHED
tcp 0 0 raspberrypi4:45066 ec2-3-208-217-166:https ESTABLISHED
tcp 0 0 raspberrypi4:60810 165.254.191.199:https ESTABLISHED
tcp 0 0 localhost:56953 localhost:59036 ESTABLISHED
tcp 0 0 localhost:59036 localhost:56953 ESTABLISHED
tcp 0 0 localhost:8300 localhost:46719 TIME_WAIT
tcp 0 0 localhost:8301 localhost:53971 TIME_WAIT
Thu 13 Oct 14:34:47 MDT 2022
...
pi@raspberrypi4:/etc/init.d $ netstat | grep tcp && date
tcp 0 0 localhost:51220 localhost:36759 ESTABLISHED
tcp 0 0 raspberrypi4:47374 165.254.191.199:https ESTABLISHED
tcp 0 0 localhost:36759 localhost:51220 ESTABLISHED
tcp 0 0 raspberrypi4:45066 ec2-3-208-217-166:https ESTABLISHED
tcp 0 0 raspberrypi4:60810 165.254.191.199:https ESTABLISHED
tcp 0 0 localhost:56953 localhost:59036 ESTABLISHED
tcp 0 0 localhost:59036 localhost:56953 ESTABLISHED
tcp 0 0 localhost:8300 localhost:46719 TIME_WAIT
tcp 0 0 localhost:8301 localhost:53971 TIME_WAIT
Thu 13 Oct 14:35:44 MDT 2022
pi@raspberrypi4:/etc/init.d $ netstat | grep tcp && date
tcp 0 0 localhost:51220 localhost:36759 ESTABLISHED
tcp 0 0 raspberrypi4:47374 165.254.191.199:https ESTABLISHED
tcp 0 0 localhost:36759 localhost:51220 ESTABLISHED
tcp 0 0 raspberrypi4:60810 165.254.191.199:https ESTABLISHED
tcp 0 0 localhost:56953 localhost:59036 ESTABLISHED
tcp 0 0 localhost:59036 localhost:56953 ESTABLISHED
Thu 13 Oct 14:35:48 MDT 2022
pi@raspberrypi4:/etc/init.d $
```
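
The same port-state check can be done from a script instead of reading netstat by eye. Below is a minimal sketch in Python, assuming a Linux system that exposes `/proc/net/tcp` (IPv4 only). It is not KM4ACK's script. The function names `ports_in_time_wait` and `wait_until_free` are made up for illustration, and the ports are the 8300/8301 seen in the output above.

```python
"""Report whether the VARA ports are still in TIME_WAIT (Linux, IPv4 only)."""
import time

TIME_WAIT = "06"            # state code that /proc/net/tcp uses for TIME_WAIT
VARA_PORTS = {8300, 8301}   # ports taken from the netstat output in this thread


def ports_in_time_wait(proc_net_tcp_text, ports=VARA_PORTS):
    """Return the watched local ports that have a socket in TIME_WAIT."""
    busy = set()
    for line in proc_net_tcp_text.splitlines()[1:]:   # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # Local address looks like "0100007F:206C" (hex IP, hex port).
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if fields[3] == TIME_WAIT and local_port in ports:
            busy.add(local_port)
    return busy


def wait_until_free(path="/proc/net/tcp", poll_seconds=1.0):
    """Block until no watched port is in TIME_WAIT, printing progress."""
    while True:
        with open(path) as f:
            busy = ports_in_time_wait(f.read())
        if not busy:
            print("Ports are free; VARA can be started.")
            return
        print(f"Ports {sorted(busy)} still in TIME_WAIT, waiting...")
        time.sleep(poll_seconds)
```

Calling `wait_until_free()` before launching VARA would hold the launch until the TIME_WAIT entries have expired. It only watches the ports and does nothing to shorten the wait.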

Things we've tried

  1. The Linux kernel has some network variables that we could (in theory) change to help us fix/diagnose this issue (1). Note: Some are integer vars, some are boolean vars.
  2. Setting `net.ipv4.tcp_keepalive_time`, `net.ipv4.tcp_fin_timeout`, & `sunrpc.tcp_fin_timeout` to int 1 doesn't seem to change anything, with or without a network reset (TIME_WAIT still lasts 60s). Doing this also wrecks the internet connection on the Pi.
  3. Setting net.ipv4.tcp_tw_reuse to int 1 (global enable) doesn't change any behavior either.
  4. Some forums suggest forcibly cutting the connection on the TCP port with a program like Killcx. However, this does not address our TIME_WAIT problem, which is a port "busy" state that arises AFTER the port's connection is cut.

Viewing/editing/updating(?) Linux kernel variables in the terminal:

```bash
# view kernel variables
sysctl -A -r tcp # show tcp variables
sysctl -A -r tw # show some time_wait variables

# view current state of individual variables
sysctl net.ipv4.tcp_keepalive_time
sysctl net.ipv4.tcp_fin_timeout
sysctl net.ipv4.tcp_tw_reuse
#sysctl net.ipv4.tcp_tw_recycle #not available in RPiOS I think

# edit individual variables - these changes do not persist after reboot
sudo sysctl -w net.ipv4.tcp_keepalive_time=1
sudo sysctl -w net.ipv4.tcp_fin_timeout=1
sudo sysctl -w sunrpc.tcp_fin_timeout=1
sudo sysctl -w net.ipv4.tcp_tw_reuse=1

# can also view/edit state of individual variables like this
sudo nano /proc/sys/net/ipv4/tcp_fin_timeout

# reload network (needed for new variables to take effect?)
sudo service networking restart #could also run: /etc/init.d/networking restart

#reload variables (from config files?)
sudo sysctl --system
```

Killcx RPiOS installation (fyi. doesn't help):

```console
#Killcx only detects VARA connection while it's active, does not disable TIME_WAIT state on ports
cd ~/Downloads
wget https://cfhcable.dl.sourceforge.net/project/killcx/killcx/1.0.3/killcx-1.0.3.tgz
7z x killcx-1.0.3.tgz
sudo apt-get install libnet-rawip-perl libnet-pcap-perl libnetpacket-perl
sudo chmod +x killcx-1.0.3/killcx
cat /etc/hosts # to confirm that localhost is 127.0.0.1
sudo killcx-1.0.3/./killcx 127.0.0.1:8300 tcp
```

Possible solutions

  1. Kindly ask VARA's dev, EA5HVK, if he would be able to make VARA's TCP port/socket connection routine ignore a TIME_WAIT state on a port and connect anyway _(similar to setting the C socket option `SO_REUSEADDR` with `setsockopt()`) (1,2)_
  2. Create some sort of wrapper that runs instead of VARA, which includes a `SO_REUSEADDR`-type option, configures VARA's port to be different for each run, and passes traffic to/from VARA apps? _(This would take a lot of work and might be buggy. I wouldn't even know where to begin to make something like this, although I think it's theoretically possible.)_
  3. Make a script that monitors port states on Linux and warns users that VARA cannot be run during a 60s countdown window if ports 8300/8301 are found to be in the TIME_WAIT state. (KM4ACK's idea - he also has a prototype script written to do this).
  4. Make a daemon script that monitors for VARA in the background at all times: When VARA is run, log PID and wait for VARA to close. Upon VARA closing, reset the network with sudo service networking restart. (This is not a favorable option since it could cause users to lose internet connection / data unexpectedly).
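
For reference, here is what the `SO_REUSEADDR` option from solutions 1 and 2 does, as a small self-contained sketch. It is written in Python rather than VB6, purely for illustration, and says nothing about how VARA's own socket code is written. The helper names `try_bind` and `demo` are made up. The first listener also sets the option, modelling a program that sets it on every run.

```python
import socket
import time


def try_bind(port, reuse):
    """Try to listen on 127.0.0.1:port; return True if bind/listen succeed."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        if reuse:
            # Must be set before bind() to have any effect.
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        s.bind(("127.0.0.1", port))
        s.listen(1)
        return True
    except OSError:
        return False
    finally:
        s.close()


def demo():
    """Leave a TIME_WAIT entry on a port, then try to re-bind that port."""
    # 1. Listen on an OS-chosen free port, with SO_REUSEADDR set.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    # 2. Accept one client, then close the server side first. The side that
    #    closes first is the one whose socket ends up in TIME_WAIT.
    client = socket.create_connection(("127.0.0.1", port))
    conn, _ = listener.accept()
    conn.close()
    client.recv(1)       # returns b"" once the server's FIN has arrived
    client.close()
    listener.close()
    time.sleep(0.2)      # give the kernel a moment to settle the close

    # 3. Re-bind the same port without and with the option.
    without_option = try_bind(port, reuse=False)
    with_option = try_bind(port, reuse=True)
    return without_option, with_option
```

On Linux, `demo()` is expected to report a failed bind without the option while the TIME_WAIT entry exists, and a successful bind with it. Other operating systems treat the first case differently, which fits the observation that the problem does not show up on Windows.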
WheezyE commented 2 years ago

I'm going to try the Possible Solution 2 (above): VARA-bridge-Linux for TCP connections, which was also recently suggested by EA5HVK after contacting him.

I'll start trying to write a bridge app in VB6 to see if I can circumvent the TIME_WAIT condition. If that succeeds, I'll see if sending source code to EA5HVK might help implement it in VARA. If that's not an option, then I'll see if I can complete the bridge app.
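
The relay half of such a bridge can be sketched as below. This is a minimal sketch in Python rather than VB6, and it assumes VARA has been reconfigured to listen on some port other than the one controller programs connect to. The names `open_listener`, `serve`, `handle` and `pump` are made up for illustration. Changing VARA's port on each run is not shown, and whether this avoids the reconnection problem in practice is untested here.

```python
import socket
import threading


def pump(src, dst):
    """Copy bytes from src to dst until src hits EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass                      # either side went away; stop relaying
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass                  # dst was already closed


def handle(client, backend_addr):
    """Relay one client connection to the backend in both directions."""
    try:
        backend = socket.create_connection(backend_addr)
    except OSError:
        client.close()            # backend (VARA) is not running
        return
    back = threading.Thread(target=pump, args=(backend, client), daemon=True)
    back.start()
    pump(client, backend)
    back.join()
    client.close()
    backend.close()


def open_listener(port, host="127.0.0.1"):
    """Listen on host:port with SO_REUSEADDR set before bind()."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen()
    return s


def serve(listener, backend_addr):
    """Accept clients until accept() fails, one relay thread per client."""
    while True:
        try:
            client, _ = listener.accept()
        except OSError:
            return
        threading.Thread(
            target=handle, args=(client, backend_addr), daemon=True
        ).start()
```

For example, `serve(open_listener(8300), ("127.0.0.1", 8310))` would accept controller connections on 8300 and forward them to a VARA instance assumed to be listening on 8310 (a hypothetical port). A second relay would be needed for 8301.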

WheezyE commented 1 year ago

Updating this thread:

SpudGunMan commented 1 year ago

Random ideas, random words; fine to ignore, as I haven't done much data gathering to really offer any, let alone that pe1rrr level of data!

I can connect and disconnect a lot with no issues, it seems, with TCP-connected projects like CHAT (with VARA). I can't replicate this error per se, but I don't use Winlink much.

Is this only Winlink-related? I saw it possibly affects a KISS-connected phone app as well. (Ouch, I just paid for it to debug this more myself. It will be an extra-handy platform to use, since it focuses only on VARA in Wine over TCP KISS, really keeping it simple for this thread.)

Is this a function of a Winlink-specific clog? Like layers 6-7 need looking at, with Winlink and VARA in tcpdump, to find any strange collisions? I was going to try to sniff out why my dev box is not impacted (I am on 5.10 still). All this rambling is to hopefully help and ask: is this a Winlink-only issue? Or any TCP applications? VarAC issues? We need more eyes on the problem and more data to make this go away. I have not yet looked at the provided links for solutions in detail to see if I am fully foolish in saying any of this, but I did see network issues once and they did go away for me. I will get more data as time allows. Love to hear more; going to dig into pe1rrr's links as soon as possible. :) 73, how's the general license? ;)

WheezyE commented 1 year ago

Your ideas are always welcome! 😃 And thank you for being so interested and wanting to do so much testing.

I can connect and disconnect a lot with no issues, it seems, with TCP-connected projects like CHAT (with VARA). I can't replicate this error per se, but I don't use Winlink much.

Over-the-air/radio-signal VARA connections/disconnections should work fine. However, since Linux TCP ports enter a temporary "TIME_WAIT" state after a program closes one of the ports, this usually causes an issue when a program closes VARA and then re-opens it (like RMS Express), or when external programs try to re-connect to VARA's TCP/IP ports over local/wifi connections (like RadioMail for iPhone).

is this a Winlink-only issue? Or any TCP applications? VarAC issues?

It's an issue with VARA: specifically, how VARA has been programmed to deal with TCP port reconnections and the TIME_WAIT state. I believe that there is a way to work around this in VB6 (the language VARA is programmed in), but I'm not a programmer and also don't have access to VARA's source code to test anything.

To be honest, I'm more interested in the wine/box86/emulation side of things and don't really use or test VARA much otherwise. Last I knew, these issues weren't fixed, but it's possible Jose ended up patching this in. I haven't tested it in a while, but I think pe1rrr would know more since he's tested it more recently.

WheezyE commented 1 year ago

This is all as far as I know... again, pe1rrr has more first-hand experience with the problem and the ways it impacts users. (@pe1rrr, feel free to correct any info I got wrong here)

pe1rrr commented 1 year ago

This is all as far as I know... again, pe1rrr has more first-hand experience with the problem and the ways it impacts users. (@pe1rrr, feel free to correct any info I got wrong here)

👍 So far so good.

georges commented 1 year ago

Great summary of an otherwise unfortunate issue. For what it's worth, this problem also occurs with CrossOver on macOS.

georges commented 11 months ago

FYI, for anybody looking for a workaround for this: I've created varanny, a launcher for VARA. Amongst other things, it helps start/stop a VARA instance remotely and can also manage VARA.ini files to allow multiple configurations to co-exist. It also takes care of service discovery by advertising VARA via DNS-SD. RadioMail has supported this since v1.3.

https://github.com/islandmagic/varanny

WheezyE commented 4 months ago

@georges This is long overdue, but I am very grateful for your incredible work on this. I look forward to implementing it in the future.

I've been working on moving overseas (to Ireland) this year and it's been keeping me pretty busy. I'm looking forward to being settled in October to hopefully have more time to work on projects again.

Anyways, thank you again for this.