WheezyE / Winelink

Installation scripts for running Winlink (RMS Express/Trimode & VARA) on non-Windows computers. Wine & Box86 make this project possible.
Linux can't connect to ports for 60s after VARA is closed/re-opened (issue with 'TIME_WAIT state' in Linux & VARA's port code implementation) #52

Open WheezyE opened 1 year ago

WheezyE commented 1 year ago

This problem affects Linux/Wine, but does not occur on Windows. Fixing this for Linux would make VARA much more usable for users who do not have Windows.

WheezyE commented 1 year ago

Thank you to KM4ACK & WH6AZ (of iOS RadioMail) for finding the root cause of this issue!

Just consolidating some notes & tests here on this GitHub ticket so we can work in the open.

The VARA TCP re-connection problem

VARA-Wine-TIME_WAIT

  1. Linux TCP ports enter a TIME_WAIT state after the last connection on them is terminated, apparently to prevent DoS attacks & also to prevent packet loss in some edge cases (references: 1, 2, 3).
  2. The state of each TCP port can be viewed with `netstat | grep tcp`. VARA's localhost:8300/localhost:8301 ports enter the ESTABLISHED state when an RMS Express VARA HF P2P session is first opened. Then, as soon as VARA HF is closed, the 8300/8301 ports enter the TIME_WAIT state on Linux. I timed the ports staying in TIME_WAIT for about 60 seconds, after which they disappear from netstat (they close & can be re-used again).
  3. While ports 8300/8301 are in TIME_WAIT, VARA HF will currently not attempt to connect to them. We believe that this is the cause of our issue. Said another way: if VARA is run, then closed, then run again, VARA will not re-establish a connection to RMS Express (or any other controller program over TCP) within this 60-second window.

Linux port states while VARA HF is connected, directly after closing VARA HF, and 60s after that:

```console
pi@raspberrypi4:/etc/init.d $ netstat | grep tcp && date
tcp        0      0 localhost:51220         localhost:36759         ESTABLISHED
tcp        0      0 raspberrypi4:47374      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:36759         localhost:51220         ESTABLISHED
tcp        0      0 localhost:53971         localhost:8301          ESTABLISHED
tcp        0      0 raspberrypi4:45066      ec2-3-208-217-166:https ESTABLISHED
tcp        0      0 raspberrypi4:60810      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:56953         localhost:59036         ESTABLISHED
tcp        0      0 localhost:59036         localhost:56953         ESTABLISHED
tcp        0      0 localhost:8300          localhost:46719         ESTABLISHED
tcp        0      0 localhost:8301          localhost:53971         ESTABLISHED
tcp        0      0 localhost:46719         localhost:8300          ESTABLISHED
Thu 13 Oct 14:34:41 MDT 2022
pi@raspberrypi4:/etc/init.d $ netstat | grep tcp && date
tcp        0      0 localhost:51220         localhost:36759         ESTABLISHED
tcp        0      0 raspberrypi4:47374      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:36759         localhost:51220         ESTABLISHED
tcp        0      0 raspberrypi4:45066      ec2-3-208-217-166:https ESTABLISHED
tcp        0      0 raspberrypi4:60810      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:56953         localhost:59036         ESTABLISHED
tcp        0      0 localhost:59036         localhost:56953         ESTABLISHED
tcp        0      0 localhost:8300          localhost:46719         TIME_WAIT
tcp        0      0 localhost:8301          localhost:53971         TIME_WAIT
Thu 13 Oct 14:34:47 MDT 2022
...
pi@raspberrypi4:/etc/init.d $ netstat | grep tcp && date
tcp        0      0 localhost:51220         localhost:36759         ESTABLISHED
tcp        0      0 raspberrypi4:47374      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:36759         localhost:51220         ESTABLISHED
tcp        0      0 raspberrypi4:45066      ec2-3-208-217-166:https ESTABLISHED
tcp        0      0 raspberrypi4:60810      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:56953         localhost:59036         ESTABLISHED
tcp        0      0 localhost:59036         localhost:56953         ESTABLISHED
tcp        0      0 localhost:8300          localhost:46719         TIME_WAIT
tcp        0      0 localhost:8301          localhost:53971         TIME_WAIT
Thu 13 Oct 14:35:44 MDT 2022
pi@raspberrypi4:/etc/init.d $ netstat | grep tcp && date
tcp        0      0 localhost:51220         localhost:36759         ESTABLISHED
tcp        0      0 raspberrypi4:47374      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:36759         localhost:51220         ESTABLISHED
tcp        0      0 raspberrypi4:60810      165.254.191.199:https   ESTABLISHED
tcp        0      0 localhost:56953         localhost:59036         ESTABLISHED
tcp        0      0 localhost:59036         localhost:56953         ESTABLISHED
Thu 13 Oct 14:35:48 MDT 2022
pi@raspberrypi4:/etc/init.d $
```
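The check described above (looking for ports 8300/8301 in the TIME_WAIT state) can be sketched as a small filter over netstat-style output. This is only an illustration: the function name `time_wait_ports` is made up here, and it assumes the six-column layout shown in the listing above (proto, recv-q, send-q, local address, foreign address, state).

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper, not part of Winelink or VARA):
# read netstat-style lines on stdin and print each watched port
# that appears in at least one TIME_WAIT entry.

time_wait_ports() {
  # "$@" = ports to watch; defaults to VARA's 8300 and 8301
  local ports="${*:-8300 8301}"
  awk -v ports="$ports" '
    BEGIN {
      n = split(ports, p, " ")
      for (i = 1; i <= n; i++) want[p[i]] = 1
    }
    # only TCP rows whose last column is TIME_WAIT
    $1 ~ /^tcp/ && $NF == "TIME_WAIT" {
      # columns 4 and 5 are local/foreign "host:port"; keep what follows the last ":"
      for (col = 4; col <= 5; col++) {
        port = $col
        sub(/.*:/, "", port)
        if (port in want) hits[port] = 1
      }
    }
    END {
      # print in the order the ports were requested
      for (i = 1; i <= n; i++) if (p[i] in hits) print p[i]
    }
  '
}
```

One way to use it would be `netstat -tn | time_wait_ports`, which prints nothing once the ports have left TIME_WAIT. The `-n` flag keeps the ports numeric so they can be matched directly.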

Things we've tried

  1. The Linux kernel has some network variables that we could (in theory) change to help us fix/diagnose this issue (1). Note: Some are integer vars, some are boolean vars.
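  2. Setting net.ipv4.tcp_keepalive_time, net.ipv4.tcp_fin_timeout, & sunrpc.tcp_fin_timeout to int 1 doesn't seem to change anything(?), with or without a network reset (TIME_WAIT still lasts 60s). Doing this also wrecks the internet on the Pi. (This fits the kernel documentation: tcp_fin_timeout applies to the FIN_WAIT_2 state, and the TIME_WAIT length is a compile-time kernel constant of 60s.)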
  3. Setting net.ipv4.tcp_tw_reuse to int 1 (global enable) doesn't change any behavior either.
  4. Some forums suggest forcibly cutting the connection on the TCP port with a program like Killcx. However, this does not address our TIME_WAIT problem, which is a port "busy" state that arises AFTER the port's connection is cut.

Viewing/editing/updating(?) Linux kernel variables in the terminal:

```bash
# view kernel variables
sysctl -A -r tcp # show tcp variables
sysctl -A -r tw # show some time_wait variables

# view current state of individual variables
sysctl net.ipv4.tcp_keepalive_time
sysctl net.ipv4.tcp_fin_timeout
sysctl net.ipv4.tcp_tw_reuse
#sysctl net.ipv4.tcp_tw_recycle #not available in RPiOS I think

# edit individual variables - these changes do not persist after reboot
sudo sysctl -w net.ipv4.tcp_keepalive_time=1
sudo sysctl -w net.ipv4.tcp_fin_timeout=1
sudo sysctl -w sunrpc.tcp_fin_timeout=1
sudo sysctl -w net.ipv4.tcp_tw_reuse=1

# can also view/edit state of individual variables like this
sudo nano /proc/sys/net/ipv4/tcp_fin_timeout

# reload network (needed for new variables to take effect?)
sudo service networking restart #could also run: /etc/init.d/networking restart

#reload variables (from config files?)
sudo sysctl --system
```

Killcx RPiOS installation (FYI: doesn't help):

```console
#Killcx only detects VARA connection while it's active, does not disable TIME_WAIT state on ports
cd ~/Downloads
wget https://cfhcable.dl.sourceforge.net/project/killcx/killcx/1.0.3/killcx-1.0.3.tgz
7z x killcx-1.0.3.tgz
sudo apt-get install libnet-rawip-perl libnet-pcap-perl libnetpacket-perl
sudo chmod +x killcx-1.0.3/killcx
cat /etc/hosts # to confirm that localhost is 127.0.0.1
sudo killcx-1.0.3/./killcx 127.0.0.1:8300 tcp
```
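Since changing these kernel variables can break networking on the Pi, it may help to record their current values before experimenting. The following is a minimal sketch, not something from the original tests: `snapshot_sysctls` and the `SYSCTL_ROOT` override are hypothetical names, and it relies only on each variable being readable as a file under /proc/sys (dots in the name become slashes).

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): print the current value of each named
# kernel variable in sysctl.conf style ("name = value").
# SYSCTL_ROOT is an assumed override for dry runs; it defaults to /proc/sys.

snapshot_sysctls() {
  local root="${SYSCTL_ROOT:-/proc/sys}" name path
  for name in "$@"; do
    # net.ipv4.tcp_fin_timeout -> net/ipv4/tcp_fin_timeout
    path="$root/$(printf '%s' "$name" | tr . /)"
    if [ -r "$path" ]; then
      printf '%s = %s\n' "$name" "$(cat "$path")"
    else
      # emitted as a comment so the output is still a valid sysctl file
      printf '# %s not present on this system\n' "$name"
    fi
  done
}
```

For example, `snapshot_sysctls net.ipv4.tcp_keepalive_time net.ipv4.tcp_fin_timeout net.ipv4.tcp_tw_reuse sunrpc.tcp_fin_timeout > ~/sysctl-before.conf` saves the values, and `sudo sysctl -p ~/sysctl-before.conf` should load them back afterwards (the file name is just an example).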

Possible solutions

  1. Kindly ask VARA's dev, EA5HVK, if he would be able to make VARA's TCP/ports/sockets connection routine ignore a TIME_WAIT state on a port and connect anyway _(similar to setting the "SO_REUSEADDR" socket option in C) (1,2)_
  2. Create some sort of wrapper that runs instead of VARA which sets an "SO_REUSEADDR"-type option, configures VARA's port to be different for each run, and passes traffic to/from VARA-apps? _(This would take a lot of work and might be buggy. I wouldn't even know where to begin to make something like this, although I think it's theoretically possible.)_
  3. Make a script that monitors port states on Linux and warns users that VARA cannot be run during a 60s countdown window if ports 8300/8301 are found to be in the TIME_WAIT state. (KM4ACK's idea - he also has a prototype script written to do this).
  4. Make a daemon script that monitors for VARA in the background at all times: When VARA is run, log PID and wait for VARA to close. Upon VARA closing, reset the network with `sudo service networking restart`. (This is not a favorable option since it could cause users to lose internet connection / data unexpectedly).
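As a rough illustration of option 3 (this is not KM4ACK's prototype, which isn't shown in this thread), a script could poll the port states and hold off until nothing on 8300/8301 is in TIME_WAIT. The function names and the `PORT_SNAPSHOT_CMD` override below are hypothetical; the sketch assumes `netstat -tn` is available and prints the state in the last column.

```bash
#!/usr/bin/env bash
# Sketch of "possible solution 3" (hypothetical helpers, assumptions noted above).

# Print how many TIME_WAIT entries involve any of the given ports.
# PORT_SNAPSHOT_CMD is an assumed hook for swapping in another command;
# by default the listing comes from "netstat -tn".
count_time_wait() {
  ${PORT_SNAPSHOT_CMD:-netstat -tn} 2>/dev/null |
    awk -v ports="$*" '
      BEGIN { n = split(ports, p, " "); for (i = 1; i <= n; i++) want[p[i]] = 1 }
      $NF == "TIME_WAIT" {
        # check the local (col 4) and foreign (col 5) port of this entry
        for (col = 4; col <= 5; col++) {
          port = $col
          sub(/.*:/, "", port)
          if (port in want) { c++; break }
        }
      }
      END { print c + 0 }'
}

# Poll once per second for up to max_wait seconds.
# Returns 0 as soon as the ports are free, 1 if they are still busy at the end.
wait_for_free_ports() {
  local max_wait="$1"; shift
  local waited=0
  while [ "$(count_time_wait "$@")" -gt 0 ]; do
    [ "$waited" -ge "$max_wait" ] && return 1
    printf 'Ports %s still in TIME_WAIT (waited %ss)...\n' "$*" "$waited" >&2
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

A launcher could then call `wait_for_free_ports 90 8300 8301` and only start VARA if it returns success. The 90-second cap is an arbitrary margin over the roughly 60 seconds measured above.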
WheezyE commented 1 year ago

I'm going to try Possible Solution 2 (above): a VARA-bridge-Linux for TCP connections, which was also recently suggested by EA5HVK after I contacted him.

I'll start trying to write a bridge app in VB6 to see if I can circumvent the TIME_WAIT condition. If that succeeds, I'll see if sending source code to EA5HVK might help implement it in VARA. If that's not an option, then I'll see if I can complete the bridge app.

WheezyE commented 1 year ago

Updating this thread:

SpudGunMan commented 1 year ago

Random ideas, random words; fine to ignore, as I haven't done a lot of data gathering to really give any, let alone that pe1rrr level of data!

I can connect and disconnect a lot (like, a lot) with no issues, it seems, with TCP-connect projects like CHAT (with VARA). I can't replicate this error per se, but I don't use Winlink much.

Is this only Winlink related? I saw it possibly affects a KISS-connected phone app as well (ouch, I just paid for it to debug this more myself; it will be an extra handy platform to use since it focuses on VARA in Wine over TCP KISS only, really keeping it simple for this thread).

Is this a function of a Winlink-specific clog? Like, do layers 6-7 need to be looked at, with Winlink and VARA in tcpdump, to find any strange collisions? I was going to try and sniff how my dev box is not impacted (I am on 5.10 still). All this rambling is to hopefully help and ask: is this a Winlink-only issue? Or any TCP applications? VarAC issues? We need more eyes on the problem for more data to make this go away. I have not looked at the provided links for solutions in detail yet to see if I am fully foolish in saying any of this, but, just sayin', I did see network issues once and they did go away for me. I will get more data as time allows on the matter. Love to hear more; gonna dig into pe1rrr's links as soon as possible. :) 73, how's the General license? ;)

WheezyE commented 1 year ago

Your ideas are always welcome! 😃 And thank you for being so interested and wanting to do so much testing.

> I can connect and disconnect a lot (like, a lot) with no issues, it seems, with TCP-connect projects like CHAT (with VARA). I can't replicate this error per se, but I don't use Winlink much.

Over-the-air/radio-signal VARA connections/disconnections should work fine. However, since Linux TCP ports enter a temporary "TIME_WAIT" state after a program closes one of the ports, problems usually arise when a program (like RMS Express) closes VARA and then re-opens it, or when an external program (like RadioMail for iPhone) tries to re-connect to VARA's TCP/IP ports over a local/wifi connection.

> Is this a Winlink-only issue? Or any TCP applications? VarAC issues?

It's an issue with VARA - specifically, how VARA has been programmed to deal with TCP port reconnections and TIME_WAIT stuff. I believe that there is a way to work around this in VB6 (the language VARA is programmed in), but I'm not a programmer and also don't have access to VARA's source code to test anything.

To be honest, I'm more interested in the wine/box86/emulation side of things and don't really use or test VARA much otherwise. Last I knew, these issues weren't fixed, but it's possible maybe Jose ended up patching this in. I haven't tested it in a while, but I think pe1rrr would know more since he's tested it more recently.

WheezyE commented 1 year ago

This is all as far as I know... again, pe1rrr has more first-hand experience with the problem and the ways it impacts users. (@pe1rrr, feel free to correct any info I got wrong here)

pe1rrr commented 1 year ago

> This is all as far as I know... again, pe1rrr has more first-hand experience with the problem and the ways it impacts users. (@pe1rrr, feel free to correct any info I got wrong here)

👍 So far so good.

georges commented 1 year ago

Great summary of an otherwise unfortunate issue. For what it's worth, this problem also occurs with CrossOver on macOS.

georges commented 6 months ago

FYI, for anybody looking for a workaround for this: I've created varanny, a launcher for VARA. Amongst other things, it helps start/stop a VARA instance remotely and can also manage VARA.ini files to allow multiple configurations to co-exist. It also takes care of service discovery by advertising VARA via DNS-SD. RadioMail has supported this since v1.3.

https://github.com/islandmagic/varanny