JeffersonLab / wildfly

Configurable Wildfly base Docker image and setup scripts
MIT License

WSL2 + JLab Network = phantom network timeouts #4

Closed slominskir closed 10 months ago

slominskir commented 10 months ago

Got a new Windows 11 PC with the latest everything, and the Wildfly bash setup scripts no longer work properly in WSL2. Specifically, network requests periodically time out.

Related:

This was working before on Windows 10 Enterprise (JLab) and Windows 11 Home (personal), but likely with older Ubuntu distros and possibly older versions of WSL2, or at least different install methods (the app store version apparently differs?).

Fully patched Windows 11, installed on 11/14/2023: Windows 11 Enterprise Version 22H2 build 22621.2506.

Fully patched WSL2:

PS C:\Users\ryans> wsl.exe --status
Default Distribution: Ubuntu-22.04
Default Version: 2
PS C:\Users\ryans> wsl.exe --version
WSL version: 2.0.9.0
Kernel version: 5.15.133.1-1
WSLg version: 1.0.59
MSRDC version: 1.2.4677
Direct3D version: 1.611.1-81528511
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.2506

I ended up just letting the server-setup.sh script run all afternoon and it eventually did complete. The output just contains a bunch of seemingly random timeouts followed by retries (took an hour or two when it should have run in less than a minute):

ryans@SFTRYANS:/mnt/c/users/ryans/servers/setup$ ./server-setup.sh server.env config_provided
Loading environment server.env
------------------------
config_provided
------------------------
Using env file: server.env
Loading environment server.env
------------------------
add_modules
------------------------
local|org.apache.poi|https://repo1.maven.org/maven2/org/apache/poi/poi/5.2.3/poi-5.2.3.jar,https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml/5.2.3/poi-ooxml-5.2.3.jar,https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml-lite/5.2.3/poi-ooxml-lite-5.2.3.jar,https://repo1.maven.org/maven2/org/apache/xmlbeans/xmlbeans/5.1.1/xmlbeans-5.1.1.jar,https://repo1.maven.org/maven2/org/apache/commons/commons-math3/3.6.1/commons-math3-3.6.1.jar,https://repo1.maven.org/maven2/org/apache/commons/commons-compress/1.22/commons-compress-1.22.jar,https://repo1.maven.org/maven2/com/zaxxer/SparseBitSet/1.2/SparseBitSet-1.2.jar,https://repo1.maven.org/maven2/org/apache/commons/commons-collections4/4.4/commons-collections4-4.4.jar|javaee.api,org.jboss.as.web,org.apache.commons.io,org.apache.commons.codec,org.apache.logging.log4j.api
SCOPE: local
DEP_NAME: org.apache.poi
RESOURCES_CSV: https://repo1.maven.org/maven2/org/apache/poi/poi/5.2.3/poi-5.2.3.jar,https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml/5.2.3/poi-ooxml-5.2.3.jar,https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml-lite/5.2.3/poi-ooxml-lite-5.2.3.jar,https://repo1.maven.org/maven2/org/apache/xmlbeans/xmlbeans/5.1.1/xmlbeans-5.1.1.jar,https://repo1.maven.org/maven2/org/apache/commons/commons-math3/3.6.1/commons-math3-3.6.1.jar,https://repo1.maven.org/maven2/org/apache/commons/commons-compress/1.22/commons-compress-1.22.jar,https://repo1.maven.org/maven2/com/zaxxer/SparseBitSet/1.2/SparseBitSet-1.2.jar,https://repo1.maven.org/maven2/org/apache/commons/commons-collections4/4.4/commons-collections4-4.4.jar
DEPENDENCIES_CSV: javaee.api,org.jboss.as.web,org.apache.commons.io,org.apache.commons.codec,org.apache.logging.log4j.api
add_module
> [https://repo1.maven.org/maven2/org/apache/poi/poi/5.2.3/poi-5.2.3.jar]
2023-11-20 13:22:42 URL:https://repo1.maven.org/maven2/org/apache/poi/poi/5.2.3/poi-5.2.3.jar [2964641/2964641] -> "poi-5.2.3.jar.3" [1]
done with wget
> [https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml/5.2.3/poi-ooxml-5.2.3.jar]
2023-11-20 13:22:45 URL:https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml/5.2.3/poi-ooxml-5.2.3.jar [2010497/2010497] -> "poi-ooxml-5.2.3.jar.1" [1]
done with wget
> [https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml-lite/5.2.3/poi-ooxml-lite-5.2.3.jar]
2023-11-20 13:22:56 URL:https://repo1.maven.org/maven2/org/apache/poi/poi-ooxml-lite/5.2.3/poi-ooxml-lite-5.2.3.jar [5898622/5898622] -> "poi-ooxml-lite-5.2.3.jar.1" [1]
done with wget
> [https://repo1.maven.org/maven2/org/apache/xmlbeans/xmlbeans/5.1.1/xmlbeans-5.1.1.jar]
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
2023-11-20 13:42:58 URL:https://repo1.maven.org/maven2/org/apache/xmlbeans/xmlbeans/5.1.1/xmlbeans-5.1.1.jar [2196526/2196526] -> "xmlbeans-5.1.1.jar" [1]
done with wget
> [https://repo1.maven.org/maven2/org/apache/commons/commons-math3/3.6.1/commons-math3-3.6.1.jar]
2023-11-20 13:43:03 URL:https://repo1.maven.org/maven2/org/apache/commons/commons-math3/3.6.1/commons-math3-3.6.1.jar [2213560/2213560] -> "commons-math3-3.6.1.jar" [1]
done with wget
> [https://repo1.maven.org/maven2/org/apache/commons/commons-compress/1.22/commons-compress-1.22.jar]
2023-11-20 13:43:04 URL:https://repo1.maven.org/maven2/org/apache/commons/commons-compress/1.22/commons-compress-1.22.jar [1039712/1039712] -> "commons-compress-1.22.jar" [1]
done with wget
> [https://repo1.maven.org/maven2/com/zaxxer/SparseBitSet/1.2/SparseBitSet-1.2.jar]
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
2023-11-20 14:03:01 URL:https://repo1.maven.org/maven2/com/zaxxer/SparseBitSet/1.2/SparseBitSet-1.2.jar [24510/24510] -> "SparseBitSet-1.2.jar" [1]
done with wget
> [https://repo1.maven.org/maven2/org/apache/commons/commons-collections4/4.4/commons-collections4-4.4.jar]
2023-11-20 14:03:02 URL:https://repo1.maven.org/maven2/org/apache/commons/commons-collections4/4.4/commons-collections4-4.4.jar [751914/751914] -> "commons-collections4-4.4.jar" [1]
done with wget
pendencies=javaee.api,org.jboss.as.web,org.apache.commons.io,org.apache.commons.codec,org.apache.logging.log4j.api1.2.jar,/tmp/commons-collections4-4.4.jar --de
[standalone@localhost:9990 /] global|org.tuckey.urlrewritefilter|https://repo1.maven.org/maven2/org/tuckey/urlrewritefilter/4.0.4/urlrewritefilter-4.0.4.jar|javaee.api,org.jboss.as.web
SCOPE: global
DEP_NAME: org.tuckey.urlrewritefilter
RESOURCES_CSV: https://repo1.maven.org/maven2/org/tuckey/urlrewritefilter/4.0.4/urlrewritefilter-4.0.4.jar
DEPENDENCIES_CSV: javaee.api,org.jboss.as.web
add_module
> [https://repo1.maven.org/maven2/org/tuckey/urlrewritefilter/4.0.4/urlrewritefilter-4.0.4.jar]
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
2023-11-20 14:23:04 URL:https://repo1.maven.org/maven2/org/tuckey/urlrewritefilter/4.0.4/urlrewritefilter-4.0.4.jar [177474/177474] -> "urlrewritefilter-4.0.4.jar" [1]
done with wget
[standalone@localhost:9990 /] module add --name=org.tuckey.urlrewritefilter --resource-delimiter=, --resources=/tmp/urlrewritefilter-4.0.4.jar --dependencies=javaee.api,org.jboss.as.web
[standalone@localhost:9990 /] {"outcome" => "success"}
global|org.jlab.jlog|https://repo1.maven.org/maven2/org/jlab/jlog/5.0.0/jlog-5.0.0.jar|javaee.api,org.jboss.as.web
SCOPE: global
DEP_NAME: org.jlab.jlog
RESOURCES_CSV: https://repo1.maven.org/maven2/org/jlab/jlog/5.0.0/jlog-5.0.0.jar
DEPENDENCIES_CSV: javaee.api,org.jboss.as.web
add_module
> [https://repo1.maven.org/maven2/org/jlab/jlog/5.0.0/jlog-5.0.0.jar]
2023-11-20 14:23:11 URL:https://repo1.maven.org/maven2/org/jlab/jlog/5.0.0/jlog-5.0.0.jar [51354/51354] -> "jlog-5.0.0.jar" [1]
done with wget
[standalone@localhost:9990 /] module add --name=org.jlab.jlog --resource-delimiter=, --resources=/tmp/jlog-5.0.0.jar --dependencies=javaee.api,org.jboss.as.web
[standalone@localhost:9990 /] {"outcome" => "success"}
global|org.keycloak.admin-client|https://repo1.maven.org/maven2/org/keycloak/keycloak-admin-client/20.0.5/keycloak-admin-client-20.0.5.jar,https://repo1.maven.org/maven2/org/keycloak/keycloak-core/20.0.5/keycloak-core-20.0.5.jar,https://repo1.maven.org/maven2/org/keycloak/keycloak-common/20.0.5/keycloak-common-20.0.5.jar|org.jboss.ws.api,javax.ws.rs.api,org.jboss.logging,org.jboss.resteasy.resteasy-client,org.jboss.resteasy.resteasy-jackson2-provider,org.jboss.resteasy.resteasy-jaxb-provider,org.jboss.resteasy.resteasy-multipart-provider
SCOPE: global
DEP_NAME: org.keycloak.admin-client
RESOURCES_CSV: https://repo1.maven.org/maven2/org/keycloak/keycloak-admin-client/20.0.5/keycloak-admin-client-20.0.5.jar,https://repo1.maven.org/maven2/org/keycloak/keycloak-core/20.0.5/keycloak-core-20.0.5.jar,https://repo1.maven.org/maven2/org/keycloak/keycloak-common/20.0.5/keycloak-common-20.0.5.jar
DEPENDENCIES_CSV: org.jboss.ws.api,javax.ws.rs.api,org.jboss.logging,org.jboss.resteasy.resteasy-client,org.jboss.resteasy.resteasy-jackson2-provider,org.jboss.resteasy.resteasy-jaxb-provider,org.jboss.resteasy.resteasy-multipart-provider
add_module
> [https://repo1.maven.org/maven2/org/keycloak/keycloak-admin-client/20.0.5/keycloak-admin-client-20.0.5.jar]
2023-11-20 14:23:18 URL:https://repo1.maven.org/maven2/org/keycloak/keycloak-admin-client/20.0.5/keycloak-admin-client-20.0.5.jar [64674/64674] -> "keycloak-admin-client-20.0.5.jar" [1]
done with wget
> [https://repo1.maven.org/maven2/org/keycloak/keycloak-core/20.0.5/keycloak-core-20.0.5.jar]
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
failed: Connection timed out.
2023-11-20 14:43:13 URL:https://repo1.maven.org/maven2/org/keycloak/keycloak-core/20.0.5/keycloak-core-20.0.5.jar [330299/330299] -> "keycloak-core-20.0.5.jar" [1]
done with wget
> [https://repo1.maven.org/maven2/org/keycloak/keycloak-common/20.0.5/keycloak-common-20.0.5.jar]
2023-11-20 14:43:13 URL:https://repo1.maven.org/maven2/org/keycloak/keycloak-common/20.0.5/keycloak-common-20.0.5.jar [162927/162927] -> "keycloak-common-20.0.5.jar" [1]
done with wget
resteasy.resteasy-jackson2-provider,org.jboss.resteasy.resteasy-jaxb-provider,org.jboss.resteasy.resteasy-multipart-provider.resteasy.resteasy-client,org.jboss.
[standalone@localhost:9990 /] {"outcome" => "success"}
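
The nine "Connection timed out" lines per stalled file, spread over roughly twenty minutes, suggest that each attempt waits out the operating system's default TCP connect timeout before wget retries. A minimal sketch of a wrapper that fails fast instead is shown here. It is an illustration only, not how server-setup.sh actually invokes wget; the function name `fetch`, the `WGET` override variable, and the specific timeout and retry values are all assumptions.

```shell
# Hypothetical helper: download one URL with bounded connect time and retries.
# $1 = URL, $2 = destination directory.
# WGET can be overridden (e.g. WGET=echo) to preview the command without network access.
fetch() {
  "${WGET:-wget}" --no-verbose --connect-timeout=10 --tries=3 --waitretry=2 \
    --directory-prefix="$2" "$1"
}
```

This does not address the underlying DNS problem; it only turns a twenty-minute stall into a failure within about a minute, which makes the symptom much quicker to reproduce while testing fixes.
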
slominskir commented 10 months ago

Worth pointing out that if I use PowerShell to run curl or wget to grab the files from Maven Central, the files are downloaded nearly instantly. We do have an upstream intercepting TLS proxy server at JLab that may be interacting in odd ways with the WSL networking. In order for wget to work inside WSL I had to add our internal PKI cert and enable legacy renegotiation:

sudo wget -O /usr/local/share/ca-certificates/customcert.crt http://pki.jlab.org/JLabCA.crt
sudo update-ca-certificates
# Append the option as root (tee -a runs under sudo, unlike a shell redirect)
echo "Options = UnsafeLegacyRenegotiation" | sudo tee -a /etc/ssl/openssl.cnf
slominskir commented 10 months ago

Tried a few of the suggested fixes from the related issues mentioned above. Syncing the hardware clock didn't do anything. Disabling IPv6 on the Windows adapter, and deprioritizing it inside WSL by giving IPv4 higher precedence, didn't do anything. Rebooting the machine didn't fix it. The next thing I tried was overriding the DNS server. Many suggest using Google's 8.8.8.8, but that doesn't work, presumably because our intercepting proxy blocks it. Commenting out the WSL2-configured DNS server and setting it to one of our internal DNS servers appears to fix the issue:

ryans@SFTRYANS:/etc$ cat resolv.conf
# This file was automatically generated by WSL. To stop automatic generation of this file, add the following entry to /etc/wsl.conf:
# [network]
# generateResolvConf = false
#nameserver 192.168.208.1
nameserver 129.57.90.255

Note: You must add generateResolvConf = false under [network] in /etc/wsl.conf for resolv.conf changes to survive a reboot. Make sure resolv.conf is no longer a symlink too (unlink it and create a new regular file).
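
The two steps in the note can be sketched as one shell function. This is a minimal sketch under stated assumptions: the function name `pin_dns` is hypothetical, the target directory is a parameter so the steps can be dry-run against a scratch directory before touching /etc, and it assumes wsl.conf does not already contain a [network] section.

```shell
# Hypothetical helper: pin a static DNS server for a WSL2 distro.
# $1 = config directory (normally /etc), $2 = nameserver IP.
pin_dns() {
  # Tell WSL to stop regenerating resolv.conf at startup.
  # Assumption: wsl.conf has no [network] section yet; merge by hand if it does.
  printf '[network]\ngenerateResolvConf = false\n' >> "$1/wsl.conf"
  # resolv.conf is a symlink by default; remove the link and write a regular file.
  rm -f "$1/resolv.conf"
  printf 'nameserver %s\n' "$2" > "$1/resolv.conf"
}
```

Run as root against the real directory, this would be `pin_dns /etc 129.57.90.255`, followed by `wsl.exe --shutdown` from PowerShell so the distro restarts with the new wsl.conf.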

slominskir commented 10 months ago

Note: I still have no idea WHY explicitly defining the DNS server worked. Some notes to follow up with:

slominskir commented 10 months ago

Also worth pointing out that there are a few other firewalls/anti-virus apps on the PC besides the Hyper-V one mentioned above that I have to work around as well (and they are probably stepping all over each other too):

The first two are new to the new machine so that could be a clue.

slominskir commented 10 months ago

There is a broader issue affecting Docker Desktop containers as well, and the fix mentioned above only applies to the WSL2 Ubuntu distribution, not to Docker Desktop containers. Specifically, when running docker compose up on a compose file whose containers need to communicate with each other, the containers fail to connect to each other (compose example). So I'll re-open. A few more notes:

slominskir commented 10 months ago

After a Docker Desktop crash followed by a Factory Reset, compose now appears to be working fine. Not very satisfying. So I tried to uninstall everything, reboot, re-install everything, and reboot again. This means uninstalling Docker Desktop, Ubuntu, and WSL2. Turns out there are two instances of WSL2, and on re-install both are put back. Weird. Screenshot:

ScreenshotA

I re-installed using the directions here: https://learn.microsoft.com/en-us/windows/wsl/install, which means simply running

wsl --install

It's confusing that the Microsoft Store can be used for this as well, possibly with a different outcome. Strangely, the store also lists two different Ubuntu apps, and even the one with the implicit version actually installs the same version as the one with the explicit version (22.04.2). Weird. Screenshot:

ScreenshotE

Just to be safe I also unchecked the WSL "Windows Feature" during uninstall and confirmed it's re-checked (enabled) after re-installing with the wsl --install command. The connection between the app and the feature is unclear too. Weird. Screenshot:

ScreenshotC

I can confirm that the previous behavior inside WSL2 Ubuntu returns, in that wget is unreliable again. Screenshot:

ScreenshotB ScreenshotF

Docker Desktop re-installed without a hitch and has worked without a problem so far. I haven't found how to get it back into the odd state it was in before. I guess I'll just keep using it until it breaks again. It might break once I attempt to fix the phantom network error in WSL.

slominskir commented 10 months ago

Ran a Wireshark packet capture with wget when it works vs. when it results in a timeout. Looks like when it works, DNS selects an IPv6 address:

good

When it doesn't work, IPv4 addresses are returned and, oddly, a .com root DNS server is selected (I'm not sure how to interpret this):

bad
slominskir commented 10 months ago

I think my initial interpretation of the packet capture data was misleading.

After doing some reading, it appears that in both the working and timeout scenarios a list of answers is returned for both IPv4 and IPv6, and the lists appear to be identical; the difference is the order in which they are returned. In the working case the IPv4 answers (A records) are returned first, whereas in the timeout case the IPv6 answers (AAAA records) are returned first. This ordering could be a coincidence and perhaps is not important. What really matters is that the IP chosen differs between the working and timeout cases. It isn't clear how the choice is made.
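
One way to see which address an application is likely to try first is to ask the system resolver instead of reading raw DNS packets. On glibc systems such as Ubuntu, `getent ahosts` prints addresses in the order `getaddrinfo` returns them, after sorting according to /etc/gai.conf. The helper below is a small sketch for condensing that output; the name `resolver_order` is hypothetical, and whether wget in this setup follows exactly this order is an assumption.

```shell
# Hypothetical helper: reduce `getent ahosts <host>` output (read from stdin)
# to one address per line, in the order the resolver returned them.
# getent prints STREAM, DGRAM, and RAW rows per address; keep only STREAM.
resolver_order() {
  awk '$2 == "STREAM" { print $1 }'
}
```

Typical use would be `getent ahosts repo1.maven.org | resolver_order`, run once when wget works and once when it times out. If the first address differs between the two runs, forcing a family with `wget --inet4-only` (or `--inet6-only`) is a quick way to check whether the address family itself matters.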

slominskir commented 10 months ago

It's also interesting that if I repeat the Wireshark test on my personal Windows 11 Home PC, the DNS answers don't include root DNS servers in the results:

Screenshot

The answers are only actual GitHub domain results as you'd expect. The fact that the onsite PC results include seemingly spurious root DNS answers appears to be the issue. The onsite test also shows inserting/mixing A records (the root DNS ones) in an AAAA response, which appears odd.

slominskir commented 10 months ago

I guess I'm seeing this issue: https://github.com/microsoft/WSL/issues/5806

The DNS lookup response erroneously mixes AUTHORITY records into the ANSWER section:

ryans@SFTRYANS:/mnt/c/Users/ryans$ dig github.com

; <<>> DiG 9.18.18-0ubuntu0.22.04.1-Ubuntu <<>> github.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 11848
;; flags: qr rd ad; QUERY: 1, ANSWER: 15, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;github.com.                    IN      A

;; ANSWER SECTION:
github.com.             0       IN      A       140.82.114.3
b.gtld-servers.net.     0       IN      A       192.33.14.30
e.gtld-servers.net.     0       IN      A       192.12.94.30
i.gtld-servers.net.     0       IN      A       192.43.172.30
k.gtld-servers.net.     0       IN      A       192.52.178.30
f.gtld-servers.net.     0       IN      A       192.35.51.30
h.gtld-servers.net.     0       IN      A       192.54.112.30
c.gtld-servers.net.     0       IN      A       192.26.92.30
a.gtld-servers.net.     0       IN      A       192.5.6.30
j.gtld-servers.net.     0       IN      A       192.48.79.30
g.gtld-servers.net.     0       IN      A       192.42.93.30
l.gtld-servers.net.     0       IN      A       192.41.162.30
d.gtld-servers.net.     0       IN      A       192.31.80.30
m.gtld-servers.net.     0       IN      A       192.55.83.30
b.gtld-servers.net.     0       IN      AAAA    2001:503:231d::2:30
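
The pollution shown above can be detected mechanically by listing ANSWER-section records whose owner name is not the name that was queried. The function below is a rough sketch; the name `stray_answers` is hypothetical. It also assumes the queried name has no CNAME chain, since a legitimate chain introduces other owner names and would be flagged too.

```shell
# Hypothetical helper: print ANSWER-section records that do not belong to the
# queried name. $1 = queried name with trailing dot; stdin = full `dig` output.
stray_answers() {
  awk -v q="$1" '
    /^;; ANSWER SECTION:/ { in_answer = 1; next }   # start of the answer block
    /^;;/ || /^$/         { in_answer = 0 }         # any other header or blank line ends it
    in_answer && tolower($1) != tolower(q) { print $1, $4, $5 }
  '
}
```

For example, `dig github.com | stray_answers github.com.` should print nothing on a healthy resolver. Repeating the query against a specific server, as in `dig @129.57.90.255 github.com | stray_answers github.com.`, helps confirm whether the stray records come from the WSL-provided resolver or from upstream.
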
slominskir commented 10 months ago

This of course only creates more questions:

If/when Windows 12 comes out someone else can migrate over first!

slominskir commented 10 months ago

Moving forward with the /etc/resolv.conf and /etc/wsl.conf config changes as done before. Sounds like in the near future the experimental dnsTunneling feature will be the correct fix. Re-closing. I'll create a new issue if I'm able to pinpoint odd behavior with Docker Desktop; it seems that may be something unrelated, and it appears to be gone at the moment.
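
For reference, dnsTunneling is configured in the Windows-side .wslconfig file (%UserProfile%\.wslconfig), not in /etc/wsl.conf. A minimal sketch of adding it from a shell follows. The helper name `enable_dns_tunneling` is hypothetical, and the section name is a parameter because it is an assumption to verify against the installed WSL version: the setting was introduced under [experimental], while current Microsoft documentation lists it under [wsl2].

```shell
# Hypothetical helper: append a dnsTunneling setting to a .wslconfig file.
# $1 = path to .wslconfig, $2 = section name (assumption: "experimental" or "wsl2").
# Assumes the file does not already contain that section; merge by hand if it does.
enable_dns_tunneling() {
  printf '[%s]\ndnsTunneling=true\n' "$2" >> "$1"
}
```

After calling it with the path to your own .wslconfig, run `wsl.exe --shutdown` from PowerShell so the setting is picked up, and remove the resolv.conf override if you want to test tunneling on its own.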

karolswdev commented 4 months ago

@slominskir I read through your updates and appreciate the level of detail that you included here. Have you had a chance to come back to this issue and explore the dnsTunneling solution you were referring to?

slominskir commented 4 months ago

@karolswdev - Nope, I'm still relying on explicit configuration / override to one of our corporate domain DNS servers. It does look like dnsTunneling is about to be the default mode of operation though as there is a pre-release stating as much, so presumably soon new users will never see the dark corner of WSL discussed here.