clemenko / rke_airgap_install

a script/method for air gapping the Rancher Stack with Hauler

Default route not set #23

Closed: ertimas closed this issue 2 months ago.

ertimas commented 2 months ago

Hi Andy,

I am running into a missing default IP route, which the k3s team works around with a dummy interface in this issue: https://github.com/k3s-io/k3s/issues/1144.
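The k3s workaround referenced above gives the node a dummy interface that carries a low-priority default route, so components that need a default route can find one on an air-gapped host. What follows is a minimal sketch, not the exact commands from that issue. It assumes root access, and the interface name dummy0 and the 203.0.113.254/31 address pair (from the TEST-NET-3 documentation range) are illustrative choices. The run helper and the DRY_RUN switch are hypothetical additions that let you preview the commands before applying them. The route is not persistent and will be gone after a reboot.

```shell
#!/usr/bin/env bash
# Sketch of a dummy-interface default route for an air-gapped node.
# Assumptions: run as root; dummy0 and 203.0.113.254/31 (TEST-NET-3
# documentation range) are illustrative, not values from the script.
# Nothing here persists across reboots.

# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# add_dummy_default_route: create dummy0, give it a /31, and add a
# default route through it with a high metric (1000) so that any real
# default route, if one exists, still takes precedence.
add_dummy_default_route() {
  run ip link add dummy0 type dummy
  run ip link set dummy0 up
  run ip addr add 203.0.113.254/31 dev dummy0
  run ip route add default via 203.0.113.255 dev dummy0 metric 1000
}
```

To preview, run DRY_RUN=1 add_dummy_default_route. Running it as root without DRY_RUN applies the changes.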

It seems like you would have run into that. Did you have this configured before running the script, or did I miss something?

Thanks for your time!

clemenko commented 2 months ago

I have not seen that before. Let's look at the basics. What OS? (cat /etc/os-release) What NICs? (ip a) Did you modify the config.yaml?

ertimas commented 2 months ago

Hi Andy, thank you for getting back to me.

OS

Oracle Linux 8.10 server

NICs


Config

I didn't modify config.yaml, though I also didn't have the file in /etc/rancher/rke2/, so I symlinked one to the existing /etc/rancher/rke2/rke2.yaml. That got me past that issue, but now I'm left with a new one...


hauler.repo is in fileserver/.

Here's the output from netstat.

Any thoughts on how to troubleshoot the fileserver not seeing hauler.repo would be much appreciated. Note: my hauler directory has global read/write permissions.

clemenko commented 2 months ago

The two issues may be related: A. routes, B. hauler.

Can you try a clean install of Oracle Linux 8 and see if any routes are there? Check /proc/net/route specifically.
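Instead of reading /proc/net/route by eye, you can scan it for an entry whose Destination and Mask fields are both 00000000, which is how the kernel lists an IPv4 default route. This is a minimal sketch. The function name has_default_route is hypothetical, and it covers IPv4 only, since IPv6 routes are listed in /proc/net/ipv6_route.

```shell
#!/usr/bin/env bash
# has_default_route [ROUTE_FILE]  (hypothetical helper)
# Prints the interface(s) that carry an IPv4 default route and returns 0,
# or returns 1 if there is none. Reads /proc/net/route unless a file is given.
# Columns: 1=Iface 2=Destination 3=Gateway ... 8=Mask (hex, little-endian).
has_default_route() {
  awk 'NR > 1 && $2 == "00000000" && $8 == "00000000" { print $1; found = 1 }
       END { exit !found }' "${1:-/proc/net/route}"
}
```

Usage: has_default_route || echo "no IPv4 default route". This can be run on the clean install and on the current VM to compare.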

ertimas commented 2 months ago

Here's /proc/net/route. Note: I've got a second interface attached so that I can get the VM set up before "airgapping" it.


clemenko commented 2 months ago

I wonder if the error was an isolated issue. Can you try installing it again?

Wait, are you disabling the NIC when installing the stack? Kubernetes needs a NIC at all times.

ertimas commented 2 months ago

Last time I left both NICs on during installation. I've tried it a couple of times with each NIC and with both, without passing an IP to hauler_all_the_things.sh, and I still get the same issue.

clemenko commented 2 months ago

There is no IP needed for the control function. Can you run the control command with bash -x ./hauler_all_the_things.sh control ?

ertimas commented 2 months ago

I found that if I didn't provide an IP, the script might pick up the second NIC, and it looks like that is what happened.

Here is the result of bash -x ./hauler_all_the_things.sh control


Here's the log where createrepo was run.

clemenko commented 2 months ago

Can you see if the hauler.repo file is there? curl -sfL http://10.237.13.41:8080/hauler.repo

ertimas commented 2 months ago

I can see the file, but it's inaccessible when the hauler server is running; see the yellow box. Note: if I run python3 -m http.server, I can access the file from another box, which leads me to believe basic networking and file permissions are fine.


clemenko commented 2 months ago

I wonder if Hauler is not able to serve anything on a node with two NICs. What does ss -tln show?
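A listening socket only shows that a process has bound the port, not that it is answering requests. It is still a useful first check. The hypothetical helper below reads ss -tln output on stdin and checks the Local Address:Port column (field 4) for a given port. Port 8080 is taken from the curl URL earlier in the thread.

```shell
#!/usr/bin/env bash
# is_listening PORT  (hypothetical helper)
# Reads `ss -tln` output on stdin; succeeds if any socket listens on PORT,
# regardless of bind address (*, 0.0.0.0, [::], or a specific IP).
is_listening() {
  awk -v port="$1" '
    NR > 1 {
      # Field 4 is "Local Address:Port"; the port is after the last colon.
      n = split($4, a, ":")
      if (a[n] == port) found = 1
    }
    END { exit !found }'
}
```

Usage: ss -tln | is_listening 8080 && echo "something is listening on 8080".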

ertimas commented 2 months ago

It shows essentially the same thing as netstat -lnt.

Manually running hauler store serve fileserver <my store directory> did work. It took about a minute for it to come up. Here are the debug logs in case they're helpful. Marking this as closed.
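The fileserver can take more than a minute to start answering, so a single curl issued too early may fail even if the service comes up later. This is a minimal polling sketch that assumes curl is installed. The name wait_for_url and its defaults (30 attempts, 5 seconds apart) are hypothetical choices and are not part of hauler_all_the_things.sh.

```shell
#!/usr/bin/env bash
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]  (hypothetical helper)
# Polls URL with curl until it succeeds; returns 0 when reachable,
# 1 after ATTEMPTS failures. Defaults (30 x 5 s) are assumptions.
wait_for_url() {
  local url=$1 attempts=${2:-30} delay=${3:-5} i=1
  while [ "$i" -le "$attempts" ]; do
    # -s silent, -f fail on HTTP errors, -L follow redirects.
    if curl -sfL -o /dev/null "$url"; then
      echo "ready after $i attempt(s): $url"
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo "gave up after $attempts attempt(s): $url" >&2
  return 1
}
```

Usage: wait_for_url http://10.237.13.41:8080/hauler.repo before running anything that depends on the repo.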


clemenko commented 2 months ago

Huh. I see you closed the issue. Is it working now?

ertimas commented 2 months ago

TL;DR: I couldn't get your script to work.

There were two issues.

  1. The fileserver service looks fine, but even after several minutes it can't be reached with curl. So I ran hauler store serve -l debug fileserver <my hauler directory>, and that worked. The command showed the fileserver taking more than a minute to come up. See the screenshot I posted...
  2. The registry fails to load. This is due to a longhorn-csi component in the hauler store .zst being corrupted during tar/untar. Removing the CSI components from airgap_hauler.yaml fixed it.
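The corruption reportedly happened during tar/untar. One way to catch it before loading the registry is to record a checksum on the connected side and verify it after the copy. This sketch assumes GNU coreutils sha256sum. The helper names make_checksum and verify_checksum are hypothetical, and the archive name used in the test (haul.tar.zst) is illustrative. For .zst files, zstd -t can also test the archive's integrity directly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking an archive across an air gap.
# make_checksum FILE   -> writes FILE.sha256 next to FILE
# verify_checksum FILE -> checks FILE against FILE.sha256 (non-zero on mismatch)
# Both cd into the file's directory so the .sha256 holds a relative name
# and can be verified wherever the pair is copied.
make_checksum() {
  (cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256")
}

verify_checksum() {
  (cd "$(dirname "$1")" && sha256sum -c --quiet "$(basename "$1").sha256")
}
```

Run make_checksum on the archive before moving it and copy both the archive and its .sha256 file. Then run verify_checksum on the air-gapped host before loading the store.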