Open khaled opened 7 months ago
Can you tweak the K3s NixOS test to reproduce the problem?
https://github.com/NixOS/nixpkgs/blob/master/nixos/tests/k3s/single-node.nix
To run the test from the nixpkgs repository:
nix build .#k3s.passthru.tests.single-node
@superherointj I'm not able to run the current test as is on my Mac because of #304580. I suspect that removing some or all of the extraFlags would trigger it.
My network configuration is this:
In extraFlags, for the server, I use --disable-network-policy (I haven't had the time to look into this).
networking.firewall = {
  #enable = lib.mkForce false; # Not used, for test only.
  #package = pkgs.iptables-legacy; # Not used, for test only.
  allowedUDPPorts = [
    53 # dns
    8472 # cni (flannel vxlan)
  ];
  allowedTCPPorts = [
    10250 # metrics server
    10443 # Workaround for: Liveness probe failed: Get "https://10.42.6.5:10250/livez": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  ] ++ (lib.optionals cfg.master [
    6443 # k8s api
    2379 # etcd client requests
    2380 # etcd peer communication
  ]);
  trustedInterfaces = [ "cni+" ];
};
# https://github.com/NixOS/nixpkgs/issues/98766
boot.kernelModules = [
  "br_netfilter"
  "ip_conntrack"
  "ip_vs"
  "ip_vs_rr"
  "ip_vs_wrr"
  "ip_vs_sh"
  "overlay"
];
boot.kernel.sysctl = {
  # The bridge netfilter keys live under net.bridge.*
  "net.bridge.bridge-nf-call-ip6tables" = 1;
  "net.bridge.bridge-nf-call-iptables" = 1;
  "net.ipv4.ip_forward" = 1;
};
Also test with the firewall disabled to rule out a firewall issue.
Works for me on a Raspberry Pi 4 (aarch64-linux).
I think we should codify the routine for opening firewall ports and configuring the kernel in the NixOS module, but this needs to be debated.
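As a rough illustration of what "codifying" this could look like, here is a minimal sketch of a standalone module that gates a subset of the ports and kernel settings above behind options. The `local.k3sNode` namespace and both option names are hypothetical and are not part of the nixpkgs k3s module; whether these defaults belong upstream is exactly the open question.

```nix
# Hypothetical sketch: `local.k3sNode`, `openFirewall` and `isServer` are
# made-up names for illustration, not options of the nixpkgs k3s module.
{ config, lib, ... }:
let
  cfg = config.local.k3sNode;
in
{
  options.local.k3sNode = {
    openFirewall = lib.mkEnableOption "opening the K3s ports listed in this thread";
    isServer = lib.mkEnableOption "the additional server ports (API and etcd)";
  };

  config = lib.mkIf cfg.openFirewall {
    networking.firewall = {
      allowedUDPPorts = [
        8472 # cni (flannel vxlan)
      ];
      allowedTCPPorts = [
        10250 # metrics server
      ] ++ lib.optionals cfg.isServer [
        6443 # k8s api
        2379 # etcd client requests
        2380 # etcd peer communication
      ];
      trustedInterfaces = [ "cni+" ];
    };

    # Only a subset of the kernel settings from the config above.
    boot.kernelModules = [ "br_netfilter" "overlay" ];
    boot.kernel.sysctl = {
      "net.bridge.bridge-nf-call-iptables" = 1;
      "net.ipv4.ip_forward" = 1;
    };
  };
}
```

A host would then set `local.k3sNode.openFirewall = true;` (and `isServer = true;` on servers) instead of repeating the port lists in every machine config.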
Thanks @superherointj. I tried pasting in your config, including the extraFlags, and I still get the same errors mentioned in the bug description in my aarch64-linux VM...
> Thanks @superherointj. I tried pasting in your config, including the extraFlags, and I still get the same errors mentioned in the bug description in my aarch64-linux VM...
Since the error is:
nf_tables: Couldn't load match `mark'
Did you test with the firewall disabled?
networking.firewall.enable = lib.mkForce false; # Not used, for test only.
Did you test using iptables?
networking.firewall.package = pkgs.iptables-legacy;
Also, update nixpkgs to current master (nix flake update), because your pin is old by now (2024-02-10; current is 2024-05-12).
Why aren't the K3s tests for aarch64-linux failing in nixpkgs?
I have just run your test (k3s-aarch64 repository) on a Raspberry Pi 4 and I cannot reproduce the issue.
I don't see any special configuration on my Raspberry Pi 4 host system that would explain the difference.
Maybe there is a difference in your host system (which is Darwin-based) that is affecting things? Can you test it on a Linux machine that is not hosted on Darwin instead?
Maybe a cloud VM?
> > Thanks @superherointj. I tried pasting in your config, including the extraFlags, and I still get the same errors mentioned in the bug description in my aarch64-linux VM...
> Since the error is:
> nf_tables: Couldn't load match `mark'
> Did you test with the firewall disabled?
> networking.firewall.enable = lib.mkForce false; # Not used, for test only.
yes, no luck :(
> Did you test using iptables?
> networking.firewall.package = pkgs.iptables-legacy;
yes, no luck :(
> Also, update nixpkgs to current master (nix flake update), because your pin is old by now (2024-02-10; current is 2024-05-12).
tried this, no luck :(
@superherointj
> Why aren't the K3s tests for aarch64-linux failing in nixpkgs?
I adapted the single-node test to my flake (using nixosTest) and ran it, and it actually passes. However, I see the iptables-restore error message scroll by in the logs as the test runs. I don't see the same error in the latest Hydra run.
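Since the adapted test passes even though the error scrolls by, one option is to make the test fail on the error itself. The following is a minimal sketch, not the actual nixos/tests/k3s/single-node.nix: the test name, VM sizing, and the journal assertion are assumptions, and it does not import the airgap images that the upstream test relies on.

```nix
# Sketch only: the name, VM sizes and assertions are assumptions and are not
# copied from nixos/tests/k3s/single-node.nix.
{ pkgs }:
pkgs.nixosTest {
  name = "k3s-nf-tables-check";

  nodes.machine = { ... }: {
    virtualisation.memorySize = 2048;
    virtualisation.diskSize = 4096;
    services.k3s = {
      enable = true;
      role = "server";
      extraFlags = "--disable-network-policy";
    };
  };

  testScript = ''
    machine.wait_for_unit("k3s")
    machine.succeed("k3s kubectl cluster-info")
    # Fail the test if the error from this issue appears in the k3s journal.
    machine.fail("journalctl -u k3s --no-pager | grep -F \"Couldn't load match\"")
  '';
}
```

The journal check runs only once, right after the API responds, so an error logged later would be missed; a short delay or a retry loop before the check may be needed in practice.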
> I have just run your test (k3s-aarch64 repository) on a Raspberry Pi 4 and I cannot reproduce the issue.
> I don't see any special configuration on my Raspberry Pi 4 host system that would explain the difference.
> Maybe there is a difference in your host system (which is Darwin-based) that is affecting things? Can you test it on a Linux machine that is not hosted on Darwin instead?
> Maybe a cloud VM?
There could be a difference in the host. Or it could have something to do with the way QEMU runs on Apple silicon? I've tried this on multiple Apple silicon Macs and it fails. It works just fine on x86_64-darwin machines. I specifically need to run these VMs on Darwin, so unfortunately, even if it works on a non-Darwin machine, it's not too useful to me :(
BTW, thanks for putting effort into this!
> I specifically need to run these VMs on Darwin, so unfortunately, even if it works on a non-Darwin machine, it's not too useful to me :(
I think it is important to narrow down the problem. Once we understand what is going on, it is easier to come up with a proper fix.
> BTW, thanks for putting effort into this!
Sorry about the delay.
> unfortunately, even if it works on a non-Darwin machine, it's not too useful to me :(
Since this is a problem for you, it needs proper triage first, and then a solution can be sought through the proper channels.
On the differences in the host: my current (simple) suggestion is to run your test on generic aarch64-linux hardware (maybe from a cloud provider) to rule the host out. I ran it on a Raspberry Pi, but I don't know whether something else is interfering.
If the problem doesn't appear on generic aarch64-linux, as you said, then it would make sense to dig into QEMU, Apple hardware, or something else.
Still, I think you should not get discouraged (because you want to run it on a Darwin host). If there is a lower-level issue, it will creep into other things as well, not just K3s. If it gets fixed, you end up with what you want and also avoid other problems that we don't really understand at this moment.
Describe the bug
When trying to bring up k3s on aarch64-linux, pods get stuck in the following state:
The k3s logs reflect repeated errors related to nf_tables:
No issues running the same thing on x86_64-linux.
I suspect there's some additional kernel config required on aarch64-linux, but it isn't obvious to me at the moment what this might be.
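One way to probe that suspicion is to force-load the kernel module that normally provides the iptables `mark` match and see whether the load succeeds. This is a diagnostic sketch only, not a confirmed fix, and the choice of `xt_mark` as the relevant module is an assumption based on the error text.

```nix
# Diagnostic sketch, not a confirmed fix: assumes the `mark` match named in
# the nf_tables error is provided by the xt_mark kernel module.
{ ... }:
{
  boot.kernelModules = [ "xt_mark" ];
}
```

If the module is missing from the guest kernel, the module-loading step at boot should report a failure, which would point at the kernel configuration; if it loads and the error persists, the cause is more likely elsewhere.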
Steps To Reproduce
Run the default package in the following flake on an aarch64-darwin or aarch64-linux machine: https://github.com/khaled/k3s-aarch64.
This will bring up an aarch64-linux virtual machine with k3s.
For aarch64-darwin, note that you'll need to have a linux builder set up; see: https://nixos.org/manual/nixpkgs/unstable/#sec-darwin-builder.
Also FWIW, note that I've only actually run the aarch64-darwin and x86_64-linux packages :)
Expected behavior
K3s + all default services to come up correctly.
Notify maintainers
@euank @mic92 @yajo
Metadata
Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.
Add a :+1: reaction to issues you find important.