burmilla / os

Tiny Linux distro that runs the entire OS as Docker containers
https://burmillaos.org
Apache License 2.0

2.0.0-beta update #138

Closed: tredger closed this pull request 2 years ago

tredger commented 2 years ago

Cherry-picked relevant commits from build-1.9.5-rc2 to bring master in line with the v4.14.x branch changes. Also updated:

This pre-emptively fixes #132 in the 2.0.0-beta release.

This PR also adds wireguard to the kernel.

olljanat commented 2 years ago

Note that the os-base build failed, so changes are needed there first: https://github.com/burmilla/os-base/pull/1#issuecomment-1192144759

Additionally, maybe it would make sense to use the new beta version of Docker here? https://github.com/moby/moby/releases/tag/v22.06.0-beta.0

And Docker Compose can jump to https://github.com/docker/compose/releases/tag/v2.7.0 (it is locked to the latest 1.x version in the 1.9.x versions of Burmilla). Those changes would go in the os-services repo.
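One item worth flagging in release notes with that jump: Compose v2 ships as a docker CLI plugin rather than the v1 standalone Python binary, so the invocation changes (some packagings also install a `docker-compose` compatibility shim). A sketch of the difference:

```shell
# Compose v1 (standalone Python binary, as shipped with Burmilla 1.9.x):
docker-compose up -d

# Compose v2 (Go rewrite, invoked as a docker CLI plugin):
docker compose up -d
```

Note also that v2 by default separates generated container names with `-` where v1 used `_`, which can surprise scripts that match on names.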

olljanat commented 2 years ago

Additionally, one thing will need some testing and a mention in the release notes: in the master branch we are not locking the Debian version used in the console (https://github.com/burmilla/os/blob/0dc7a00e0e8ccae1b8492214e0d97a7e229fd8e2/images/02-console/Dockerfile#L1), and Debian 11 was released and marked as stable after the beta.4 release, so it will be used here: https://www.debian.org/releases/bullseye/

Some discussion about the topic can be found in https://github.com/burmilla/os/pull/111
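If moving to Bullseye should be a deliberate decision rather than a side effect of a floating tag, the console base image could be pinned. A hedged sketch (the actual image name and tag used in 02-console/Dockerfile may differ):

```dockerfile
# Pin the console base to a specific Debian release so rebuilds are
# reproducible; an unpinned "debian" tag now resolves to Bullseye (Debian 11).
FROM debian:bullseye-slim
```

Pinning makes the Debian 10 to 11 jump an explicit, documentable change in the release notes rather than something that happens whenever the image is rebuilt.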

tredger commented 2 years ago

Note that the os-base build failed, so changes are needed there first burmilla/os-base#1 (comment)

I've just made a new PR with these changes.

Additionally, maybe it would make sense to use the new beta version of Docker here? https://github.com/moby/moby/releases/tag/v22.06.0-beta.0

I have no problem with doing this; I note 22.06 is pretty fresh though. Should we consider waiting until we have at least a beta.1?

And Docker Compose can jump to https://github.com/docker/compose/releases/tag/v2.7.0 (it is locked to the latest 1.x version in the 1.9.x versions of Burmilla). Those changes would go in the os-services repo.

I'll make this change as well. Anything else that should be updated in this repo?

tredger commented 2 years ago

docker-compose is now ready for 2.7.0 in https://github.com/burmilla/os-services/pull/18

Will wait for feedback on the docker version.

olljanat commented 2 years ago

@tredger please also note those inline comments.

This PR also adds wireguard to the kernel.

Btw, do you know of some good documentation about how to set up and use WireGuard in general? I would like to do some testing with it and make sure that it really works now.

tredger commented 2 years ago

@tredger please also note those inline comments.

Thanks, noted. I'll go through and make those changes too. This might take a little while as I figure out how to test each of them, so bear with me.

Btw, do you know of some good documentation about how to set up and use WireGuard in general? I would like to do some testing with it and make sure that it really works now.

I'm testing with Netmaker at the moment, which uses WireGuard under the hood. But otherwise linuxserver have a WireGuard container that I've used before too, and it has several example setups in the readme depending on what you want to do.
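For a minimal standalone test without a container on top, a standard wg-quick configuration for the site-to-site case mentioned later in this thread can be sketched like this (keys, addresses, and the hostname are placeholders, not taken from any real deployment):

```ini
# /etc/wireguard/wg0.conf on site A (all values are placeholders)
[Interface]
PrivateKey = <site-A-private-key>
Address = 10.10.0.1/24
ListenPort = 51820

[Peer]
# Site B
PublicKey = <site-B-public-key>
Endpoint = siteb.example.com:51820
# Route site B's tunnel address and LAN through the tunnel
AllowedIPs = 10.10.0.2/32, 192.168.2.0/24
PersistentKeepalive = 25
```

Bring the tunnel up with `wg-quick up wg0` and check handshakes with `wg show`; if the kernel module is missing, `wg-quick` fails immediately, which makes it a quick smoke test for the new kernel option.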

tredger commented 2 years ago

How are you testing ARM64/aarch64 (ideally virtualised, so I can do it too) to make sure the kernel is fully functioning?

olljanat commented 2 years ago

Only the Raspberry Pi 4 is supported, and it uses its own special kernel from os-rpi-kernel. I used to have a physical device for testing (I gave it away), but I think that at least QEMU should be able to emulate it.
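A hedged sketch of a generic aarch64 QEMU invocation (the firmware path and image name are placeholders that vary by distro; since the Pi 4 uses its own kernel and firmware stack, the generic `virt` machine may not exercise the rpi-specific pieces):

```shell
# Full aarch64 emulation with UEFI firmware; slow but needs no ARM hardware.
qemu-system-aarch64 \
  -M virt -cpu cortex-a72 -smp 2 -m 2048 \
  -bios /usr/share/qemu-efi-aarch64/QEMU_EFI.fd \
  -drive file=burmillaos-arm64.img,format=raw,if=virtio \
  -nographic
```

On an x86 host this runs under TCG emulation, so boot takes minutes rather than seconds; it is still enough to verify that the kernel and init come up.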

Anyway, this PR is starting to look good to me, but I will still do some testing before merging it.

olljanat commented 2 years ago

Oh, I just noticed that build-1.9.5-rc2 was my old test branch, which contained this commit that has not been part of any release version: https://github.com/burmilla/os/commit/5944ee1fc56371a804544087f6874256913efef7

However, if I remember right, it did not cause any new issues, but it was not released as it also didn't fix the issue; https://github.com/burmilla/os/commit/2ab13953117747a373c4881955a0ac78e56a97bf is needed for that.

tredger commented 2 years ago

However, if I remember right, it did not cause any new issues, but it was not released as it also didn't fix the issue; 2ab1395 is needed for that.

Ah, my bad. Do you want me to make another PR cherry-picking that commit?

olljanat commented 2 years ago

That can be handled later. Are you able to see https://github.com/burmilla/os/releases/tag/untagged-da728086ec59242c0574 ? It would be good to do some testing with it. In particular, can you get WireGuard working, or are other changes needed to support it?

tredger commented 2 years ago

No, I can't see it. And yes, keen to test.

I think I've got it working already for my use case (site to site over the Internet). But happy to look into other use cases as well.

olljanat commented 2 years ago

Based on https://docs.github.com/en/organizations/managing-access-to-your-organizations-repositories/repository-roles-for-an-organization, Write access is the minimum needed, so I switched you to that one. Try again?

tredger commented 2 years ago

Yep that's worked, thanks. I'll see about getting it up and running this week.

olljanat commented 2 years ago

@tredger how do your tests look? I have not seen any urgent issues there, so I think it would be time to release it?

tredger commented 2 years ago

I had a phantom issue where, after upgrading to beta5, container-cron used a lot of CPU and repeatedly logged "Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?". But disabling and re-enabling the service with ros service seems to have solved it, and I've not been able to replicate it again, even after reboots.
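The disable/re-enable cycle described above, sketched with the ros CLI (subcommand names as in the RancherOS-style docs that BurmillaOS inherits; worth verifying against the installed version):

```shell
# Show services and whether they are enabled
ros service list

# Bounce the misbehaving service
ros service disable container-cron
ros service enable container-cron

# Follow its output via system-docker to confirm it can reach the daemon again
system-docker logs -f container-cron
```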

I haven't got any iSCSI mounts, so I can't see whether the issues you were seeing in the #111 discussion are resolved.

system-dockerd segfaults occasionally as well, but I'm not sure whether that behaviour already existed:

system-dockerd[1072]: segfault at 8 ip 0000000000541d26 sp 000000c42004b308 error 4 in system-dockerd[400000+1486000]

But otherwise everything looks to be working fine.

olljanat commented 2 years ago

@tredger good findings. Which setup are you running those tests on? And do you install it with cloud-init?

I noticed that Docker 22.06 looks to be slower to start on first boot, so we probably need to delay the container-cron start.

tredger commented 2 years ago

@olljanat, I'm testing on two hosts. One is a bare-metal host that I upgraded from 1.9.3. The other is a new VM running inside KVM that I rebuild from the ISO every time changes are made. Both are running v2.0.0-beta5 and both get the segfault. I only observed the container-cron issue on the bare-metal host. Both are installed with cloud-init. The cloud-init is very standard. How are you running your tests/what are you benchmarking?

olljanat commented 2 years ago

Can you share the cloud-init content from both machines? (Remember to remove sensitive information first.)

Also, where do you see those segfaults?

I only observed the container-cron issue on the bare-metal host

container-cron is not enabled by default; that's why I assumed you were using some kind of cloud-init there.

How are you running your tests/what are you benchmarking?

So far I have just quickly tested installing it as a Hyper-V VM on my laptop. At work I have some tooling which can set up a multi-node Swarm with iSCSI etc. included, which I can use later, but let's figure out why you see those issues first.

tredger commented 2 years ago

The segfault is in dmesg. On bare-metal:

[520141.478863] system-dockerd[1072]: segfault at 8 ip 0000000000541d26 sp 000000c42004b308 error 4 in system-dockerd[400000+1486000]
[520141.478872] Code: cc cc cc cc 64 48 8b 0c 25 c8 ff ff ff 48 3b 61 10 0f 86 9e 01 00 00 48 83 ec 38 48 89 6c 24 30 48 8d 6c 24 30 48 8b 44 24 40 <48> 8b 48 08 48 8b 50 10 48 8b 18 48 89 1c 24 48 89 4c 24 08 48 89

On VM:

[   15.329411] system-dockerd[734]: segfault at 8 ip 0000000000541d26 sp 000000c421431308 error 4 in system-dockerd[400000+1486000]
[   15.329417] Code: cc cc cc cc 64 48 8b 0c 25 c8 ff ff ff 48 3b 61 10 0f 86 9e 01 00 00 48 83 ec 38 48 89 6c 24 30 48 8d 6c 24 30 48 8b 44 24 40 <48> 8b 48 08 48 8b 50 10 48 8b 18 48 89 1c 24 48 89 4c 24 08 48 89
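Both traces show the same instruction pointer (0x541d26 into a binary mapped at 0x400000), so it is the same code path on both hosts, and "segfault at 8" typically means a read through a nil/NULL pointer at struct offset 8, a common crash pattern for Go binaries like dockerd. A small shell sketch (not from the repo) for pulling the key fields out of such a dmesg line:

```shell
line='[520141.478863] system-dockerd[1072]: segfault at 8 ip 0000000000541d26 sp 000000c42004b308 error 4 in system-dockerd[400000+1486000]'

# Scan the whitespace-separated fields and grab the values that follow
# the "at" (faulting address) and "ip" (instruction pointer) markers.
parsed=$(echo "$line" | awk '{
  for (i = 1; i <= NF; i++) {
    if ($i == "at") addr = $(i+1)
    if ($i == "ip") ip = $(i+1)
  }
  print "fault addr:", addr, "ip:", ip
}')
echo "$parsed"
# → fault addr: 8 ip: 0000000000541d26
```

Subtracting the mapping base from the ip (0x541d26 - 0x400000 = 0x141d26) gives the file offset to feed a disassembler if the matching binary is available.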

And the boring cloud-init:

rancher:
  environment:
    EXTRA_CMDLINE: /init
  network:
    interfaces:
      eth*:
        dhcp: false
      eth0:
        dhcp: true
  services_include:
    kernel-extras: true
    kernel-headers: true
    container-cron: true
  state:
    dev: LABEL=RANCHER_STATE
    wait: true
ssh_authorized_keys:
- <omitted>

olljanat commented 2 years ago

And the boring cloud-init:

It is not boring, as it explains the behavior. Wildcard interface names like these are definitely not supported by the RancherOS/BurmillaOS implementation of cloud-init.

rancher:
  network:
    interfaces:
      eth*:
        dhcp: false

Just remove it from /var/lib/rancher/conf/cloud-config.yml (or re-install without it) and the segfault issue will disappear.

EDIT: I just noticed that there is some mention of that in the documentation, so there must be some logic somewhere. However, that example looks to be talking about kernel parameters, not about cloud-init, so maybe that makes the difference.
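For reference, the same network section with the wildcard entry dropped; eth0 keeps DHCP (assuming a single-NIC setup like the one quoted above):

```yaml
rancher:
  network:
    interfaces:
      eth0:
        dhcp: true
```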

tredger commented 2 years ago

@olljanat I've removed the wildcard interface network config, but I still get the segfaults. How would I go about debugging this further, considering it's happening in system-dockerd? I see nothing anomalous in /var/log/system-dockerd.log. Usually I would just install auditd, but if I do that in the os-console it's not going to start in time for the segfault.

Do I need to make a custom service that runs auditd? What would you suggest?

olljanat commented 2 years ago

Please boot with debug logging enabled (a separate option in the boot menu) and open an issue about it with all the details. Also narrow down the variables: try on a VM, with the default cloud-init, etc., until you find what triggers it, as I was not able to reproduce it after taking that wildcard out of the config.