burmilla / os

Tiny Linux distro that runs the entire OS as Docker containers
https://burmillaos.org
Apache License 2.0

CRITICAL: x509: certificate has expired or is not yet valid (all versions affected) #176

Open ArgonV opened 5 months ago

ArgonV commented 5 months ago

NOTE!!! Fix available in https://github.com/burmilla/os/releases/tag/v2.0.1


BurmillaOS Version: (ros os version) v2.0 release

Where are you running BurmillaOS? (docker-machine, AWS, GCE, baremetal, etc.) docker-machine on vSphere

Which processor architecture you are using? x86

Do you use some extra hardware? (GPU, etc)? No

Which console you use (default, ubuntu, centos, etc..)

Do you use some service(s) which are not enabled by default? No

Have you installed some extra tools to console? vmware-tools

Do you use some other customizations?

Please share copy of your cloud-init (remember remove all sensitive data first)

#cloud-config
runcmd:
- ["sudo", "mkfs.ext4", "/dev/sda"]
- ["sudo", "ros", "install", "-d", "/dev/sda", "--no-reboot", "-c", "/var/lib/rancher/conf/cloud-config.yml"]
- ["sudo", "reboot"]
rancher:
  docker:
    engine: docker-26.0.1
  sysctl:
    vm.max_map_count: 2621444
  state:
    autoformat:
    - /dev/sda
    - /dev/vda
    dev: LABEL=RANCHER_STATE
    wait: true

When I first boot up, and it pulls vmware tools ISO: I'm getting this message:

ros-sysinit:error: Failed Starting open-vm-tools Status : error pulling image configuration: Get https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/29/ x509: certificate has expired or is not yet valid, Code: 1

When I go to the site and look at the cert in my web browser, it was renewed Wed, 10 Apr 2024 23:38:51 GMT
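
A browser and system-docker can reach different verdicts on the same certificate because each validates it against its own CA bundle. The sketch below is one way to test a host against a specific bundle file from the console. It assumes `openssl` is installed there; the function names `extract_verify_code` and `check_chain` are illustrative and not part of BurmillaOS.

```shell
#!/bin/sh
# Sketch: validate a host's TLS chain against one specific CA bundle.
# Assumption: the openssl CLI is available; function names are hypothetical.

# Read openssl s_client output on stdin and print the number from the
# first "Verify return code: N (text)" line, or nothing if there is none.
extract_verify_code() {
    sed -n 's/.*Verify return code: \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# check_chain HOST BUNDLE
# Prints the verify return code; 0 means the chain validated.
check_chain() {
    host=$1
    bundle=$2
    openssl s_client -connect "${host}:443" -servername "$host" \
        -CAfile "$bundle" </dev/null 2>/dev/null | extract_verify_code
}

# Example call (not executed here):
#   check_chain production.cloudflare.docker.com /path/to/ca-bundle.crt
```

If the code is nonzero with one bundle and 0 with a current one, that would suggest the trust store on the machine is the problem, not the server certificate.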

olljanat commented 5 months ago

Maybe the clock is not in sync in vSphere and the firewall is not allowing the default NTP servers used by BurmillaOS?

v2.0 should read NTP servers from DHCP if those are offered from there (#158), or you can use the cloud-init method: https://burmillaos.org/docs/configuration/advanced/write-files/

ArgonV commented 5 months ago

Howdy,

I added the write_files command and verified it wrote to /etc/ntp.conf with our on-prem time servers - I am still getting the issue. When I do a date command from a login, I am getting the correct time.

ArgonV commented 5 months ago

New cloud-init cloud-config:


#cloud-config
runcmd:
- ["sudo", "mkfs.ext4", "/dev/sda"]
- ["sudo", "ros", "install", "-d", "/dev/sda", "--no-reboot", "-c", "/var/lib/rancher/conf/cloud-config.yml"]
- ["sudo", "reboot"]
write_files:
  - container: ntp
    path: /etc/ntp.conf
    permissions: "0644"
    owner: root
    content: |
      server ntp1.tamu.edu iburst
      server ntp1.tamu.edu iburst
      server ntp1.tamu.edu iburst
      # Allow only time queries, at a limited rate, sending KoD when in excess.
      # Allow all local queries (IPv4, IPv6)
      restrict default nomodify nopeer noquery limited kod
      restrict 127.0.0.1
      restrict [::1]
rancher:
  sysctl:
    vm.max_map_count: 2621444
  state:
    autoformat:
    - /dev/sda
    - /dev/vda
    dev: LABEL=RANCHER_STATE
    wait: true

ArgonV commented 5 months ago

This issue actually started yesterday, right after the certificate belonging to https://production.cloudflare.docker.com/ was renewed. I tried the same cloud config on BurmillaOS 1.9.6, and while system-docker continued to get the same errors, the user-space Docker instance had no such issues and was able to pull images from Docker Hub. We have no issues pulling Docker images on anything that is not BurmillaOS or RancherOS, so I doubt it's an issue with Docker Hub or our overarching network/vSphere infrastructure.

For some reason it's only an issue with system-docker, but despite combing through Google, GitHub, the Docker Forums, etc., I haven't been able to find any solution that isn't along the lines of "stop using Docker 17 and update it".

Are you able to reproduce the issue, by chance, or is it truly just me?

olljanat commented 5 months ago

Oh, I see. So at some point Docker Hub started using Cloudflare services with Let's Encrypt certificates, and now some setting is different on the latest one, so system-docker does not support those anymore.

Definitely all RancherOS and BurmillaOS installations are affected. The only possible workaround is most probably using a registry mirror: https://burmillaos.org/docs/configuration/docker/#using-a-pull-through-registry-mirror

Need to investigate...

ArgonV commented 5 months ago

Thank you @olljanat! We're trying to explore workaround options. Currently we do not have a registry mirror set up.

olljanat commented 5 months ago

OK, so Rancher actually stored the root CA certificates list in Git, and it is a very old list: https://github.com/burmilla/os-initrd-base/commits/master/assets/ca-certificates.crt

I will build a new hotfix release which comes with an updated file, so it will solve this issue for new installations. I also need to figure out how to fix all existing ones, because upgrade does not work anymore for the same reason.

olljanat commented 5 months ago

OK. The CA certificates bundle is actually mounted into the console, so you can update it simply by running this command:

sudo wget -O /etc/ssl/certs/ca-certificates.crt.rancher https://raw.githubusercontent.com/burmilla/os-initrd-base/master/assets/ca-certificates.crt

Then only a reboot is needed, and system-docker can pull images again.
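
For anyone who wants to script the manual fix, here is a minimal sketch that adds a backup and a sanity check around the same download. The URL and destination path come from the command above. The function names, the `.bak` suffix, and the availability of `mktemp` on the console are assumptions.

```shell
#!/bin/sh
# Sketch: refresh the CA bundle with a backup and a basic sanity check.
# URL and path are from the thread; function names and ".bak" are hypothetical.

BUNDLE_URL="https://raw.githubusercontent.com/burmilla/os-initrd-base/master/assets/ca-certificates.crt"
BUNDLE_PATH="/etc/ssl/certs/ca-certificates.crt.rancher"

# Print the number of PEM certificates in a file (0 if missing or empty).
count_pem_certs() {
    [ -f "$1" ] || { echo 0; return 0; }
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1" || true
}

# Download to a temp file, refuse an empty or non-PEM result, keep a
# backup of the old bundle, then overwrite the bundle in place.
update_bundle() {
    tmp=$(mktemp) || return 1
    if ! wget -q -O "$tmp" "$BUNDLE_URL"; then
        rm -f "$tmp"
        return 1
    fi
    if [ "$(count_pem_certs "$tmp")" -eq 0 ]; then
        echo "downloaded file contains no certificates" >&2
        rm -f "$tmp"
        return 1
    fi
    [ -f "$BUNDLE_PATH" ] && cp -p "$BUNDLE_PATH" "${BUNDLE_PATH}.bak"
    # Write into the existing file, as wget -O does, instead of renaming over it.
    cat "$tmp" > "$BUNDLE_PATH" && rm -f "$tmp"
}

# Example (run as root, not executed here):
#   update_bundle && reboot
```

The script only defines functions, so nothing is downloaded or changed until `update_bundle` is called explicitly.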

ArgonV commented 5 months ago

Thanks much, can I run that command on startup?

olljanat commented 5 months ago

Just use the new ISO from v2.0.1 and you are good to go.

However, let's keep this issue open for a while so others struggling with this issue will see it too.

TrentTAMU commented 5 months ago

Thank you so much for this, this issue is nowhere to be found on the internet except for here!

Working for me now on v2.0.1