abiosoft / colima

Container runtimes on macOS (and Linux) with minimal setup
MIT License

Network and kubernetes instability in extended use sessions #952

Open aran opened 9 months ago

aran commented 9 months ago

Description

If I leave my colima/kubernetes running for some time, it destabilizes unpredictably. The two signs of that happening are:

  1. I start seeing network connectivity errors from various services
  2. If I clean up all deployments and services, all pods get stuck in Terminating.
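
The second symptom can be checked mechanically rather than by eye. The sketch below is only an illustration, not something from colima or k3s: the function name `stuck_terminating` and the 60-second slack are assumptions. It takes the parsed JSON of `kubectl get pods -A -o json` and reports pods whose `metadata.deletionTimestamp` has already passed, which is what a pod stuck in Terminating looks like in the API.

```python
from datetime import datetime, timezone


def stuck_terminating(pod_list, now, slack_seconds=60):
    """Return (namespace, name, seconds_overdue) for pods past their deletion deadline.

    pod_list: parsed JSON from `kubectl get pods -A -o json`.
    now: timezone-aware datetime to compare against.
    slack_seconds: hypothetical tolerance before a pod counts as stuck.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        ts = meta.get("deletionTimestamp")
        if ts is None:
            continue  # no deletion requested for this pod
        # Kubernetes serializes timestamps as RFC 3339 in UTC, e.g. 2024-01-15T10:00:00Z
        deadline = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
        overdue = (now - deadline).total_seconds()
        if overdue > slack_seconds:
            stuck.append((meta.get("namespace"), meta.get("name"), int(overdue)))
    return stuck
```

To use it, load the `kubectl` output with `json.load` and pass `datetime.now(timezone.utc)` as `now`. Running it periodically would give a rough timestamp for when termination stops completing.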

Version

colima version 0.6.7
git commit: ba1be00e9aec47f2c1ffdacfb7e428e465f0b58a

runtime: docker
arch: x86_64
client: v24.0.7
server: v24.0.7

limactl version 0.19.1

qemu-img version 8.2.0
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers

Operating System

Output of colima status

INFO[0000] colima is running using QEMU
INFO[0000] arch: x86_64
INFO[0000] runtime: docker
INFO[0000] mountType: sshfs
INFO[0001] socket: unix:///Users/aran/.colima/default/docker.sock
INFO[0001] kubernetes: enabled

Reproduction Steps

I run a variety of services on Kubernetes and leave the cluster running. I deploy to the local Kubernetes with Skaffold, which uses docker to load container images directly. Once errors start cropping up across services, I kill Skaffold, which deletes the services and deployments. At that point all pods stay in Terminating status. If I restart colima, those pods finish terminating normally once it comes back up.

Expected behaviour

Pods retain connectivity and always terminate normally within the colima Kubernetes instance.

Additional context

Kubernetes version v1.28.3+k3s2
M2, Sonoma 14.2.1, 24GB RAM
cpu: 3
disk: 60
memory: 16

aran commented 9 months ago

I don't see the same instability with vz/rosetta. I understand that vz/rosetta has its own issues, but it would be helpful to have some troubleshooting ideas for investigating the instability under QEMU.
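
One way to pin down when connectivity starts to degrade is to log a simple TCP probe at intervals and compare the first failure against the VM and k3s logs. This is a minimal sketch using only the Python standard library. The function name `tcp_probe` is hypothetical, and the host and port to probe are placeholders to fill in for whichever service shows errors.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and completes the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers connection refused, unreachable network, DNS failure, and timeout
        return False
```

Calling this in a loop with a timestamped print, both from the macOS host and from inside a pod, would show whether failures begin on one side first.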