Azure / AKS-Edge

Welcome to the Azure Kubernetes Service (AKS) Edge repo.
MIT License

[BUG] AIO public preview with log file size issues #162

Closed JMayrbaeurl closed 10 months ago

JMayrbaeurl commented 10 months ago

Describe the bug

Deploying Azure IoT Operations (AIO) to an AKS EE k8s cluster fills up the 1 GB log filesystem completely after several days. Setting the log file space to 10 GB in aksedge-config.json and recreating the cluster doesn't help, because /var/log is resized back down to 1 GB after some time.

Invoke-AksEdgeNodeCommand -command "sudo df -h /var/log/"
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda8       974M  419M  488M  47% /var/log

To Reproduce

Steps to reproduce the behavior:

  1. Follow the single machine deployment guide at https://learn.microsoft.com/en-us/azure/aks/hybrid/aks-edge-howto-single-node-deployment but use 10 for LogSizeInGB in aksedge-config.json
  2. Check /var/log with Invoke-AksEdgeNodeCommand -command "sudo df -h /var/log/". It shows 10 GB of usable disk space
  3. Wait for one day and repeat step 2. Now it only shows 974M for the size
  4. Wait until the journal reaches its configured limit of around 500M. No more workload deployment is possible, at least with AIO, because the pods report being out of log file space on creation
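The check in steps 2 and 3 could be scripted once the `df -h` output has been captured as text. Below is a minimal Python sketch, not part of AKS EE: the function names and the 90% tolerance are assumptions chosen for illustration, and it only parses text that was already collected with Invoke-AksEdgeNodeCommand.

```python
import re

# Multipliers for the unit suffixes printed by `df -h` (powers of 1024).
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Convert a `df -h` size such as '974M' or '9.8G' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMGT]?)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS.get(unit, 1))


def log_partition_size(df_output: str, mount: str = "/var/log") -> int:
    """Return the Size column (in bytes) of the row mounted on `mount`.

    Works on whitespace-separated tokens, so it does not matter whether
    the header and the data row arrive on one line or on two.
    """
    tokens = df_output.split()
    for i, token in enumerate(tokens):
        # Row layout: Filesystem Size Used Avail Use% Mounted-on
        if token == mount and i >= 5 and tokens[i - 1].endswith("%"):
            return parse_size(tokens[i - 4])
    raise LookupError(f"no row for {mount}")


def partition_shrunk(df_output: str, configured_gb: int,
                     tolerance: float = 0.9) -> bool:
    """True if the reported size is clearly below the configured size.

    `tolerance` is a hypothetical allowance for filesystem overhead.
    """
    expected = configured_gb * 1024**3 * tolerance
    return log_partition_size(df_output, "/var/log") < expected


if __name__ == "__main__":
    sample = (
        "Filesystem Size Used Avail Use% Mounted on\n"
        "/dev/sda8 974M 419M 488M 47% /var/log\n"
    )
    # Output reported in this issue, checked against LogSizeInGB = 10.
    print(partition_shrunk(sample, configured_gb=10))
```

Running such a check on a schedule would show when the partition drops from the configured size back to roughly 1 GB, which is the behavior reported here.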

Expected behavior

AKS EE should use the full log file space, up to the 10 GB configured in aksedge-config.json, and not reduce it to 1 GB. Resizing the log file space of a running cluster with PowerShell would be nice, too.


JMayrbaeurl commented 10 months ago

I've now done a full reinstall with the latest AKS EE version 1.5.203.0 and k8s 1.26.6. After 16 hours the log file space is still 10 GB as originally configured. I'll keep watching the log file space over the next few days. If it doesn't change, this issue can be closed.

JMayrbaeurl commented 10 months ago

Just checked again. Log file space is still 10 GB. Looks like current version 1.5.203.0 of AKS EE doesn't have this issue.