anil-sezer / Portfolio

It's my portfolio that I made with love!
https://www.anil-sezer.com/
MIT License

Gather Graceful Shutdown Commands! #81

Closed anil-sezer closed 10 months ago

anil-sezer commented 11 months ago

The logs indicate that Prometheus is facing some challenges, such as "Loading on-disk chunks failed," but it seems to be able to recover and continue working. However, it's hard to say if a simple reboot will resolve these issues or potentially worsen them.

Before running sudo reboot, you might consider doing the following:

  1. Backup: Make sure to back up important data, especially if it's a production system. This includes configurations, databases, etc.

  2. Drain Node: If this is part of a Kubernetes cluster and you're running workloads on this node, you might want to drain it first so that Kubernetes can gracefully remove the pods.

    kubectl drain <NODE_NAME> --ignore-daemonsets

  3. Review Active Sessions: Use the who command to find out what users are currently logged into the system. This can be useful if you're on a multi-user system.

  4. Check Running Applications: Make sure there are no critical applications that will be interrupted. You can use top or htop to check this.

  5. Gracefully Stop Services: If you have important services running, stop them gracefully if needed. For instance, you could scale down deployments in Kubernetes to minimize the impact of the reboot.

    kubectl scale deployment <DEPLOYMENT_NAME> --replicas=0 -n <NAMESPACE>

  6. Check Disk Space: Make sure there is enough disk space, as lack of it might be one of the reasons you're facing issues. Use df -h to check disk usage.

  7. Check System Messages: Quickly run dmesg or check /var/log/syslog for any critical errors that need immediate addressing.

  8. Run Pre-reboot Scripts: If you have any, now is the time to run them.

  9. Check Connectivity: Ensure you have alternative ways to connect to the machine in case the reboot fails, especially if you're working on a remote server.

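After you've considered these points, you can go ahead and run sudo reboot. After the system is back up, check the logs again to see if the previous issues are resolved. If you drained the node earlier, it stays cordoned after the reboot, so run kubectl uncordon <NODE_NAME> to let Kubernetes schedule pods on it again, and scale any deployments you set to zero back up.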
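
The read-only checks and the drain step above could be collected into a small script. This is only a sketch under assumptions: the function names (full_filesystems, pre_reboot_checks, post_reboot_checks) and the 90% disk threshold are made up for illustration, and it assumes bash, a util-linux dmesg, and a kubectl that is already configured for the cluster.

```shell
#!/usr/bin/env bash
# Sketch of a pre-reboot checklist. Function names and the 90% threshold
# are illustrative assumptions, not part of any existing tool.

# Print mount points whose usage is at or above the given percentage.
# Reads `df -P` style output on stdin (header line is skipped).
# Note: mount points containing spaces are not handled by this simple parser.
full_filesystems() {
  local threshold="$1"
  awk -v t="$threshold" 'NR > 1 { use = $5; sub(/%/, "", use); if (use + 0 >= t) print $6 }'
}

# Run the read-only checks, then drain the node.
pre_reboot_checks() {
  local node="$1"
  echo "== Logged-in users =="
  who
  echo "== Filesystems at or above 90% usage =="
  df -P | full_filesystems 90
  echo "== Recent kernel errors (may need root) =="
  dmesg --level=err,crit 2>/dev/null | tail -n 20
  echo "== Draining node ${node} =="
  kubectl drain "$node" --ignore-daemonsets
}

# After the machine is back up: allow scheduling again and show node status.
post_reboot_checks() {
  local node="$1"
  kubectl uncordon "$node"
  kubectl get nodes
}
```

Usage would look like pre_reboot_checks <NODE_NAME>, then sudo reboot, then post_reboot_checks <NODE_NAME> once the machine is reachable again. Backups, scaling down specific deployments, and confirming an alternative way to connect are left as manual steps because they depend on the setup.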

anil-sezer commented 10 months ago

Did this