HandyOSS / HandyHost

Host DVPN/HNS, Sia and Akash all in one UI.
https://handyhost.computer
GNU Lesser General Public License v2.1

stuck pods/containers #20

Closed Supernaut90 closed 2 years ago

Supernaut90 commented 3 years ago

With the recent internet outages, I had a couple of leases end rather abruptly. This is the second time I've seen a lease end and the bid close while the pod stays open, still using my internet and CPU.

I've tried running this script: https://gist.github.com/arno01/6384d4cb1a3b3a62011f854d2e52e283#file-clean_dangling_containers-sh but I get this error, repeated five times: "clean_dangling_containers.sh: line 46: akash: command not found"

I used /root/.HandyHost/aktData/admin.conf for the kubeconfig and entered PATH=$PATH:/root/.HandyHost/aktData/bin in the terminal before running the script with sudo bash, but I still got the errors. Please advise on where I'm messing this up. Many thanks!!
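
A possible explanation for the line 46 error, offered as an assumption rather than a confirmed diagnosis: sudo commonly resets PATH (the secure_path setting in sudoers), so a PATH change made in the calling shell may never reach a script started with sudo bash. The sketch below only checks whether akash resolves in the current shell. add_to_path and check_cmd are hypothetical helper names, and the directory is the one quoted above.

```shell
#!/bin/sh
# Sketch: put a directory on PATH and check that a command resolves from it.
# add_to_path and check_cmd are illustrative helpers, not part of HandyHost.

# Append a directory to PATH only if it is not already listed.
add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;                 # already on PATH, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
  export PATH
}

# Report whether a command can be found with the current PATH.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

# Directory taken from the thread; adjust it to your own install.
add_to_path /root/.HandyHost/aktData/bin
check_cmd akash
```

If the check reports found in a root shell but the script still fails under sudo, one option is to pass the PATH explicitly, for example sudo env "PATH=$PATH" bash clean_dangling_containers.sh, or to run the script from a root shell where the export was made.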

Supernaut90 commented 3 years ago

UPDATE: After doing some reading, I tried export PATH=$PATH:/root/.HandyHost/aktData/bin and then ran the script with bash. It returned: The connection to the server 192.168.1.7:6443 was refused - did you specify the right host or port? Unable to connect to the server: dial tcp 192.168.1.7:6443: connect: no route to host

This is like good news, right? I'm one step closer to this working? lol. Thank you again for your valuable time!
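
The "connection refused" and "no route to host" messages refer to the API server address stored in the kubeconfig. A minimal sketch for reading that address out of the file, so it can be compared with the IP the control node currently has, might look like this. kubeconfig_server and host_port are hypothetical helper names, and the default path is the admin.conf location quoted earlier in the thread.

```shell
#!/bin/sh
# Sketch: show which API server address a kubeconfig points at.

# Print the first "server:" URL found in a kubeconfig file.
kubeconfig_server() {
  awk '$1 == "server:" { print $2; exit }' "$1"
}

# Turn https://host:port into "host port".
host_port() {
  hp="${1#*://}"       # drop the scheme
  hp="${hp%%/*}"       # drop any trailing path
  echo "${hp%:*} ${hp##*:}"
}

# Assumed location, taken from the thread; KUBECONFIG overrides it.
conf="${KUBECONFIG:-/root/.HandyHost/aktData/admin.conf}"
if [ -r "$conf" ]; then
  echo "API server in $conf: $(host_port "$(kubeconfig_server "$conf")")"
else
  echo "cannot read $conf (try a root shell)"
fi
```

If the printed address no longer matches the node's current IP, the cluster is unreachable no matter what the cleanup script does. Once the address is correct, kubectl --kubeconfig "$conf" cluster-info is a quick way to confirm the connection.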

alexsmith540 commented 3 years ago

Seems like it did something. Did you also set the location of your admin.conf in the shell script? I'm also wondering whether, when your internet/router restarted, it assigned new IP addresses to your nodes. I have my router set to keep the IP addresses static. If you don't have any current leases, an easy solution might be to reflash your thumb drive, reflash your nodes, and start from scratch. If you go the route of rebuilding the cluster from scratch: in the configuration UI, make sure you uncheck all the nodes and hit save before adding them back via the USBs. From there it's business as usual: add the machines via thumb drive, then build the Kubernetes cluster.

Supernaut90 commented 2 years ago

Thanks for your time. This stuck container/pod problem seems to be unavoidable. I've found that most Akash providers clean house with a script from time to time. The problem is that I don't have the knowledge to troubleshoot this script on my own. Yes, I added the admin.conf location to the shell script and set the PATH before running the script. I consistently get the line 46 error. The connection error I posted above was, I believe, due to my router giving two of my nodes the same IP address. My IP reservations had somehow cloned themselves, so I had two identical reservations for the same IP but with different MAC addresses. I blame that on Xfinity, as I've never seen it happen before and it hasn't happened again.
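
A duplicate reservation like the one described here can be spotted from a plain list of reservations. The sketch below assumes a hypothetical export with one "IP MAC" pair per line (routers differ in how, or whether, they export this) and prints every IP that is reserved for more than one MAC. duplicate_ips is an illustrative name and the addresses are made-up sample data.

```shell
#!/bin/sh
# Sketch: find IPs that appear with more than one MAC in "IP MAC" lines on stdin.
duplicate_ips() {
  awk '
    {
      count[$1]++
      if ($1 in macs) macs[$1] = macs[$1] " " $2   # append a further MAC
      else            macs[$1] = $2                # first MAC seen for this IP
    }
    END {
      for (ip in count)
        if (count[ip] > 1) print ip ": " macs[ip]
    }
  ' | sort
}

# Sample data only; replace with your own reservation list.
duplicate_ips <<'EOF'
192.168.1.7 aa:bb:cc:00:00:01
192.168.1.8 aa:bb:cc:00:00:02
192.168.1.7 aa:bb:cc:00:00:03
EOF
# prints: 192.168.1.7: aa:bb:cc:00:00:01 aa:bb:cc:00:00:03
```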

To fix everything, I had to reinstall from the flash drive on 4 nodes and rebuild the cluster.

I had problems getting the new HandyHost + flash drives to work. After an hour or two of messing with everything on my end, I reinstalled the previous revision of HandyHost, made the old flash drive, and boom, success on the first try.

As of right now I have 5 leases and one stuck pod. It seems to be a smaller deployment and isn't chewing up a lot of resources, but if one of these PKT miners sticks, I need to kick it out to make way for paying customers. Please advise on a script solution.

P.S. Maybe your idea of putting a clean containers button in the UI is the best solution for us non-coders.
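
At its core, a cleanup of this kind is a set difference: everything the cluster is still running, minus everything that still has an active lease. The sketch below shows only that comparison step on two plain text files with one name per line. How the two lists are produced is left open here: kubectl get ns -o name can list namespaces, while the query for active leases is not given in this thread. dangling and the ns-* names are hypothetical.

```shell
#!/bin/sh
# Sketch: print names present in the first list but absent from the second.
# Usage: dangling CLUSTER_FILE ACTIVE_FILE   (one name per line in each file)
dangling() {
  sorted_cluster="$(mktemp)"
  sorted_active="$(mktemp)"
  sort -u "$1" > "$sorted_cluster"
  sort -u "$2" > "$sorted_active"
  comm -23 "$sorted_cluster" "$sorted_active"   # keep lines unique to the first file
  rm -f "$sorted_cluster" "$sorted_active"
}

# Made-up sample lists for illustration.
cluster_list="$(mktemp)"
active_list="$(mktemp)"
printf 'ns-a\nns-b\nns-c\n' > "$cluster_list"
printf 'ns-a\nns-c\n' > "$active_list"
dangling "$cluster_list" "$active_list"   # prints: ns-b
rm -f "$cluster_list" "$active_list"
```

Anything the comparison prints is only a candidate. Review each name before removing it, since a stale or incomplete lease list would make a paying deployment look dangling.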

alexsmith540 commented 2 years ago

I'll go ahead and add the dangling container script to the Akash UI while I'm working on some new features over the next couple of weeks. It seems like a good utility to have bundled in.

alexsmith540 commented 2 years ago

FYI, I added the script to the latest release, v0.5.1. I didn't quite get the UI for it integrated just yet, but for now you can call it via the CLI like this:

sudo su
cd /opt/handyhost/aktAPI
chmod +x ./removeDanglingContainers.sh
./removeDanglingContainers.sh