abiosoft / colima

Container runtimes on macOS (and Linux) with minimal setup
MIT License

Colima unable to restart on m1 after operating system restart #381

Open jamie-smyth-at-thistle opened 2 years ago

jamie-smyth-at-thistle commented 2 years ago

Description

We are seeing an issue where colima will not restart after OS X shuts down if colima is running during shutdown.

➜  ~ colima start --verbose
INFO[0000] using docker runtime
INFO[0000] starting colima
INFO[0000] starting ...                                  context=vm
> msg="Using the existing instance \"colima\""
> msg="errors inspecting instance: [failed to get Info from \"/Users/dipdhanesha/.lima/colima/ha.sock\": Get \"http://lima-hostagent/v1/info\": dial unix /Users/dipdhanesha/.lima/colima/ha.sock: connect: connection refused]"
FATA[0000] error starting vm: error at 'starting': exit status 1

Version

Colima Version:

➜  ~ colima version
colima version 0.3.4
git commit: 5a4a70481ca8d1e794677f22524e3c1b79a9b4ae

Lima Version:

➜  ~ limactl --version
limactl version 0.10.0

Qemu Version:

➜  ~ qemu-img --version
qemu-img version 6.2.0
Copyright (c) 2003-2021 Fabrice Bellard and the QEMU Project developers

Operating System

Reproduction Steps

  1. colima start
  2. Shutdown Mac OS (without running colima stop)
  3. Start Mac OS
  4. colima start

Expected behaviour

Colima starts successfully

Additional context

No response

abiosoft commented 2 years ago

What is the output of colima list?

Does deleting the socket file have any effect? That is, delete it and try again.

rm /Users/dipdhanesha/.lima/colima/ha.sock
colima start

jamie-smyth-at-thistle commented 2 years ago

Thanks for the quick response. I will need to get it back into the broken state and try deleting the sock file. Had to delete and recreate colima to get some work done. Hopefully I can get back to you shortly.

dip-thistle commented 2 years ago

Hi @abiosoft, This is the output of colima list

➜  ~ colima list
PROFILE    STATUS    ARCH       CPUS    MEMORY    DISK
default    Broken    aarch64    6       8GiB      60GiB
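A listing like the one above can also be checked from a script. This is only a rough sketch: broken_profiles is a made-up helper name, and it assumes the column order shown here (PROFILE first, STATUS second), which may differ between Colima versions.

```shell
# Print the names of profiles whose STATUS column reads "Broken".
# Reads `colima list` output on stdin.
# Assumption: PROFILE is the first column and STATUS the second, as in this thread.
broken_profiles() {
  # NR > 1 skips the header row.
  awk 'NR > 1 && $2 == "Broken" { print $1 }'
}
```

Usage would be something like colima list | broken_profiles, which prints nothing when no profile is broken.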

I removed the socket file and tried running the command again but it did not work. This was the output.

➜  ~ rm /Users/dipdhanesha/.lima/colima/ha.sock
➜  ~ colima start
INFO[0000] using docker runtime
INFO[0000] starting colima
INFO[0000] starting ...                                  context=vm
> msg="Using the existing instance \"colima\""
> msg="errors inspecting instance: [failed to connect to \"/Users/dipdhanesha/.lima/colima/ha.sock\": stat /Users/dipdhanesha/.lima/colima/ha.sock: no such file or directory]"
FATA[0000] error starting vm: error at 'starting': exit status 1

abiosoft commented 2 years ago

Thanks for the update, will have a look.

Jsince99 commented 2 years ago

Hey @abiosoft and @dip-thistle, I had the same issue. I force-stopped the broken instance, started it again, and it worked.

limactl stop -f colima

INFO[0000] The QEMU process seems already stopped       
INFO[0000] Sending SIGKILL to the host agent process 1689 
ERRO[0000] operation not permitted                      
INFO[0000] Removing *.pid *.sock under "/Users/sravanth/.lima/colima" 
INFO[0000] Removing "/Users/sravanth/.lima/colima/ga.sock" 
INFO[0000] Removing "/Users/sravanth/.lima/colima/ha.pid" 
INFO[0000] Removing "/Users/sravanth/.lima/colima/ha.sock"

colima start

INFO[0000] starting colima                              
INFO[0000] runtime: docker                              
INFO[0000] preparing network ...                         context=vm
INFO[0001] starting ...                                  context=vm
INFO[0023] provisioning ...                              context=docker
INFO[0023] starting ...                                  context=docker
INFO[0029] done

abiosoft commented 2 years ago

limactl stop -f colima

This is equivalent to colima stop -f

Jerome1337 commented 2 years ago

I had the same issue on a 2019 Intel MacBook Pro after an OS restart.

The error was:

> errors inspecting instance: [failed to get Info from "/Users/JEROME/.lima/colima/ha.sock": Get "http://lima-hostagent/v1/info": dial unix /Users/JEROME/.lima/colima/ha.sock: connect: connection refused]

After running colima stop -f, everything was OK.
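For anyone who hits this after every reboot, the force-stop-then-start sequence reported in this thread could be wrapped in a small function. This is a sketch, not an official recipe: recover_colima and the COLIMA_BIN override are hypothetical names, added only so the function can be dry-run against a stub.

```shell
# Force-stop a stuck instance, then start it again.
# COLIMA_BIN is a hypothetical override for testing; it defaults to the real colima.
recover_colima() {
  colima_bin="${COLIMA_BIN:-colima}"
  # A force stop may report errors for processes that are already gone; keep going.
  "$colima_bin" stop --force || true
  "$colima_bin" start
}
```

Calling recover_colima with no arguments acts on the default profile, the same as typing the two commands by hand.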

Agnibho-8 commented 1 year ago

Colima stop didn't work, instead system restart worked for me

abiosoft commented 1 year ago

Colima stop didn't work, instead system restart worked for me

You can use the --force flag to enforce it when stop fails.

colima stop --force

dehengxu commented 1 year ago

Hi @abiosoft, This is the output of colima list

➜  ~ colima list
PROFILE    STATUS    ARCH       CPUS    MEMORY    DISK
default    Broken    aarch64    6       8GiB      60GiB

I removed the socket file and tried running the command again but it did not work. This was the output.

➜  ~ rm /Users/dipdhanesha/.lima/colima/ha.sock
➜  ~ colima start
INFO[0000] using docker runtime
INFO[0000] starting colima
INFO[0000] starting ...                                  context=vm
> msg="Using the existing instance \"colima\""
> msg="errors inspecting instance: [failed to connect to \"/Users/dipdhanesha/.lima/colima/ha.sock\": stat /Users/dipdhanesha/.lima/colima/ha.sock: no such file or directory]"
FATA[0000] error starting vm: error at 'starting': exit status 1

Try deleting ha.pid as well.

davidkyles commented 1 year ago

Thank you. Deleting ha.sock, ha.pid, and vz.pid got me back up and running.
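The manual cleanup described above could be scripted roughly like this. It is a sketch under assumptions: clean_stale_files is a made-up name, it should only be run when no Colima or Lima process is still alive, and the instance directory is passed in because this thread shows two different locations (~/.lima/colima and ~/.colima/_lima/colima).

```shell
# Remove stale socket and PID files left behind by an unclean shutdown.
# $1 is the instance directory, e.g. ~/.lima/colima or ~/.colima/_lima/colima.
clean_stale_files() {
  instance_dir="$1"
  for name in ha.sock ha.pid vz.pid; do
    # -f keeps rm quiet when a file is absent; not every file exists on every setup.
    rm -f "$instance_dir/$name"
  done
}
```

For example, clean_stale_files "$HOME/.colima/_lima/colima" followed by colima start. Other files in the directory, such as the disk image, are left untouched.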

davepoon commented 1 year ago

As a developer using an M1 Mac, I experience this issue on a daily basis while working with Node.js and PHP containers. I hope a solution can be found without the need for frequent restarts.

protometa commented 1 year ago

Running colima restart -f also fixed it for me.

joachim-n commented 1 year ago

I can confirm that limactl stop -f colima fixed it for me.

But I have been shutting down my Mac every day with colima still running, and today is the first time I've had this problem.

skgandikota commented 1 year ago

limactl stop -f colima worked for me.

drupalninja commented 11 months ago

limactl stop -f colima

This worked for me, thanks!

farooqkhanDH commented 11 months ago

Force stopping using colima stop -f and starting again worked for me.

laacpleesis commented 10 months ago

Restarting my Mac and colima did not solve the issue. It was still stuck on:

msg="errors inspecting instance: [failed to get Info from \"/Users/USER/.lima/colima/ha.sock\": Get \"http://lima-hostagent/v1/info\": dial unix /Users/USER/.lima/colima/ha.sock: connect: connection refused]"

and inside logs (cat /Users/USER/.colima/_lima/colima/ha.stderr.log):

{"level":"fatal","msg":"template: :1:21: executing \"\" at \u003cfd_connect \"/Users/USER/.colima/_lima/_networks/user-v2/user-v2_qemu.sock\"\u003e: error calling fd_connect: fd_connect: dial unix /Users/USER/.colima/_lima/_networks/user-v2/user-v2_qemu.sock: connect: connection refused","time":"2024-01-09T10:45:32+02:00"}

Reinstalling lima and qemu didn't help. I then tried

chmod +x /Users/USER/.colima/_lima/_networks/user-v2/user-v2_qemu.sock

Still the same error.

Then I tried

rm /Users/USER/.colima/_lima/_networks/user-v2/user-v2_qemu.sock

and the logs (cat /Users/USER/.colima/_lima/colima/ha.stderr.log) now showed:

{"level":"fatal","msg":"template: :1:21: executing \"\" at \u003cfd_connect \"/Users/USER/.colima/_lima/_networks/user-v2/user-v2_qemu.sock\"\u003e: error calling fd_connect: fd_connect: dial unix /Users/USER/.colima/_lima/_networks/user-v2/user-v2_qemu.sock: connect: no such file or directory","time":"2024-01-09T10:52:38+02:00"}
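Since the host agent log above is one JSON object per line, the fatal entries can be pulled out with a plain text match. A minimal sketch, assuming the "level":"fatal" field is written exactly as in the lines quoted here; fatal_log_lines is a made-up helper name.

```shell
# Print fatal entries from a Lima host agent log (one JSON object per line).
# $1 is the log path, e.g. ~/.colima/_lima/colima/ha.stderr.log.
fatal_log_lines() {
  # -F matches the field as a fixed string rather than a regular expression.
  grep -F '"level":"fatal"' "$1"
}
```

For example, fatal_log_lines ~/.colima/_lima/colima/ha.stderr.log. The function returns a non-zero status when the log contains no fatal entries.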

Reinstalling qemu and lima didn't bring the file back, so I gave up, removed colima with brew, and ran rm -rf on the .colima folder.

I hope somebody can post a fix; my containers are huge and pulling them again wastes a lot of time.

Versions

qemu-img --version
qemu-img version 8.2.0
Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers

limactl --version
limactl version 0.19.1

colima version
colima version 0.6.7
git commit: ba1be00e9aec47f2c1ffdacfb7e428e465f0b58a
runtime: docker
arch: x86_64
client: v24.0.7
server: v24.0.7

denistorjai commented 10 months ago

I just managed to recreate this exact same issue by accident.

Starting Colima and then restarting your computer leaves it unable to start. After running the Colima stop command and starting again, it runs as intended.

daveyarwood commented 10 months ago

I tried many of the things above and was still getting the same "connection refused" error from the OP.

What finally ended up working for me was:

brew uninstall colima qemu lima
rm -rf ~/.colima
brew install colima
brew services restart colima
colima restart

laacpleesis commented 10 months ago

Not a fan of the hard reinstall approach. A quick fix that doesn't require recreating all the containers would be fine by me. I also noticed that it happens after shutting down my Mac.

So a reproduction scenario could be something like:

  1. colima start
  2. Start some container
  3. Shut down the Mac with colima and the container running
  4. colima start -> error

I have 4 containers running, started with docker compose: tomcat, database, java, etc.

riconeitzel commented 10 months ago

SOLUTION FOR ME WAS:

https://github.com/abiosoft/colima/issues/938#issuecomment-1895461764

Remove the _networks folder and start colima!
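A guarded version of that removal might look like the sketch below. The function name remove_networks_dir is made up, and the default location ~/.colima/_lima/_networks is taken from the paths quoted elsewhere in this thread, so older setups that keep their files under ~/.lima may need a different path.

```shell
# Remove Colima's _networks folder so it gets recreated on the next start.
# $1 is the Colima home directory; it defaults to ~/.colima.
remove_networks_dir() {
  colima_home="${1:-$HOME/.colima}"
  networks_dir="$colima_home/_lima/_networks"
  if [ -d "$networks_dir" ]; then
    rm -rf "$networks_dir"
    echo "removed $networks_dir"
  else
    echo "nothing to remove at $networks_dir"
  fi
}
```

Run remove_networks_dir and then colima start. Only the _networks folder is deleted, so the instance directory next to it stays in place.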

HatemTemimi commented 9 months ago

@laacpleesis I tried a lot of solutions, including purging the config and a fresh install with brew, and the only one that worked for me was: brew remove qemu lima colima && rm -rf ~/.lima && rm -rf ~/.colima, which is the hard delete. I am on macOS Sonoma 14.3.1.

sergeybe2 commented 9 months ago

This command helped me:

rm ~/.colima/_lima/_networks/user-v2/usernet_user-v2.pid

Run it after colima stop -f

MikeJansenNextira commented 9 months ago

SOLUTION FOR ME WAS:

#938 (comment)

Remove the _networks folder and start colima!

That worked for me.

bradj commented 8 months ago

SOLUTION FOR ME WAS:

#938 (comment)

Remove the _networks folder and start colima!

Thank you.

spaceneedle2019 commented 5 months ago

This command helped me:

rm ~/.colima/_lima/_networks/user-v2/usernet_user-v2.pid

Run it after colima stop -f

This solution solved the problem for me. Thank you. :)