docker / compose

Define and run multi-container applications with Docker
https://docs.docker.com/compose/
Apache License 2.0

[BUG] watch crashes when deleting file #11066

Closed perosb closed 8 months ago

perosb commented 1 year ago

Description

When I delete a file from the watched path, the watch command crashes and cannot be restarted. (I'm unsure whether the file existed in the container, but it should have, since it had been synced.)

develop:
  watch:
    - action: sync
      path: ${LOCAL_DEPLOY_PATH}\platform
      target: c:/inetpub/wwwroot/
Syncing cm after changes were detected:
  - C:\t\docker\deploy\platform\Web.config.xdt
container 061ae4ad9ec878e7a259e45aa1d7b4bd0dc56468b05497fa18cd71dd5f1c0cbe encountered an error during hcs::System::CreateProcess: failure in a Windows system call: The system cannot find the file specified. (0x2)

Then when trying to restart it is locked:

> docker compose watch --no-up
cannot take exclusive lock for project "kermit": process with PID 20836 is still running

Killing the 20836 process still errors out the same.

Steps To Reproduce

It seems to be reproducible:

watching [C:\t\docker\deploy\platform]
Syncing cm after changes were detected:
  - C:\t\docker\deploy\platform\images.jpg
tar: Removing leading drive letter from member names
x inetpub/wwwroot/images.jpg
Syncing cm after changes were detected:
  - C:\t\docker\deploy\platform\images.jpg
container 7b26f6516e4c0ed000d0d71b1f01250411af2ddab0b17cd3f7f3b391a4ee97a0 encountered an error during hcs::System::CreateProcess: failure in a Windows system call: The system cannot find the file specified. (0x2)

Compose Version

Docker Compose version v2.22.0

Docker Environment

Client:
 Version:    24.0.6
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.3
    Path:     C:\Users\Administrator\.docker\cli-plugins\docker-buildx.exe
  compose: Docker Compose (Docker Inc.)
    Version:  v2.22.0
    Path:     C:\ProgramData\Docker\cli-plugins\docker-compose.exe
  scout: Command line tool for Docker Scout (Docker Inc.)
    Version:  0.17.1
    Path:     C:\Users\Administrator\.docker\cli-plugins\docker-scout.exe

Server:
 Containers: 14
  Running: 0
  Paused: 0
  Stopped: 14
 Images: 861
 Server Version: 24.0.4
 Storage Driver: windowsfilter
  Windows:
 Logging Driver: json-file
 Plugins:
  Volume: local
  Network: ics internal l2bridge l2tunnel nat null overlay private transparent
  Log: awslogs etwlogs fluentd gcplogs gelf json-file local logentries splunk syslog
 Swarm: inactive
 Default Isolation: process
 Kernel Version: 10.0 20348 (20348.1.amd64fre.fe_release.210507-1500)
 Operating System: Microsoft Windows Server Version 21H2 (OS Build 20348.1970)
 OSType: windows
 Architecture: x86_64
 CPUs: 8
 Total Memory: 31.86GiB
 Name: kermit-dev
 ID: YP3Q:GBSN:NFJ3:QQMM:DB3Z:CD3V:7RBI:V445:473L:3WU3:VOP3:5DVB
 Docker Root Dir: C:\ProgramData\docker
 Debug Mode: false
 Username: kermit
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

Anything else?

The file is not removed from the container. I have no idea how or where the lock is kept. Restarting the Docker engine removed the lock.

pbering commented 1 year ago

Same thing happens when copying many files into a synced folder like this:

    develop:
      watch:
        - action: sync
          path: ./web
          target: /inetpub/wwwroot

The only way to recover from "cannot take exclusive lock for project" is to restart the host; not even restarting Docker Desktop or dockerd helps.

mrbiggred commented 12 months ago

@pbering and @perosb a possible workaround so you don't have to restart your host can be found in issue #11069:

https://github.com/docker/compose/issues/11069#issuecomment-1769694535

ndeloof commented 11 months ago

The lock is managed by https://github.com/moby/moby/blob/master/pkg/pidfile/pidfile.go#L29. According to the "process with PID 20836 is still running" message, the compose process is still reported by the system as "alive". If you can reproduce this issue, could you please inspect the referenced process?

pbering commented 11 months ago

When the issue happens and I see that message, then there is no process with that PID.

ndeloof commented 11 months ago

Which OS are you running on?

perosb commented 11 months ago

Which OS are you running on?

Kernel Version: 10.0 20348 (20348.1.amd64fre.fe_release.210507-1500)
Operating System: Microsoft Windows Server Version 21H2 (OS Build 20348.1970)
OSType: windows
Architecture: x86_64

AlexeyPlodenko commented 10 months ago

Same on

OS Name:                   Microsoft Windows 10 Pro
OS Version:                10.0.19045 N/A Build 19045

mac-hel commented 10 months ago

@pbering and @perosb a possible workaround so you don't have to restart your host can be found in issue #11069:

#11069 (comment)

Another workaround (Linux) is to stop the containers before exiting watch:

  1. Ctrl+z to suspend watch
  2. docker-compose down to stop and remove the containers
  3. fg to bring the watch process into the foreground
  4. Ctrl+c to exit watch

wclr commented 9 months ago

According to "process with PID 20836 is still running" message, the compose process is still reported by system as "alive". If you can reproduce this issue, could you please inspect the referred process ?

I've been implementing a small script that watches docker-compose.yaml and restarts compose watch whenever it changes (to pick up the current state), and I ran into this issue.

The question is why it reports that the process with PID XXXX is still running when no process with this PID is running. I could not follow its logic. What does it check when compose watch starts? Does it check the process?

ndeloof commented 9 months ago

@wclr process detection is implemented by https://github.com/moby/moby/blob/master/pkg/process/process_windows.go

wclr commented 9 months ago

@wclr process detection is implemented by https://github.com/moby/moby/blob/master/pkg/process/process_windows.go

Well, here I believe it checks the process. But the fact is that when launching docker compose watch, it reports that the process with PID XXXX is still running while there is no XXXX in the tasklist (for example, in my case this PID was killed by the aforementioned script that spawned docker compose watch). People above mention this too, so you probably need to review the logic behind this report and make sure this cannot happen.

At4m4n commented 9 months ago

When the issue happens and I see that message, then there is no process with that PID.

Same. There is no such process on the Windows host itself, nor in the container I watch (Ubuntu-based app image). It seems to exist somewhere within the "docker engine space".

Update: I managed to resolve this by updating to the latest Docker version. The installer said an assistance service process was running and suggested killing it. The watch command works again after the update; no host reboot needed.

wclr commented 9 months ago

Update: I managed to resolve this by updating to the latest Docker version. The installer said an assistance service process was running and suggested killing it. The watch command works again after the update; no host reboot needed.

It is not fixed; I was using the latest version when I ran into this. And you don't need to reboot: you can just delete %LOCALAPPDATA%/docker-compose.[YOUR_COMPOSE_PROJECT_NAME].pid.

I eventually ended up writing my own custom watch script to fully replace the docker compose watch functionality (for my case). It watches the relevant file changes in the project and, inside the container, copies from the attached host volume to the Docker volumes to keep them in sync. The script also runs rsync at startup to perform the initial sync with the host.

The problem mentioned in this issue is not the only one in the current compose watch implementation. For example, it also ignores or skips some file changes when they are made in a batch, so a custom solution can address all of this.
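One way a custom watcher can avoid losing batched changes is to compare full directory snapshots instead of reacting to individual file events. The following is a minimal Python sketch of that idea only; it is not wclr's script, and the names `take_snapshot` and `diff_snapshots` are hypothetical. It reports which files would need to be copied or removed, and leaves the actual copy into the container out.

```python
import os
from typing import Dict, List, Tuple

# relative path -> (modification time in ns, size in bytes)
Snapshot = Dict[str, Tuple[int, int]]


def take_snapshot(root: str) -> Snapshot:
    """Record mtime and size for every file below root."""
    snapshot: Snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            snapshot[os.path.relpath(full, root)] = (st.st_mtime_ns, st.st_size)
    return snapshot


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[List[str], List[str]]:
    """Return (created_or_modified, deleted) paths between two snapshots."""
    changed = sorted(p for p in new if old.get(p) != new[p])
    deleted = sorted(p for p in old if p not in new)
    return changed, deleted
```

A polling loop would call `take_snapshot` at a fixed interval and sync whatever `diff_snapshots` returns. Because every poll compares complete states, a burst of changes between two polls shows up as one combined diff rather than as separate events that could be dropped.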

AlexeyPlodenko commented 9 months ago

I delete the image and the container and then run docker-compose watch in a single PowerShell script, so that the files are synced before the containers are started:

Powershell.exe -noexit -command "cd ../..;  docker rm --force backoffice-php; docker rmi $(docker images --format '{{.Repository}}:{{.Tag}}'|findstr 'backoffice-php'); docker-compose watch"

torrinworx commented 8 months ago

I can confirm that I'm seeing this issue on Docker version 24.0.7. A temporary workaround is to delete the .pid file in: C:\Users\\AppData\Local\docker-compose\.pid

princemaple commented 8 months ago

Encountered this on Desktop 4.27.1 (136059), Engine: 25.0.2, Compose: v2.24.3-desktop.1

glours commented 8 months ago

Hey @perosb I'm not able to reproduce with the latest Docker Desktop release 4.27.2, can you give it a try? If you still have the issue, can you give us a minimal & complete reproduction case?

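For all the others, if you don't have the same issue as the original one, please:

  • Check the latest version of Docker Desktop
  • If you can reproduce your issue, please open a new one with a full repro case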

Thanks all 🙏

Piglow19 commented 8 months ago

Errors are still occurring (I've updated to the latest Docker version 4.27.2) :

$ docker compose watch
cannot take exclusive lock for project "": process with PID 43396 is still running

My OS :

OS Name: Microsoft Windows 11 Famille
OS Version: 10.0.22631 N/A build 22631

If necessary, I can open a repository.

borgez commented 8 months ago

For me, manually deleting the *.pid file in AppData\Local\docker-compose\ solves the problem, but it is very annoying.

docker version
Client:
 Cloud integration: v1.0.35+desktop.10
 Version:           25.0.3
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        4debf41
 Built:             Tue Feb  6 21:13:02 2024
 OS/Arch:           windows/amd64
 Context:           default

Server: Docker Desktop 4.27.2 (137060)
 Engine:
  Version:          25.0.3
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       f417435e5f6216828dec57958c490c4f8bae4f98
  Built:            Wed Feb  7 00:39:16 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

torrinworx commented 8 months ago

Hey @perosb I'm not able to reproduce with the latest Docker Desktop release 4.27.2, can you give it a try? If you still have the issue, can you give us a minimal & complete reproduction case?

For all the others, if you don't have the same issue as the original one, please:

  • Check the latest version of Docker Desktop
  • If you can reproduce your issue, please open a new one with a full repro case

Thanks all 🙏

@glours I'm actually facing this issue with my repo here: https://github.com/torrinworx/Bitorch

To reproduce:

  1. $ docker compose -f .\dev.docker-compose.yml build --no-cache
  2. $ docker compose -f .\dev.docker-compose.yml watch
  3. Press Ctrl+C, then delete/remove the running containers/compose stacks
  4. $ docker compose -f .\dev.docker-compose.yml watch

Result:

PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch> docker compose -f .\dev.docker-compose.yml watch

cannot take exclusive lock for project "bitorch-development": process with PID 47156 is still running
PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch> tasklist /fi "PID eq 47156" 
INFO: No tasks are running which match the specified criteria.
PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch>

This is with docker desktop 4.27.2.

FibreFoX commented 8 months ago

Maybe this is related: for me it looks like the %LOCALAPPDATA%\docker-compose\PROJECTNAME.pid file is not getting removed properly. When exiting via Ctrl+C, the exit code is 130 ($LastExitCode when using PowerShell). Maybe that's the reason the watch command is not working as intended?

I always need to delete that file, no other problems so far (but haven't played around with this new feature yet).

Running Docker Desktop 4.27.2 on Windows 10 Pro 22H2 using HyperV.

ndeloof commented 8 months ago

This file is not expected to be removed after the command completes. Instead, when a compose command is executed, it checks whether the registered PID is alive (see https://github.com/moby/moby/blob/master/pkg/process/process_windows.go).
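The locking scheme described here can be sketched roughly as follows. This is a minimal Python illustration and not the moby pidfile code; `acquire_lock`, `LockError`, and the injectable `is_alive` callback are hypothetical names introduced for the example. The error text mirrors the message quoted in this thread.

```python
import os
from typing import Callable, Optional


class LockError(Exception):
    """Raised when the PID recorded in the pid file is reported as alive."""


def acquire_lock(pid_path: str, project: str, is_alive: Callable[[int], bool]) -> None:
    """Take the project lock by recording the current PID in pid_path.

    The file is not deleted on exit. A later run decides whether the
    lock is stale by asking is_alive() about the PID stored in it.
    """
    recorded: Optional[int]
    try:
        with open(pid_path, "r", encoding="utf-8") as f:
            recorded = int(f.read().strip())
    except (FileNotFoundError, ValueError):
        recorded = None  # no pid file yet, or unreadable content

    if recorded is not None and is_alive(recorded):
        raise LockError(
            f'cannot take exclusive lock for project "{project}": '
            f"process with PID {recorded} is still running"
        )

    # Missing or stale lock: overwrite it with our own PID.
    with open(pid_path, "w", encoding="utf-8") as f:
        f.write(str(os.getpid()))
```

Under this scheme a leftover pid file is harmless as long as the liveness check is accurate. If the check wrongly reports a dead process as alive, every later run fails until the file is deleted by hand, which matches the workaround reported above.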

glours commented 8 months ago

@torrinworx 👋 I used your repository but wasn't able to reproduce on my side. I don't know what happens, to be honest. Can you share a recording with me so I can check whether I'm missing a step? Are you using WSL2 or HyperV as the Docker Desktop virtual machine?

torrinworx commented 8 months ago

@torrinworx 👋 I used your repository but wasn't able to reproduce on my side. I don't know what happens, to be honest. Can you share a recording with me so I can check whether I'm missing a step? Are you using WSL2 or HyperV as the Docker Desktop virtual machine?

@glours I'm using WSL2 on Windows 11 Home 22H2 22621.3155. Here is a video with this issue happening with the Bitorch repository I linked above:

https://www.youtube.com/watch?v=PzrfWC825Rc

glours commented 8 months ago

@torrinworx thank you very much! Can I ask you another question? I want to check that you don't have an old version of Compose in your path that could take priority over the version embedded in Desktop. Can you share the result of docker compose version, please?

torrinworx commented 8 months ago

np!

Huh yeah it looks like it's still taking the old desktop version: Docker Compose version v2.24.5-desktop.1

Even though my docker desktop client is saying v4.27.2

glours commented 8 months ago

Unfortunately no 😞 , Compose v2.24.5-desktop.1 is the version shipped in Docker Desktop 4.27.2

glours commented 8 months ago

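@torrinworx can you try something else, please? Instead of running docker compose watch directly, can you try the following steps:

  • docker compose -f .\dev.docker-compose.yml up -d
  • docker compose -f .\dev.docker-compose.yml watch
  • Then do the CTRL+C
  • docker compose -f .\dev.docker-compose.yml watch again

And a 2nd test:

  • docker compose -f .\dev.docker-compose.yml up -d
  • docker compose -f .\dev.docker-compose.yml watch
  • Then do the CTRL+C
  • Don't remove the containers in Docker Desktop
  • docker compose -f .\dev.docker-compose.yml watch again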

ndeloof commented 8 months ago

Can you please check the status of the process listed in the lock file?

Get-Process -Id 146328

torrinworx commented 8 months ago

@torrinworx can you try something else please, instead of doing docker compose watch directly can you try the following steps:

  • docker compose -f .\dev.docker-compose.yml up -d
  • docker compose -f .\dev.docker-compose.yml watch
  • Then do the CTRL+C
  • docker compose -f .\dev.docker-compose.yml watch again

And a 2nd test:

  • docker compose -f .\dev.docker-compose.yml up -d
  • docker compose -f .\dev.docker-compose.yml watch
  • Then do the CTRL+C
  • Don't remove the containers in Docker Desktop
  • docker compose -f .\dev.docker-compose.yml watch again

So both tests result in the same error:

PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch> docker compose -f .\dev.docker-compose.yml watch
cannot take exclusive lock for project "bitorch-development": process with PID 165984 is still running
PS C:\Users\torri\Desktop\Repositories\Personal\Bitorch>

However when I delete the .pid file from the directory they both work just fine.

@ndeloof After I CTRL+C the watch command and delete the containers this is the result:

C:\Users\torri>tasklist /FI "PID eq 162632"
INFO: No tasks are running which match the specified criteria.

C:\Users\torri>

It only shows "No tasks are running..." after you Ctrl+C the watch command. While the watch command is still running, it shows the following, even when the containers are deleted:

C:\Users\torri>tasklist /FI "PID eq 162632"

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
docker-compose.exe          162632 Console                    1     47,308 K

C:\Users\torri>

ndeloof commented 8 months ago

Looking at the pidfile code that should detect that the process has exited, I wonder whether GetExitCodeProcess could return without an error, in which case the code would return true.
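The suspicion can be modelled with a small, platform-independent Python sketch. It does not reproduce the moby code or the fix that later shipped. Both function names are hypothetical, and the `call_succeeded` and `exit_code` parameters stand in for the result of a GetExitCodeProcess query. The only Windows fact relied on is that STILL_ACTIVE (259) is the exit-code value reported for a process that has not terminated. The sketch assumes the query can succeed for a process that has already exited.

```python
STILL_ACTIVE = 259  # exit-code value Windows reports while a process is running


def alive_ignoring_exit_code(call_succeeded: bool, exit_code: int) -> bool:
    """Suspected behaviour: any successful query is taken to mean 'alive'."""
    if not call_succeeded:
        return False
    return True  # exit_code is never inspected


def alive_checking_exit_code(call_succeeded: bool, exit_code: int) -> bool:
    """Stricter variant: alive only while the exit code is STILL_ACTIVE."""
    return call_succeeded and exit_code == STILL_ACTIVE
```

With the exit code 130 reported earlier in this thread for a Ctrl+C exit, `alive_ignoring_exit_code(True, 130)` reports the process as alive, while `alive_checking_exit_code(True, 130)` does not.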

glours commented 8 months ago

@ndeloof agree

glours commented 8 months ago

@torrinworx can you try this specific version of Compose (choose the right binary for you) and install it with the name docker-compose(.exe) under your ~/.docker/cli-plugins directory?

torrinworx commented 8 months ago

@glours That worked! I was able to re-watch the compose file and build the containers without seeing the PID error.

milas commented 7 months ago

This fix has been included in Compose 2.24.7+. If you continue to have problems after upgrading, comment here and we can re-open the issue or create a new one as appropriate. Thanks for the report!

julielerman commented 3 months ago

I will be documenting all of these experiments in a blog post but just wanted to leave it here in case someone gets stuck.

Some data for you @milas, as I'm getting this on:

  • macOS Monterey v12.5
  • Docker Desktop (Docker & verified Compose version matches: v2.27.1-desktop.1)
  • Visual Studio Code 1.90.2
  • Docker extension v1.29.1

docker compose -f "docker-compose.yml" up -d --build --watch
docker compose down (containers are gone)
docker compose watch

cannot take exclusive lock for project "xxx": process with PID 88950 is still running

I THINK it's the detached flag that is causing this... ??? (Sorry for the brain dump. I keep playing with it and understanding more and more, and maybe it's useful to you... maybe not LOL.)

Note that --watch results in multiple terminal windows (as stated in the docs somewhere) : docker terminal and my zsh terminal window. After compose down, the docker terminal is still there with the last message from the UP command. If I delete that terminal manually (using the trashcan icon) then it releases and I can run docker compose watch

Update: I feel like I'm going a little crazy. After manually closing the docker terminal and then running with up --watch again, there is no docker terminal, just my main zsh. I hate inconsistent outcomes. I've done this many times and this is a new behavior.

With docker compose watch, docker compose down properly releases and gets me back to "Terminal will be reused by tasks, press any key to close it." in the docker terminal. However after pressing any key , I still have to Ctrl-C (no message suggests that) to truly release the docker terminal and go back to zsh.

*Side question; I haven't been able to find any other guidance about using up --watch vs watch. Does it exist somewhere?