docker / for-mac

Bug reports for Docker Desktop for Mac
https://www.docker.com/products/docker#/mac
Docker Desktop for Mac file corruption with VirtioFS with heavy disk activity? #6690

Open jmichalicek opened 1 year ago

jmichalicek commented 1 year ago

Expected behavior

I do not believe this is specific to npm install; it is just easy to reproduce using npm. Running npm install or npm ci should successfully install npm packages. This does work in multiple containers when using gRPC FUSE.

Actual behavior

When only installing one or two things at a time (e.g. npm install webpack), installation usually works, but the more packages are added, the more frequently failures occur, which suggests this is related to the amount of disk activity. Every package is reported as corrupted. Because this works when using gRPC FUSE, I do not believe it to be an npm issue.

With npm the issue manifests in multiple ways.

npm WARN tarball tarball data for supports-color@https://registry.npmjs.org/supports-color/-/supports-color-7.2.0.tgz (sha512-qpCAvRl9stuOHveKsn7HncJRvv501qIacKzQlO/+Lwxc9+0q2wLyv4Dfvt80/DPn2pqOBsJdDiogXGR9+OvwRw==) seems to be corrupted. Trying again.
npm WARN tarball tarball data for supports-color@https://registry.npmjs.org/supports-color/-/supports-color-7.2.0.tgz (sha512-qpCAvRl9stuOHveKsn7HncJRvv501qIacKzQlO/+Lwxc9+0q2wLyv4Dfvt80/DPn2pqOBsJdDiogXGR9+OvwRw==) seems to be corrupted. Trying again.

and

npm ERR! code ENOENT
npm ERR! syscall stat
npm ERR! path /app/.npm/_cacache/content-v2/sha512/95/8e/009e79ac1167a25e2ac0200d1cf520cf6ee081abc218b8354bf0f326d0c6763c4b36c8fbd103f27ae10057b99a7b0982ab0ccfeb98dadcf7f8b728942773
npm ERR! errno -2
npm ERR! enoent ENOENT: no such file or directory, stat '/app/.npm/_cacache/content-v2/sha512/95/8e/009e79ac1167a25e2ac0200d1cf520cf6ee081abc218b8354bf0f326d0c6763c4b36c8fbd103f27ae10057b99a7b0982ab0ccfeb98dadcf7f8b728942773'
npm ERR! enoent This is related to npm not being able to find a file.
npm ERR! enoent

npm ERR! A complete log of this run can be found in:
npm ERR!     /app/.npm/_logs/2023-01-18T15_13_37_389Z-debug-0.log

Information

Output of /Applications/Docker.app/Contents/MacOS/com.docker.diagnose check

Starting diagnostics

[PASS] DD0027: is there available disk space on the host?
[PASS] DD0028: is there available VM disk space?
[PASS] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[PASS] DD0017: can a VM be started?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0013: is the $PATH ok?
[PASS] DD0003: is the Docker CLI working?
[PASS] DD0038: is the connection to Docker working?
[PASS] DD0014: are the backend processes running?
[PASS] DD0007: is the backend responding?
[PASS] DD0008: is the native API responding?
[PASS] DD0009: is the vpnkit API responding?
[PASS] DD0010: is the Docker API proxy responding?
[SKIP] DD0030: is the image access management authorized?
[PASS] DD0033: does the host have Internet access?
[PASS] DD0018: does the host support virtualization?
[PASS] DD0001: is the application running?
[PASS] DD0017: can a VM be started?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0031: does the Docker API work?
[PASS] DD0032: do Docker networks overlap with host IPs?
No fatal errors detected.

Steps to reproduce the behavior

If it succeeds, rm -rf /app/.npm /app/node_modules and use npm install or npm ci

This is the Dockerfile I used for testing, which is a very stripped-down version of what I usually work with. I have tested this with multiple base Docker images, including the latest Ubuntu and Python 3.11 images. Everything works fine with gRPC FUSE and also worked perfectly using VirtioFS when it was a beta feature.

If you just do a simple npm install webpack, etc. then usually everything works. The more packages in Package.json or specified on the command line, the more failures occur.

# syntax=docker/dockerfile:1.3
ARG PYTHON_VERSION=3.11.0
ARG DISTRO=bullseye
# Currently pinned at <22.0 due to https://github.com/jazzband/pip-tools/issues/1558
# when that is resolved we can just set this to an empty string safely, I think
ARG PIP_VERSION_PIN=<22.4

FROM ubuntu AS base
ENV npm_config_cache=/app/.npm/

RUN DEBIAN_FRONTEND=noninteractive apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
  software-properties-common \
  sudo \
  vim \
  telnet \
  apt-transport-https \
  lsb-release \
  git-completion \
  bash-completion \
  less \
  curl \
  && apt-get autoremove && apt-get clean

# See https://nodejs.org/en/about/releases/ for picking node versions
RUN curl -sL https://deb.nodesource.com/setup_18.x | bash
RUN DEBIAN_FRONTEND=noninteractive apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-unauthenticated \
  nodejs \
  && apt-get autoremove && apt-get clean

RUN useradd -ms /bin/bash developer && echo "developer ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

WORKDIR /app
RUN chown developer /app
USER developer

ENV HOME=/home/developer \
  PATH=/home/developer/.local/bin:/app/app/frontend/node_modules/.bin:$PATH \
  LC_ALL=C.UTF-8 \
  LANG=C.UTF-8

Package.json

{
  "dependencies": {
    "@date-io/date-fns": "^2.16.0",
    "@testing-library/react": "^13.4.0",
    "ansi-styles": "^6.2.1",
    "jest": "^29.3.1",
    "react": "^18.2.0",
    "react-dom": "^18.2.0",
    "webpack": "^5.75.0",
    "webpack-dev-server": "^4.11.1"
  }
}
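
The hypothesis above, that failures scale with the amount of disk activity rather than with npm itself, could be checked without npm. What follows is a minimal sketch, assuming Python 3 is available in the container (the Dockerfile above does not install it explicitly) and that the target directory sits on the shared volume. The function names, file counts, and sizes are illustrative and do not come from npm or Docker.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def write_and_verify(directory, index, size):
    """Write random bytes, rename into place, read back, and compare SHA-512."""
    data = os.urandom(size)
    expected = hashlib.sha512(data).hexdigest()
    tmp_path = os.path.join(directory, f"tmp-{index}")
    final_path = os.path.join(directory, f"blob-{index}")
    try:
        with open(tmp_path, "wb") as fh:
            fh.write(data)
        # Write-then-rename, loosely mirroring the cache writes seen in the logs.
        os.rename(tmp_path, final_path)
        with open(final_path, "rb") as fh:
            actual = hashlib.sha512(fh.read()).hexdigest()
    except OSError:
        # A missing or unreadable file (e.g. ENOENT) counts as a failure too.
        return final_path, False
    return final_path, expected == actual


def stress(directory, files=500, size=64 * 1024, workers=16):
    """Return the paths whose read-back digest did not match what was written."""
    os.makedirs(directory, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(
            pool.map(lambda i: write_and_verify(directory, i, size), range(files))
        )
    return [path for path, ok in results if not ok]
```

Calling stress("/app/stress-test") from inside the container would exercise the shared volume. A non-empty result under VirtioFS but not under gRPC FUSE would point at the file sharing layer. An empty result does not prove the bug is absent, since this workload is far simpler than npm's.
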

jmichalicek commented 1 year ago

Just tested on 4.17.0, still super broken :(

apaniel commented 1 year ago

I have the same issue running webpack inside a Docker container on Mac, and I can confirm 4.17.0 is still broken. gRPC FUSE does not have this problem, but it performs poorly: a webpack build that takes 30 seconds with VirtioFS takes 400 seconds with gRPC FUSE.

dy-dx commented 1 year ago

@jmichalicek I believe I have the same issue but I cannot reproduce it using your example. I think downgrading Docker Desktop to 4.14.1 may help. Could you see if that works?

martinml commented 1 year ago

I can consistently reproduce in an internal project when npm is configured to store its cache in a VirtioFS shared volume. When switching to gRPC FUSE, the problem disappears.

(...)
npm WARN tarball tarball data for string-width@https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz (sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==) seems to be corrupted. Trying again.
npm WARN tarball tarball data for string-width@https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz (sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==) seems to be corrupted. Trying again.
npm WARN tarball tarball data for string-width@https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz (sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==) seems to be corrupted. Trying again.
npm WARN tarball tarball data for string-width@https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz (sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==) seems to be corrupted. Trying again.
npm WARN tarball tarball data for string-width@https://registry.npmjs.org/string-width/-/string-width-4.2.3.tgz (sha512-wKyQRQpjJ0sIp62ErSZdGsjMJWsap5oRNihHhu6G7JVO/9jIB6UyevL+tXuOqrng8j/cxKTWyWUwvSTriiZz/g==) seems to be corrupted. Trying again.
npm WARN tarball tarball data for js-yaml@https://registry.npmjs.org/js-yaml/-/js-yaml-4.1.0.tgz (sha512-wpxZs9NoxZaJESJGIZTyDEaYpl0FKSA+FB9aJiyemKhMwkxQg63h4T1KJgUGHpTqPDNRcmmYLugrRjJlBtWvRA==) seems to be corrupted. Trying again.
npm WARN tarball tarball data for js-yaml@https://registry.npmjs.org/js-yaml/-/js-yaml-4.1.0.tgz (sha512-wpxZs9NoxZaJESJGIZTyDEaYpl0FKSA+FB9aJiyemKhMwkxQg63h4T1KJgUGHpTqPDNRcmmYLugrRjJlBtWvRA==) seems to be corrupted. Trying again.
npm ERR! code ENOENT
npm ERR! syscall rename
npm ERR! path /app/.npm-cache/_cacache/tmp/66556a64
npm ERR! dest /app/.npm-cache/_cacache/content-v2/sha512/b1/1b/b592970ef722ed63104abea7d37a1f4acd91303b7493c97d474fee02683cc2e87a5319884884f2338fd5ee294eca603c2769e87985c3b08f2d50b89cc13c
npm ERR! errno -2
npm ERR! enoent ENOENT: no such file or directory, rename '/app/.npm-cache/_cacache/tmp/66556a64' -> '/app/.npm-cache/_cacache/content-v2/sha512/b1/1b/b592970ef722ed63104abea7d37a1f4acd91303b7493c97d474fee02683cc2e87a5319884884f2338fd5ee2964eca603c2769e87985c3b08f2d50b89cc13c'
npm ERR! enoent This is related to npm not being able to find a file.
npm ERR! enoent
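
Since the reproduction depends on where the npm cache lives, it may help to confirm which mount backs the cache directory. This is a rough sketch that parses /proc/mounts-style text inside the (Linux) container and reports the filesystem type of the longest matching mount point. The helper names are hypothetical, escaped characters in mount points are not decoded, and which filesystem type names correspond to VirtioFS or gRPC FUSE shares is not asserted here, as it may vary between Docker Desktop versions.

```python
import os


def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fs_type) pairs."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            # Fields are: device, mount point, filesystem type, options, ...
            mounts.append((fields[1], fields[2]))
    return mounts


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point containing `path`."""
    path = os.path.normpath(path)
    best = None
    for mount_point, fs_type in mounts:
        mount_point = os.path.normpath(mount_point)
        prefix = mount_point if mount_point.endswith("/") else mount_point + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fs_type)
    return best[1] if best else None


def cache_fs_type(cache_dir):
    """Look up the fs type for `cache_dir` using the live /proc/mounts."""
    with open("/proc/mounts") as fh:
        return fs_type_for(cache_dir, parse_mounts(fh.read()))
```

For example, cache_fs_type("/app/.npm-cache") run inside the container would show whether the cache sits on the shared volume or on the container's own filesystem. Pointing npm_config_cache at a path that is not on the share is one way to test whether the cache location matters.
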

whisper0077 commented 1 year ago

I experienced this bug with Docker Desktop 4.17.0 + VirtioFS, but it did not occur with version 4.18.0. Please try upgrading to version 4.18.0.

jmichalicek commented 1 year ago

I missed the activity going on here, but I just tested with Docker Desktop 4.19.0 + VirtioFS and my first run of heavy disk usage worked great.

cdbennett commented 1 year ago

Not sure if this is the same issue you all are talking about, but when I run the Spring Cloud Config Server in a Docker container (using virtiofs) and modify some config files on the host (which are mounted into the config server container), the container randomly sees corrupted and partial file content for a while. Even waiting a few seconds is not enough for the container to see the correct file content.

Discussed in https://github.com/docker/roadmap/issues/7#issuecomment-1641132358

If I disable the virtiofs support in Docker Desktop, the problem goes away and things work correctly. (Though many other common operations are slower of course, due to the inefficient file sync needed.)
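
One way to quantify the stale or partial reads described above would be to compute a file's SHA-256 on the host after editing it, then poll the same file from inside the container until the digest matches. This is only a sketch under those assumptions: the function names, timeout, and interval are made up for illustration, and it has not been run against the setup in question.

```python
import hashlib
import time


def sha256_of(path):
    """Return the hex SHA-256 of the file's current content."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def wait_for_digest(path, expected, timeout=10.0, interval=0.5):
    """Poll `path` until its SHA-256 equals `expected`.

    Returns (matched, observations), where observations lists each distinct
    digest (or error string) seen in order, collapsing consecutive repeats.
    """
    observations = []
    deadline = time.monotonic() + timeout
    while True:
        try:
            digest = sha256_of(path)
        except OSError as exc:
            digest = f"error: {exc.strerror}"
        if not observations or observations[-1] != digest:
            observations.append(digest)
        if digest == expected:
            return True, observations
        if time.monotonic() >= deadline:
            return False, observations
        time.sleep(interval)
```

More than one entry in the returned observations would indicate that the container saw intermediate content before (or instead of) the final version.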

fredericdalleau commented 11 months ago

@cdbennett if you still see the problem, could you provide a simplified reproduction with more details, in a separate issue please?

cdbennett commented 11 months ago

If I get a chance, I'll try to create a reproducible test case.

cdbennett commented 10 months ago

@fredericdalleau - I tried to write a minimal reproduction case but wasn't able to break it with that. So far only the full Docker Compose for our proprietary application causes the issue with virtiofs (it is 100% reproducible in this case). If I ever am able to make a minimal test case I'll follow up with a new issue.

nrhope commented 2 months ago

I also experienced this problem, though not with npm: in my case a directory on the host MacBook, built using Maven, is mounted into the container, with VirtioFS and Docker 4.30.0. As VirtioFS is currently the default for fresh Docker installs, maybe the default should change to gRPC FUSE (at least for Intel Macs) until this issue is fixed?