pedantic-git closed this issue 1 year ago
I can reproduce this on current Manjaro (5.6.12-1-MANJARO), getting the exact same output.
docker version output:
Client:
Version: 19.03.8-ce
API version: 1.40
Go version: go1.14.1
Git commit: afacb8b7f0
Built: Thu Apr 2 00:04:36 2020
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.8-ce
API version: 1.40 (minimum version 1.12)
Go version: go1.14.1
Git commit: afacb8b7f0
Built: Thu Apr 2 00:04:16 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.3.4.m
GitCommit: d76c121f76a5fc8a462dc64594aea72fe18e1178.m
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
docker info output:
Client:
Debug Mode: false
Server:
Containers: 6
Running: 0
Paused: 0
Stopped: 6
Images: 37
Server Version: 19.03.8-ce
Storage Driver: overlay2
Backing Filesystem: <unknown>
Supports d_type: true
Native Overlay Diff: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: d76c121f76a5fc8a462dc64594aea72fe18e1178.m
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.6.12-1-MANJARO
Operating System: Manjaro Linux
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.53GiB
Name: mainframe
ID: 2Y5C:436L:YDBV:YUH6:AW2M:EJ56:TIUI:I4ZO:ACZF:M64N:OQAG:LQBQ
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Both Arch Linux and Manjaro are distributions that are not officially supported. Packages distributed by those distributions are maintained by the distributions themselves, and differ in build process, build-time dependencies, and versions of (e.g.) containerd and runc.
I'd recommend opening a ticket in the corresponding distribution's issue tracker for that reason.
I'm closing this issue because of the above, but feel free to continue the conversation.
@pedantic-git Have you resolved this? Did you create an issue elsewhere that I can follow? I'm seeing the same issue with the official ruby 2.6 Docker image on an up-to-date Manjaro install (updating an existing install doesn't seem to trigger the error).
@kthibodeaux I didn't create an issue anywhere else because I didn't have time to test it with the official build of Docker - have you been able to do that? It seems like even though it's happening to everyone on Arch/Manjaro it won't be supported here if it's only the Arch build of Docker.
Did either of you get anywhere with this @pedantic-git @kthibodeaux? Got another example here, also on Arch.
Sadly not. I never got past the "Arch isn't supported" message above. I guess taking it up with the maintainers of the Arch package is the way to go.
Looks like that's Morten Linderud foxboron@archlinux.org if you'd like to raise a case and copy me in (quinn at fishpercolator.co.uk) but I'm not really even sure how one goes about doing that.
I'm able to reproduce this on Fedora 33 using the package moby-engine-19.03.13-1.ce.git4484c46.fc33.x86_64. I get the exact same output as @pedantic-git.
My docker info:
awatts@ironic-mullet reproduce-copystream [master=] $ docker info
Client:
Debug Mode: false
Server:
Containers: 23
Running: 21
Paused: 0
Stopped: 2
Images: 64
Server Version: 19.03.13
Storage Driver: overlay2
Backing Filesystem: btrfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: /usr/libexec/docker/docker-init
containerd version:
runc version: c9a9ce0286785bef3f3c3c87cd1232e535a03e15
init version:
Security Options:
seccomp
Profile: default
Kernel Version: 5.10.8-200.fc33.x86_64
Operating System: Fedora 33 (Workstation Edition)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.09GiB
Name: ironic-mullet.eminorinc.com
ID: SHFE:UOOW:MKUA:56MY:UHUL:QTQP:YZJX:C43V:XR4R:67RF:BWKS:22EM
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true
@thaJeztah Now that this issue can be reproduced in an official build (albeit of Moby Engine rather than CE) is it possible this issue could warrant reopening?
@thaJeztah any chance to reopen this issue?
Same issue here (on Ubuntu 20.04), running the images ubuntu:18.04, ubuntu:20.04, and ruby:3.0.0.
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.5.0-docker)
Server:
Containers: 40
Running: 6
Paused: 0
Stopped: 34
Images: 485
Server Version: 20.10.1
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc version: ff819c7e9184c13b7c2607fe6c30ae19403a7aff
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.8.0-43-generic
Operating System: Ubuntu 20.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 128
Total Memory: 125.7GiB
Name: alexvko
ID: N5L3:NVEW:VSYS:OSK6:JWAO:T6ZE:APR5:5XVM:OTMQ:BXSZ:DNBV:STUU
Docker Root Dir: /mnt/md0/docker
Debug Mode: false
Username: alexvko
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No blkio weight support
WARNING: No blkio weight_device support
Same here on Fedora 33
I just encountered the same issue and hope that it will be reopened and addressed here as it was far from easy to debug.
In the meantime I'll have to add this (Ruby) monkey patch to my Docker setup:
require 'fileutils'
require 'open3'

# Replace FileUtils' internal copy_file with a shell-out to cp(1),
# avoiding the broken in-process copy path.
module FileUtils
  class Entry_
    def copy_file(dest)
      Open3.capture2('cp', path(), dest)
    end
  end
end
@makmic There's actually no need to shell out to a separate and heavyweight cp binary to work around this issue. Even doing a no-op chmod or utime on the file before it is copied, as in my example, prevents this issue from occurring. Why, I have no idea, but it reliably does!
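A minimal sketch of that no-op workaround (the helper name safe_copy is mine, not from the repo): restore the source file's existing mode and times immediately before copying.

```ruby
# Reapply the file's current permissions and timestamps (both no-ops),
# then copy as usual. For reasons nobody has pinned down, this prevents
# the 0-byte copy on affected kernels.
def safe_copy(src, dest)
  st = File.stat(src)
  File.chmod(st.mode & 07777, src)     # no-op: permissions unchanged
  File.utime(st.atime, st.mtime, src)  # no-op: timestamps unchanged
  IO.copy_stream(src, dest)
end
```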
I'm facing to the same issue on CentOS.
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
Server:
Containers: 24
Running: 22
Paused: 0
Stopped: 2
Images: 51
Server Version: 20.10.5
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
init version: de40ad0
Security Options:
seccomp
Profile: default
Kernel Version: 4.18.0-240.15.1.el8_3.x86_64
Operating System: CentOS Linux 8
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 15.65GiB
Name: LMS-Server
ID: DRNO:6ONC:XO6P:6ZAK:W6NI:R764:GT5Q:YTZD:QOEB:Y63W:JD6N:TW4Y
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
@pedantic-git I don't suppose you have an example no-op chmod or utime that's effective? And do you think they'd be faster than my current workaround?
require 'fileutils'

# Touch the source file first to sidestep the empty-copy bug, then fall
# through to the original copy_file.
module FileUtilsPatch
  def copy_file(dest)
    FileUtils.touch(path())
    super
  end
end

module FileUtils
  class Entry_
    prepend FileUtilsPatch
  end
end
Could this be an interaction with LUKS Full Disk Encryption? That's the main disk related difference on the affected system I have, when the same project is fine on all others.
@dwarburt Have you seen the example repo in the original post? The last two tests are the no-op ones. They're basically the same as your touch example, except that they set the times/mode back to their original values rather than updating them to new values. Hope that makes sense!
https://www.github.com/fishpercolator/reproduce-copystream/tree/master/repro.rb
I tried reproduce-copystream on my old laptop and could not reproduce the bug. But I ran the same on my new laptop, and the bug appears.
Working config :
docker info
Client:
Debug Mode: false
Server:
Containers: 23
Running: 6
Paused: 0
Stopped: 17
Images: 400
Server Version: 19.03.1
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 693
Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-126-generic
Operating System: Ubuntu 18.04.5 LTS
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 7.676GiB
Name: tim-XPS-13
ID: 5MUF:2FRA:A7CV:Z4QN:5TYH:PB3S:U4S7:SHP6:ARHK:UXPL:HBQ6:HJM7
Docker Root Dir: /var/lib/docker
Debug Mode: false
Username: tzmachine
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: the aufs storage-driver is deprecated, and will be removed in a future release.
Failing config :
docker info
Client:
Debug Mode: false
Server:
Containers: 7
Running: 7
Paused: 0
Stopped: 0
Images: 20
Server Version: 19.03.13
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175
runc version:
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.8.0-50-generic
Operating System: Ubuntu Core 18
OSType: linux
Architecture: x86_64
CPUs: 12
Total Memory: 15.27GiB
Name: pierrem-XPS-15-9500
ID: 6CXC:YEKH:QWH2:NJ7Z:A566:IVCN:HSEQ:4KTV:MSCN:2NJU:U2GU:ZWDX
Docker Root Dir: /var/snap/docker/common/var-lib-docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
This was also failing with Docker version 20.10.6, build 370c289, on my new laptop.
So I thought it might be a regression introduced after 19.03.1, but a teammate ran the test and does not have the bug with 20.10.6. So it's not only related to the Docker version.
working config :
reproduce-copystream ❯ docker info (master)
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
Server:
Containers: 15
Running: 0
Paused: 0
Stopped: 15
Images: 47
Server Version: 20.10.6
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
Default Runtime: runc
Init Binary: docker-init
containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 4.15.0-29-generic
Operating System: Linux Mint 18.3
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.18GiB
Name: T480
ID: U56P:745X:PPSO:XKPL:W5SN:T3EC:FZVJ:JQX6:DFEQ:OO5M:4LTG:A7CC
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Several of our Bitbucket Pipelines customers are facing the same issue after our Flatcar OS upgrade from 2605.12.0 to 2765.2.2 (the kernel moved from 5.4.92 directly to 5.10.25).
It mostly appears as an ActiveStorage::IntegrityError while running RSpec tests.
docker info
Client:
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 20.10.5
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Authorization: pipelines
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
WARNING: API is accessible on http://0.0.0.0:2375 without encryption.
Access to the remote API is equivalent to root access on the host. Refer
to the 'Docker daemon attack surface' section in the documentation for
more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface
Init Binary: docker-init
containerd version: 269548fa27e0089a8b8278fc4fc781d7f65a939b
runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
init version: de40ad0
Security Options:
seccomp
Profile: default
userns
Kernel Version: 5.10.25-flatcar
Operating System: Alpine Linux v3.13 (containerized)
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 30.91GiB
Name: 44212645-d3ba-4afa-b2d9-8a1f81e7f1ee-wrmh2
ID: KPMN:TGRY:QL5G:YYZT:4ZHE:XLQD:BCPI:UEUZ:6KD7:TGB4:GUVN:KAIB
Docker Root Dir: /var/lib/docker/165536.165536
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Registry Mirrors:
http://10.203.197.49:5000/
Live Restore Enabled: false
Product License: Community Engine
I ran into this same problem with Ruby's FileUtils.cp on macOS AMD64, using ruby:2.5.3 as the base image for the container. touch-ing the file after it was copied worked to get around it.
I was also able to work around it without changing source or spec code, by mounting the /tmp directory (the destination of the file copied from a mounted directory) as a tmpfs upon container startup (could also be set through the tmpfs key in a docker-compose file):
--mount type=tmpfs,destination=/tmp
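The docker-compose variant mentioned above can be sketched like this (the service name app and the image are placeholders, not from the original comment):

```yaml
services:
  app:
    image: ruby:2.5.3
    tmpfs:
      - /tmp  # mount /tmp as tmpfs so copies into it bypass overlayfs
```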
Reopening as this is being seen in supported configurations.
Seeing this on docker for Mac as well. Very strange. I can reproduce on one image, but not another.
I first came across this issue on Arch on 2020-05-28, in a failing RSpec test using ActiveStorage, returning an ActiveStorage::IntegrityError.
@maddymarkovitz and I did much investigation on this in the following months. With strace, we discovered it was related to the copy_file_range syscall, as recently noted by @amanfredi in the Docker for Mac issue. By testing many different kernel versions in QEMU, we determined it to be a bug in the Linux kernel, introduced in Linux 5.6.0-arch1-1.
The bug probably resides in the overlay2 subsystem, as it was changed in that release, and from my results of trying out different Docker storage drivers:
- overlay2: demonstrates the bug
- overlay: demonstrates the bug
- vfs: works, but BuildKit does not
- devicemapper (loop-lvm mode): works, but BuildKit does not
Enabling/disabling overlay2's 'Native Overlay Diff' didn't change the outcome.
We put together a minimal C file reproducing the bug by executing the copy_file_range syscall, and a test script that compiles and runs it, copying from a mounted file (resulting in a failed copy due to the bug) and from a previously copied-in file (resulting in a successful copy).
#include <unistd.h>
#include <sys/syscall.h>
#include <fcntl.h>

int main(int argc, char *argv[]) {
  /* Copy 6 bytes from the input file (argv[1]) into a temp file via the
     copy_file_range syscall. On affected kernels, this transfers 0 bytes
     when the input lives on a mounted (overlayfs) path. */
  int input = open(argv[1], O_RDONLY);
  int temp1 = open("/tmp/copy_file_range_test", O_WRONLY|O_CREAT|O_TRUNC, 0100644);
  syscall(SYS_copy_file_range, input, NULL, temp1, NULL, 6, 0);
  close(input);
  close(temp1);

  /* Copy those bytes onward into "destination"; the test script then
     checks whether "destination" ended up empty. */
  int temp2 = open("/tmp/copy_file_range_test", O_RDONLY);
  int dest = open("destination", O_WRONLY|O_CREAT|O_TRUNC, 0100644);
  syscall(SYS_copy_file_range, temp2, NULL, dest, NULL, 6, 0);
  close(temp2);
  close(dest);
  return 0;
}
#!/bin/bash
set -Eeuo pipefail
source=${1-Gemfile}
check() {
if wc -c destination | grep -E '^0 ' > /dev/null; then
echo 'Copy failed'
else
echo OK
fi
echo
}
printf "FROM debian:10.8-slim\nRUN apt update && apt install -y gcc strace" > /tmp/strace_Dockerfile
docker build -f /tmp/strace_Dockerfile -t strace .
echo
echo Local:
(
# set -x
gcc test.c
strace ./a.out "$source" &> test-strace
)
check
echo Docker - mounted:
docker run --rm -v "$(pwd):/work" -w /work strace bash -c "
#set -x &&
gcc test.c &&
strace ./a.out '$source' &> test-strace-docker-mounted
"
check
echo Docker - copied:
docker run --rm -v "$(pwd):/work" -w /work strace bash -c "
#set -x &&
cp '$source' /x &&
gcc test.c &&
strace ./a.out /x &> test-strace-docker-copied
"
check
Bug output:
❯ ./ctest.sh
[+] Building 2.3s (6/6) FINISHED
=> [internal] load build definition from strace_Dockerfile 0.0s
=> => transferring dockerfile: 115B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/debian:10.8-slim 2.2s
=> [1/2] FROM docker.io/library/debian:10.8-slim@sha256:13f0764262a064b2dd9f8a828bbaab29bdb1a1a0ac6adc8610a0a5f37e514955 0.0s
=> CACHED [2/2] RUN apt update && apt install -y gcc strace 0.0s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:93c8b4395b5e2b3d5e0e7510d18201c4d94afab419415b5541b8852781ed0442 0.0s
=> => naming to docker.io/library/strace 0.0s
Local:
OK
Docker - mounted:
Copy failed
Docker - copied:
OK
We had meant to report the bug to the Linux project, but never quite got around to it. Has anyone else done so already?
Running the script today while writing this (on 5.12.2-arch1-1), I am surprised to see that it actually passes! Maybe Linux fixed the bug recently...
Local:
OK
Docker - mounted:
OK
Docker - copied:
OK
Our colleagues using Mac were not seeing this issue at the time. Now however, Mac people would be seeing this bug too, due to Docker for Mac updating the Linux kernel version used in the VM (to a version above Linux 5.6.0). In Docker for Mac 3.0.0 the Linux kernel sees a massive upgrade, to 5.10.25.
Our team also found a comment in the Rust stdlib about copy_file_range returning 0, and they mentioned overlay.
This is the related PR: https://github.com/rust-lang/rust/pull/75428
This is the related issue: https://github.com/rust-lang/rust/issues/75446
Maybe it will help to identify the impact of the issue.
Our team, using both Mac and Linux, has been seeing this issue while using Test Kitchen inside Docker containers to test our Chef cookbooks. As @ZimbiX mentioned, the bug seems to have been introduced in the 5.6 kernel; it did not, however, seem to be present in kernel versions >= 5.11. Users originally saw the following errors, which I am including below in case anyone else hits them, as this issue was pretty rough to track down.
Chef::Exceptions::JSON::ParseError:
Top level JSON object must be a Hash or Array. (actual: NilClass)
[2021-05-05T19:35:36+00:00] INFO: HTTP Request Returned 404 Not Found: Object not found:
resolving cookbooks for run list: []
[2021-05-05T19:35:37+00:00] ERROR: #<TypeError: no implicit conversion of nil into Hash>
/opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-zero-5.3.2/lib/chef_zero/chef_data/cookbook_data.rb:76:in `merge!'
/opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-zero-5.3.2/lib/chef_zero/chef_data/cookbook_data.rb:76:in `from_json'
/opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-zero-5.3.2/lib/chef_zero/chef_data/cookbook_data.rb:34:in `metadata_from'
/opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-12.21.14/lib/chef/chef_fs/chef_fs_data_store.rb:828:in `get_single_cookbook_version'
/opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-12.21.14/lib/chef/chef_fs/chef_fs_data_store.rb:550:in `list'
/opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-zero-5.3.2/lib/chef_zero/data_store/v1_to_v2_adapter.rb:83:in `block in list'
/opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-zero-5.3.2/lib/chef_zero/data_store/v1_to_v2_adapter.rb:113:in `fix_exceptions'
/opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-zero-5.3.2/lib/chef_zero/data_store/v1_to_v2_adapter.rb:82:in `list'
/opt/chef/embedded/lib/ruby/gems/2.3.0/gems/chef-zero-5.3.2/lib/chef_zero/rest_base.rb:100:in `list_data'
While Test Kitchen was staging the files to be sent to the remote VM, it created a bunch of 0-byte cookbook files and JSON files. Depending on what we were testing, we hit the different errors above.
We ended up using the following monkeypatch in our testing environment's docker container to workaround the issue.
# frozen_string_literal: true
require 'fileutils'
module FileUtils
class Entry_
def copy_file(dest)
File.open(path()) do |s|
File.open(dest, 'wb', s.stat.mode) do |f|
IO.copy_stream(s, f)
f.chmod f.lstat.mode
end
end
end
end
end
We also put guards around it so it would only be loaded on the problematic kernels; that way, if we forget about it in the future, it won't bite us.
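Such a guard can be sketched as a plain kernel-version check (the helper name affected_kernel? is hypothetical; note that distro kernels with backported patches, like Ubuntu 18.04's 5.4 series, would slip past a pure version comparison):

```ruby
# Load the FileUtils patch only on kernels in the known-bad range
# (v5.6.0 up to, but not including, the fixed v5.10.84).
def affected_kernel?(release = `uname -r`)
  version = Gem::Version.new(release[/\d+\.\d+\.\d+/])
  version >= Gem::Version.new('5.6.0') && version < Gem::Version.new('5.10.84')
end

# e.g.: require_relative 'fileutils_copy_patch' if affected_kernel?
```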
Perhaps a typo above? I couldn't get the above patch to work, but was able to by doing the no-op chmod on the source s IO before the copy stream. So like this:
s.chmod s.lstat.mode
IO.copy_stream(s, f)
That fixed it both for our GitHub Actions-based test running in ruby:2.7.3-alpine on their ubuntu:latest runner, and in the https://github.com/fishpercolator/reproduce-copystream checkout on a macOS 11.3.1 Intel MBP running Docker Desktop 3.3.3. Interestingly, I wasn't previously able to reproduce this bug running our app's tests inside a container on that machine. Maybe due to us mounting a tmpfs from docker-compose; we don't do that in GH Actions.
My team reports the same issue on Ubuntu Linux and Mac OS hosts, with the official Docker release.
@amichal No, that was not a typo; it did not seem to resolve the issue when I used your snippet.
This is on OSX 11.4 inside a ruby:2.6.0 container which is based off of buildpack-deps:buster.
To future people reading this: We had this same issue on Ubuntu 20.04 with kernel 5.4.0-89, downgrading to 5.4.0-88 helped.
I'm hitting this with the ruby:bullseye Docker image. From time to time, FileUtils.cp(from, to) (this is Ruby) creates an empty destination file when copying from /tmp to a mounted volume.
I'm hitting this on macOS, but the https://github.com/docker/for-mac/issues/5570 issue is closed 🤷‍♂️
I'm using Docker Desktop 4.1.1 (69879).
Within the container,
# uname -a
Linux 81347a0746bd 5.10.47-linuxkit #1 SMP Sat Jul 3 21:51:47 UTC 2021 x86_64 GNU/Linux
Mounting /tmp with tmpfs works around the problem for me.
I first validated that Linux v5.10 did NOT pass the reproduction test in https://github.com/docker/for-linux/issues/1015#issuecomment-841915668:
root@stanhu-kernel-test2:/tmp$ uname -a
Linux stanhu-kernel-test2 5.10.0 #3 SMP Wed Dec 1 16:42:21 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
root@stanhu-kernel-test2:/tmp# bash test.sh
Sending build context to Docker daemon 14.34kB
Step 1/2 : FROM debian:10.8-slim
---> 115566c891d1
Step 2/2 : RUN apt update && apt install -y gcc strace
---> Using cache
---> d562d525045d
Successfully built d562d525045d
Successfully tagged strace:latest
Local:
OK
Docker - mounted:
Copy failed
Docker - copied:
OK
Then I applied the following kernel patch, and it now passes:
https://lore.kernel.org/linux-fsdevel/20201207163255.564116-6-mszeredi@redhat.com/#t
root@stanhu-kernel-test2:/tmp# uname -a
Linux stanhu-kernel-test2 5.10.0+ #5 SMP Wed Dec 1 20:58:28 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
root@stanhu-kernel-test2:/tmp# bash test.sh
Sending build context to Docker daemon 14.34kB
Step 1/2 : FROM debian:10.8-slim
---> 115566c891d1
Step 2/2 : RUN apt update && apt install -y gcc strace
---> Using cache
---> d562d525045d
Successfully built d562d525045d
Successfully tagged strace:latest
Local:
OK
Docker - mounted:
OK
Docker - copied:
OK
This patch landed in Linux v5.11 via this merge commit: https://github.com/torvalds/linux/commit/92dbc9dedccb9759c7f9f2f0ae6242396376988f (commit: https://github.com/torvalds/linux/commit/82a763e61e2b601309d696d4fa514c77d64ee1be)
I've reported the issue on the mailing list and asked whether this patch could be backported: https://marc.info/?l=linux-fsdevel&m=163839424404401&w=2.
Sent a request for backporting the fixes into Linux 5.10: https://lore.kernel.org/stable/CAMBWrQ=1MKxnMT_6Jnqp_xxr7psVywPBJc6p1qCy9ENY8RF2Qw@mail.gmail.com/T/#u
I've validated that torvalds/linux@1a980b8cbf0059a5308eea61522f232fd03002e2 introduced this regression in Linux v5.6.x.
Thank you, @stanhu !
Per the latest update on the kernel stable mailing list, the backported fix for 5.10 has been queued for review and should land in the stable-5.10.y branch soon.
To avoid this bug, I'd suggest avoiding Linux v5.6.0 - v5.10.83 for now, unless you backport the two patches below:
I've confirmed that 5.5.19 does NOT have this bug, and it was introduced in 5.6.0 via torvalds/linux@1a980b8cbf0059a5308eea61522f232fd03002e2.
I looked into submitting backports for the 5.6 to 5.9 branches, but it appears they are no longer maintained.
UPDATE: Linux 5.10.84 has been tagged with the backported fixes. Other distributions that use older kernel versions are also still affected. For example:
It seems some distributions that run a 5.4 kernel, such as Ubuntu 18.04.6, are also affected because they have pulled in https://github.com/torvalds/linux/commit/1a980b8cbf0059a5308eea61522f232fd03002e2:
stanhu@stanhu-ubuntu-18:/tmp$ grep ovl_splice linux-gcp-5.4-5.4.0/fs/overlayfs/file.c
static ssize_t ovl_splice_read(struct file *in, loff_t *ppos,
ovl_splice_write(struct pipe_inode_info *pipe, struct file *out,
.splice_read = ovl_splice_read,
.splice_write = ovl_splice_write,
I've reported this to Ubuntu in https://bugs.launchpad.net/ubuntu/+source/linux-base/+bug/1953199. I'd suggest others run the reproduction step in https://github.com/docker/for-linux/issues/1015#issuecomment-841915668 and report the bugs to the distribution maintainers.
CentOS 8 also has this bug even though it's shipping with a 4.18 kernel. I reported it here: https://bugs.centos.org/view.php?id=18370
RedHat Enterprise Linux 8.3 also has this problem: https://bugzilla.redhat.com/show_bug.cgi?id=2028998
Linux v5.10.84 has now been tagged with the fixes for the overlay filesystem:
Thanks for the update, @stanhu - we added an internal ticket to either patch the kernel version as used by Docker Desktop, or to wait for the upstream kernel to include the fix; having it merged upstream definitely makes it easier.
/cc @fredericdalleau @djs55 FYI
Hi @thaJeztah,
I see that an internal ticket was opened to fix this kernel issue. I just wanted to ask if this has been planned and if there's an ETA? I've been holding off on updating Docker for Mac for a long time now (due to this issue, which is caused by the underlying kernel problem), and would love to be able to update without having to find a workaround.
Thanks heaps for your help!
@smartygus Kernel 5.10.105 has been released, fixing this bug! Docker for Mac has released version 4.6.0, which includes kernel 5.10.104. I confirm this version solves the problem in Docker for Mac!
@emanuelhfarias this is great news!! Thanks for the heads up! :)
Yes, I think we can close this issue. I see that upgrading Docker for Mac to v4.6.0 upgraded the kernel to 5.10.104-linuxkit from 5.10.76-linuxkit.
Other related distribution notes:
RedHat: According to the bug report I filed, RedHat Enterprise Linux 8 will see the fix in kernel-4.18.0-368.el8. I've confirmed that, as written in the RHEL 8.5 release notes, kernel-4.18.0-348 is the latest version available as of this writing.
Google: Container OS 97 (https://cloud.google.com/container-optimized-os/docs/release-notes/m93) has upgraded to 5.10.90
Ubuntu: No updates. When in doubt, use Ubuntu 20.04 instead of 18.04.
According to https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8-beta/html/8.6_release_notes/new-features#enhancement_kernel, RedHat 8.6 will be released on May 22, 2022 with kernel-4.18.0-369, so this should fix it. RedHat 8.5 still has this issue.
The story
A few weeks ago, a bunch of tests in some Rails code of mine started failing in my Docker development environment. My dev environment is Arch, so the actual set of libraries and things installed is very much a moving target.
Lots of digging around led me to create a repo that can reproduce the issue using only Ruby (no gems at all), at the lowest level possible. And it's really really weird. More information below.
Expected behavior
When copying a file in Ruby using the IO.copy_stream method, the file should be an identical copy.
Actual behavior
Under a very specific set of circumstances, the resulting file is 0 bytes.
I'm reasonably certain this is a Docker issue, because one of the conditions that needs to be true is that the source file is located on a mounted volume.
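The failing operation boils down to something like this sketch (the paths are placeholders; on an unaffected kernel, or when the source is not on a mounted volume, the copy succeeds):

```ruby
# IO.copy_stream is what FileUtils.cp uses under the hood (via
# Entry_#copy_file); on affected kernels it produces a 0-byte
# destination when src lives on a volume mounted into the container.
src  = '/tmp/copystream_src'
dest = '/tmp/copystream_dest'
File.write(src, 'sample contents')
IO.copy_stream(src, dest)
```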
The conditions that need to be true for the IO.copy_stream operation to fail are:
- The file must be copied using IO.copy_stream. Copying it using File.write and tempfile.read does not cause the issue.
Steps to reproduce the behavior
Please check out this repo: https://github.com/fishpercolator/reproduce-copystream and run the command in the README.
The output of this command on my env is:
As you can see, the file is copied successfully if it is:
If any of the 4 statements above are true, the file copies fine. If all 4 are false, the file is created but it has a 0-byte size.
I've checked to see if the source filesystem of the mounted volume makes any difference but I get the same effect with eCryptfs, ext4 and tmpfs mountpoints.
Likewise I've tried different versions of Ruby to see if it was a regression in Ruby, but I can still reproduce the issue with the official Docker images of much older versions where I was certain it worked.
Does anyone have any ideas? Can anyone else reproduce this using my repo?
Output of docker version:
Output of docker info:
Additional environment details (AWS, VirtualBox, physical, etc.)
It's a physical Dell XPS 13 9360 running Arch Linux. Arch doesn't have version numbers, but all packages are up to date. Docker was installed from the Arch package, which is built using this PKGBUILD script from the official Docker sources.