Closed andyneff closed 4 years ago
fedora:31 base image, and everything behaves the exact same... Everything is the same.
It sounds like there's some leftover state, somewhere, that's interfering. I'll test on a 5.4 kernel to see if I can reproduce (I'm running 4.19). Other things that I would recommend that you test:
erichough/nfs-server, if it's running, then check for any lingering NFS-related processes, e.g. ps aux | grep -iE "rpc|nfs". Anything interesting show up?
docker system prune -a --volumes. Does that change the behavior?
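For the process check suggested above, a small filter can keep the grep command itself out of the results. This is only a sketch: the function name list_nfs_procs is made up for illustration, and it simply reads ps-style lines on stdin.

```shell
#!/bin/sh
# Hypothetical helper (not part of the image): keep lines that mention
# rpc or nfs, case-insensitively, and drop the grep process itself.
list_nfs_procs() {
  grep -iE 'rpc|nfs' | grep -v 'grep -iE' || true
}

# Usage on the host: ps aux | list_nfs_procs
```

An empty result means no process line mentioned rpc or nfs.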
mount.nfs: trying text-based options 'vers=4.2,addr=10.XX.XX.59,clientaddr=10.XX.XX.59'
mount.nfs: mount(2): No such file or directory
That sounds like the container can't locate/access the /nfs directory. You could probably snoop around in your host's /var/lib/docker directory to see if anything in there looks suspect. Or maybe compare it to /var/lib/docker2 to see if anything stands out?
It does "appear" as if the /var/lib/docker version cannot access my /nfs; however, the workaround shows that if I mount with the container from /var/lib/docker2, stop that container, and start the container from /var/lib/docker, it can now access it.
I stop the container, and rm the exited container to make sure I remove what I normally know as state.
I've tried pruning everything, with no success. I moved on and just deleted 100% of all images, containers, networks (except the three: bridge, host, and none), and build cache, re-pulled the image, and it STILL does not work. Even though this nfs container is the only problem I have, I'm beginning to suspect /var/lib/docker is corrupt somehow. Now that I've deleted everything, I can compare it to a clean one and see what is different.
I'll let you know if/when I find anything else out
@ehough I think I found out what is different
Wow... It turns out we had a very similar conversation a year ago (#2), and I totally forgot about it (and probably messed up my docker-compose file back then...)
/var/lib/docker is ext4
/var/lib/docker2 is btrfs
There was nothing corrupt in /var/lib/docker; it just has different rules for different fs?
So I guess, in a way we learned a little more about this issue.
If my docker daemon data dir is btrfs, fsid=1 and fsid=0 both work for /nfs
If my docker daemon data dir is ext4+overlay2, only fsid=0 works for /nfs
On the host, this can be checked via docker info -f '{{json .Driver}}'; in the ext4 case this is overlay2, and docker info -f '{{index .DriverStatus 0 1}}' says extfs
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
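The host-side check can be turned into a small lookup that restates what was observed in this thread for each storage driver. This is a sketch only: the function name fsid_hint is invented here, and the mapping covers just the two setups tested above (the btrfs data dir is assumed to report the driver name btrfs).

```shell
#!/bin/sh
# Hypothetical helper: map a storage driver name, as printed by
# `docker info -f '{{.Driver}}'`, to the behavior observed in this thread.
fsid_hint() {
  case "$1" in
    overlay2) echo "overlay2: only fsid=0 worked" ;;
    btrfs)    echo "btrfs: fsid=0 and fsid=1 both worked" ;;
    *)        echo "$1: not tested in this thread" ;;
  esac
}

# Usage on a host with docker installed:
#   fsid_hint "$(docker info -f '{{.Driver}}')"
```

Anything other than overlay2 and btrfs is reported as untested rather than guessed at.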
In the container...
mount | sed -En 's:^.* on / (type |\()([^, ]*).*:\2:p'
will return btrfs for btrfs, and overlay for the ext4+overlay2 case
/dev/sdc3 on / type btrfs (rw,seclabel,relatime,ssd,space_cache,subvolid=6485,subvol=/root/var/lib/docker2/btrfs/subvolumes/f905ea7d87 ...
vs
overlay on / type overlay (rw,seclabel,relatime,lowerdir=/var/lib/docker/overlay2/l/BHFRM7TBEU ...
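The container-side check can be wrapped in a helper so it can be tried against saved mount output. This is only a sketch: the function name root_fs_type is made up, and the two sample lines are the truncated mount entries quoted above.

```shell
#!/bin/sh
# Hypothetical helper (name is not from the image): print the filesystem
# type of the entry mounted on / from `mount`-style output read on stdin.
root_fs_type() {
  sed -En 's:^.* on / (type |\()([^, ]*).*:\2:p'
}

# Usage inside a container: mount | root_fs_type
# Sample lines copied from this thread (the paths are truncated there too).
printf '%s\n' '/dev/sdc3 on / type btrfs (rw,seclabel,relatime,ssd,space_cache,subvolid=6485,subvol=/root/var/lib/docker2/btrfs/subvolumes/f905ea7d87 ...' | root_fs_type
printf '%s\n' 'overlay on / type overlay (rw,seclabel,relatime,lowerdir=/var/lib/docker/overlay2/l/BHFRM7TBEU ...' | root_fs_type
```

The first sample prints btrfs and the second prints overlay. Entries for other mount points such as /proc are ignored because the pattern requires a space right after the "/".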
So.... Assuming someone can replicate this result, if you are so inclined, you could add a check to init_exports to check if overlay, and print out a warning if fsid=0 is not set?
Keep in mind, I have no idea what any of the other storage drivers need, aufs, etc....
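A rough sketch of what such a check might look like follows. It is not taken from the image's entrypoint: the function name warn_if_overlay_without_fsid0 and its arguments are invented here, and the rule it encodes (an overlay root needs fsid=0) is only what was observed in this thread, not something verified for other storage drivers.

```shell
#!/bin/sh
# Hypothetical check. Arguments: root filesystem type, one export line.
# Returns 1 and prints a warning on stderr when the root filesystem is
# overlay and the export options do not include fsid=0.
warn_if_overlay_without_fsid0() {
  fs_type=$1
  export_line=$2
  # Nothing to warn about on non-overlay roots (e.g. btrfs in this thread).
  [ "$fs_type" = "overlay" ] || return 0
  # fsid=0 must appear as a whole option: after "(" or "," and before "," or ")".
  if printf '%s\n' "$export_line" | grep -Eq '[(,]fsid=0[,)]'; then
    return 0
  fi
  echo "WARNING: root filesystem is overlay but export lacks fsid=0: $export_line" >&2
  return 1
}
```

Called as warn_if_overlay_without_fsid0 overlay '/nfs *(rw,insecure,no_subtree_check,fsid=1,no_root_squash,async)', it would print the warning; the same export with fsid=0 would pass silently.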
Good find! I also completely forgot our convo from last year. That tricky fsid parameter strikes again.
So.... Assuming someone can replicate this result, if you are so inclined, you could add a check to init_exports to check if overlay, and print out a warning if fsid=0 is not set?
That's a great idea. I'll open a new ticket to track this.
Thanks again for your investigative work, and please don't hesitate to reach out again if you hit other obstacles.
I've run into a strange problem with this docker image, and I've yet to figure out what is going on. For some reason I can no longer use it successfully, although I could several kernels ago, and I'm still using the same docker-compose file I had about a year and a half ago.
I'm out of ideas, I cannot figure out how the same image (the SHAs match) can work with one docker data directory and not the other. What am I missing to make this container work?
Observations
Not working
When I run it in my normal docker data folder I get:
Using /var/lib/docker:
Clean docker dir Working
However, I accidentally discovered that if I change the docker data dir, and restart the daemon, it works all of a sudden 😮
Using /var/lib/docker2
Here is the docker-compose file I'm using
Other things I tried that did not work
docker run -p 2049:2049 --cap-add SYS_ADMIN --cap-add SYS_MODULE -v /opt/nfs_test:/nfs:rw -v /lib/modules:/lib/modules:ro -e NFS_EXPORT_0='/nfs *(rw,insecure,no_subtree_check,fsid=1,no_root_squash,async)' -e NFS_DISABLE_VERSION_3=1 erichough/nfs-server
--privileged
erichough/nfs-server image
docker version of the command didn't even use.
network=host and 127.0.0.1
NFS_DISABLE_VERSION_3, but that doesn't really work because systemd uses port 111, so that messes up rpc-statd
A workaround
Now this is surprising, but I discovered that I could:
/var/lib/docker2
/mnt/data. Success
/var/lib/docker
/var/lib/docker
ls /mnt/data works using the /var/lib/docker now.
This is not a great workaround, but it seems to me to present even more questions than answers.
Other notes
modprobe on the host before running the container. I suspect that this is due to me having a newer kernel, but have not tested this yet. I'm beginning to wonder whether the kernel the image is built with is too different from my kernel to run the modprobe command.