containers / docker-lvm-plugin

Docker volume plugin for LVM volumes
GNU Lesser General Public License v3.0

Auto-mount not working after server crash #24

Closed. echa closed this issue 6 years ago.

echa commented 8 years ago

Hey guys, I just experienced a server crash, and after the reboot my existing LVM volumes never got auto-mounted again. Looking at the code, this is to be expected.

The reason is that the plugin trusts the information in /var/lib/docker-lvm-plugin/lvmCountConfig.json even after an unclean shutdown or crash. The counts were still set to 1 in my case, since that was the last live state (containers running, docker running, volumes mounted by the kernel and exposed to the containers).

After manually resetting all counters to 0 and restarting the docker and docker-lvm-plugin services, auto-mount worked as expected.
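For reference, a minimal sketch of that manual reset, assuming jq is available and that lvmCountConfig.json is a flat JSON object mapping volume names to integer counts (the same assumption the jq-based workaround further down this thread relies on):

# zero every counter via a temp file, then restart both services
sudo sh -c 'jq "map_values(.=0)" /var/lib/docker-lvm-plugin/lvmCountConfig.json > /tmp/lvmCountConfig.json && mv /tmp/lvmCountConfig.json /var/lib/docker-lvm-plugin/lvmCountConfig.json'
sudo systemctl restart docker-lvm-plugin docker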

The system was CentOS 7.2 + EPEL:

docker version

Client:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-46.el7.centos.14.x86_64
 Go version:      go1.6.3
 Git commit:      cb079f6-unsupported
 Built:           Fri Sep 16 13:24:25 2016
 OS/Arch:         linux/amd64

Server:
 Version:         1.10.3
 API version:     1.22
 Package version: docker-common-1.10.3-46.el7.centos.14.x86_64
 Go version:      go1.6.3
 Git commit:      cb079f6-unsupported
 Built:           Fri Sep 16 13:24:25 2016
 OS/Arch:         linux/amd64

yum info docker-lvm-plugin

Installed Packages
Name        : docker-lvm-plugin
Arch        : x86_64
Version     : 1.10.3
Release     : 46.el7.centos.14
Size        : 8.6 M
Repo        : installed
From repo   : extras
Summary     : Docker volume driver for lvm volumes
URL         : https://github.com/docker/docker
License     : LGPLv3
Description : Docker Volume Driver for lvm volumes.

capi1O commented 7 years ago

Hi, I have a similar problem (logical volumes are not mounted when their corresponding docker volumes are used), but it happens even after a clean reboot. Like echa, I then need to manually reset the counts in /var/lib/docker-lvm-plugin/lvmCountConfig.json for each docker volume and restart the docker-lvm-plugin and docker services (sudo systemctl restart docker-lvm-plugin and sudo systemctl restart docker).

Otherwise I have to manually mount the LV at the path specified in the docker volume. For example, for a docker volume named my-lv:

docker volume inspect my-lv:

[
    {
        "Name": "my-lv",
        "Driver": "lvm",
        "Mountpoint": "/var/lib/docker-lvm-plugin/my-lv",
        "Labels": {},
        "Scope": "local"
    }
]

sudo lvdisplay:

  --- Logical volume ---
  LV Path                /dev/docker-vg/my-lv
  LV Name                my-lv
 ...

After a reboot the docker volume is no longer "linked" to the LV (because the LV is not mounted at the path specified by the docker volume); instead it behaves like a plain host directory mounted as a docker volume: data is stored on the host in the directory /var/lib/docker-lvm-plugin/my-lv.

I need to manually mount the LV at this path to "link" the LV back to the docker volume: sudo mount /dev/docker-vg/my-lv /var/lib/docker-lvm-plugin/my-lv
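To verify that the LV is actually mounted at the plugin's path after doing this, a quick check is for example:

mount | grep /var/lib/docker-lvm-plugin/my-lv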


From the information I found, the purpose of the count property of lvmDriver (which is saved to disk for each docker volume/LV in this lvmCountConfig.json file) is to keep track of how many docker containers use the volume, so that a docker volume cannot be removed while at least 1 container is still using it (as explained in https://github.com/docker/docker/issues/17585).

Docker requires the plugin to provide a volume, given a user specified volume name. This is called once per container start. If the same volume_name is requested more than once, the plugin may need to keep track of each new mount request and provision at the first mount request and deprovision at the last corresponding unmount request.

source : https://docs.docker.com/engine/extend/plugins_volume/

Volumes are removed explicitly (i.e., docker volume rm) or implicitly via container remove (i.e., docker rm -v). Docker will only send a remove request to a volume driver when the internal reference count drops to zero. Since Docker’s reference counting is not multi-host aware, the volume driver must be.

source : http://www.blockbridge.com/multi-host-volumes-semantics-with-docker-1-9/

But I don't see how this relates to the mounting problem. I also did not really understand this:

these are files for persisting the state of the memory stores on disk

source : https://github.com/shishir-a412ed/docker-lvm-plugin/issues/15
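For what it's worth, those config files appear to be flat JSON objects keyed by volume name; a hypothetical lvmCountConfig.json for the my-lv volume above (the layout is an assumption, inferred from the jq-based workaround further down in this thread) would look like:

{
    "my-lv": 1
}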

shishir-a412ed commented 7 years ago

@echa @monkeydri There are multiple ways to solve this. Right now, when the system reboots (clean restart or server crash), the logical volumes are not auto-mounted after the restart. Based on the code, this is expected. We can do the following to handle this:

1) If the user has executed systemctl enable docker-lvm-plugin, the plugin would restart automatically on reboot and load the state of the volumes from the config files {lvmCountConfig.json, lvmVolumesConfig.json} back into memory. Based on that state, the plugin would then mount the LVs back (if they are not already mounted).

2) Create entries in /etc/fstab so that the mount points are persisted across reboots (a hypothetical example entry is sketched below).

3) Drop a mount file for each volume into /etc/systemd/system and let systemd take care of the mount state. I don't like (3), since it would result in too many mount files if there are a lot of LVM volumes.
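For option (2), a hypothetical /etc/fstab entry for a volume named foobar in a volume group docker-vg; the device path and filesystem type here are assumptions, while the mountpoint follows the plugin's /var/lib/docker-lvm-plugin/<volume> convention:

/dev/docker-vg/foobar  /var/lib/docker-lvm-plugin/foobar  ext4  defaults,nofail  0  2

The nofail option keeps the boot from failing if the LV happens to be unavailable.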

I will discuss this with my tech lead and see how we can resolve this in the most efficient way possible.

Shishir

shishir-a412ed commented 7 years ago

@echa @monkeydri

I had a discussion with my technical lead, and we think option {1} would be best to solve this. We will let the daemon {docker-lvm-plugin} handle the state of the mounts on reboot.

I will create a PR to add this functionality.

/cc @rhatdan

Shishir

maartenl945 commented 6 years ago

We are experiencing the same issue. After server restart, the docker-lvm-plugin volumes do not get mounted correctly and therefore do not contain any data.

Is there any news on this issue? Has any solution already been implemented, or are there known workarounds? Or is it better to just NOT use this plugin?

echa commented 6 years ago

My workaround has been, and still is, to reset the counters in lvmCountConfig.json to zero as an extra step in docker-lvm-plugin's systemd launch script. I keep the last good version of the config file (with all volumes registered and counters at zero) around to copy from. A smarter solution would be to have a script parse the JSON and set the counts to zero, so you don't miss changes to the list of volumes.

maartenl945 commented 6 years ago

Thanks for that suggestion. We use Rancher for managing containers and volumes, so it creates volumes for us depending on which containers we install; the mounted volumes may indeed change over time. Your suggestion of parsing the JSON on startup might work in that scenario.

We also notice, though, that lvmCountConfig.json lists more volumes than are still in use. When containers are deleted, the volumes are apparently not cleaned up correctly. But that is probably another issue.

shishir-a412ed commented 6 years ago

@echa @maartenl945 I will try to take a look at this sometime this week.

When containers are deleted, the volumes are apparently not cleaned up correctly. But that is probably another issue.

AFAIK, when a container is deleted, the volume associated with that container is not automatically deleted unless you do a docker rm -v.

root@shishir-All-Series:~# docker rm --help

Usage:  docker rm [OPTIONS] CONTAINER [CONTAINER...]

Remove one or more containers

Options:
  -f, --force     Force the removal of a running container (uses SIGKILL)
      --help      Print usage
  -l, --link      Remove the specified link
  -v, --volumes   Remove the volumes associated with the container
root@shishir-All-Series:~#

maartenl945 commented 6 years ago

Thanks @shishir-a412ed

Related to the cleaning up of volumes: Rancher automatically cleans up volumes when a stack is deleted. We also see the related directories under /var/lib/docker-lvm-plugin disappearing when the volumes are cleaned up. However, the volume administration in the JSON files in that directory is not correctly updated: it still lists some of the cleaned-up volumes, even with a usage count (if that is what it is) of 1.

shishir-a412ed commented 6 years ago

@maartenl945

However, the volume administration in the JSON files in that directory is not correctly updated: it still lists some of the cleaned-up volumes, even with a usage count (if that is what it is) of 1.

This sounds like a different bug. Ideally this should not happen, and the counts should be updated correctly in the config file when the container (and its associated volumes) are removed.

Let me first fix @echa's original issue. Once that is fixed, we can see whether your issue is still happening.

Shishir

maartenl945 commented 6 years ago

Yes I agree, it sounds like a different issue.

maartenl945 commented 6 years ago

As a temporary workaround we are now mounting the LVM volumes ourselves during startup of the server. For that to work properly, we have to wait until the LVM volumes exist and then mount them at the directories in /var/lib/docker-lvm-plugin.
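A rough sketch of that kind of startup script, using the my-lv / docker-vg names from earlier in the thread as placeholders; it waits until the LV device node appears and then mounts it at the directory the plugin uses:

#!/bin/sh
# Placeholders: adjust device and mountpoint to your volumes.
DEV=/dev/docker-vg/my-lv
MNT=/var/lib/docker-lvm-plugin/my-lv

# Wait for LVM to activate the logical volume (device node shows up).
until [ -b "$DEV" ]; do
    sleep 1
done

# Mount it where docker-lvm-plugin expects it, unless it is already mounted.
mkdir -p "$MNT"
mountpoint -q "$MNT" || mount "$DEV" "$MNT"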

The method to reset the counts in lvmCountConfig.json did not seem to work reliably when the container stack was brought down before system restart.

Unsure how our temporary workaround affects docker-lvm-plugin and its internal bookkeeping, though.

maartenl945 commented 6 years ago

Hi @shishir-a412ed, do you have any idea when you might have time to take a look at this? It seems like you cannot really use this plugin if volumes are not correctly mounted after a reboot. Regards, Maarten

shishir-a412ed commented 6 years ago

@maartenl945 I will try to take a look at this issue this weekend. Busy with some other things, sorry for the delay.

maartenl945 commented 6 years ago

@shishir-a412ed No need to apologize, just wanted to see when you might have a look at this. Regards, Maarten

echa commented 6 years ago

Here's a quick workaround to reset all counters on service restart. It reads the currently configured list of volumes from /var/lib/docker-lvm-plugin/lvmVolumesConfig.json and overwrites /var/lib/docker-lvm-plugin/lvmCountConfig.json with all volume counters set to zero.

You need jq installed, and you add a new systemd drop-in file at /etc/systemd/system/docker-lvm-plugin.service.d/reset-counters.conf:

Remember that this is not a complete solution, because neither this little hack nor docker-lvm-plugin checks the actual mount status of your filesystems. Should an LVM volume already be mounted AND the counter be zero, docker-lvm-plugin will blindly trust the counter and try to mount again, which results in a mount error, and the container will not be started.

This workaround does work after a server restart, however.

[Service]
ExecStartPre=/bin/sh -c 'cat /var/lib/docker-lvm-plugin/lvmVolumesConfig.json | jq -c "map_values(.=0)" > /var/lib/docker-lvm-plugin/lvmCountConfig.json'
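
After adding that drop-in, reload systemd and restart the plugin (and docker) so the override takes effect:

sudo systemctl daemon-reload
sudo systemctl restart docker-lvm-plugin docker
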
maartenl945 commented 6 years ago

Thanks for the (workaround) solution @echa !

shishir-a412ed commented 6 years ago

@maartenl945 I checked this issue, and you are right: lvmCountConfig.json still shows the volume count as 1 after the reboot. The correct value should be 0.

The issue is the following. Let's take an example scenario:

1) You have a volume named foobar.

2) You have 3 running containers, c1, c2 and c3. Each of them has foobar mounted at /run.

3) You reboot the system. The containers are going to exit and will call the plugin to unmount the volume.

4) The plugin is expected to only umount for the last container. For the rest, it will just do a count-- and update/save lvmCountConfig.json to disk. The reasoning behind this is that the LVM device is only mounted for the first container, at /var/lib/docker-lvm-plugin/foobar; for the rest of the containers it's just a bind mount of this location. So the umount only needs to happen for the last container.

5) Issue: During reboot, when the plugin tries to umount for the last container, it fails because systemd has already unmounted the device as part of the reboot.

Jun 14 19:03:42 localhost.localdomain docker-lvm-plugin[1125]: Unmount: unmount error: exit status 32 output umount: /var/lib/docker-lvm-plugin/foobar: not mounted

So it never gets to update the count to 0.

I tried to fix this here: https://github.com/projectatomic/docker-lvm-plugin/compare/master...shishir-a412ed:auto_mount_issue

By making docker-lvm-plugin only umount if the device is still mounted. If systemd has already unmounted it, we just update the count to 0.
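In shell terms, the intended unmount path looks roughly like this (a sketch only; the actual change is in the plugin's Go code, and the mount | grep check is the one it shells out to, as noted below):

# volume foobar: only call umount if the path is still mounted;
# if systemd already unmounted it, skip umount and just reset the count to 0
if mount | grep -q /var/lib/docker-lvm-plugin/foobar; then
    umount /var/lib/docker-lvm-plugin/foobar
fi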

But apparently even the mount command (sh -c "mount|grep /var/lib/docker-lvm-plugin/foobar") used to check whether the device is still mounted or already unmounted fails during reboot. Probably some race condition with PID 1.

Jun 24 15:54:07 localhost.localdomain docker-lvm-plugin[30139]: Unmount: Error checking if volume /var/lib/docker-lvm-plugin/foobar is mounted

I guess the best solution for now would be to just use @echa's workaround. Basically, after a restart the counts in lvmCountConfig.json need to be set to 0.

maartenl945 commented 6 years ago

@shishir-a412ed Thanks for taking a look. Can the plug-in service not just set the counts to 0 on service startup? That would effectively implement the workaround in the plug-in.

shishir-a412ed commented 6 years ago

@maartenl945 Can the plug-in service not just set the counts to 0 on service startup? It can, but that won't fix the issue.

The counts indicate how many containers this volume is mounted into; e.g. in the above scenario, foobar (mounted into c1, c2 and c3) will have a count of 3. If the docker daemon is still running those containers and we just restart the plugin and reset the count to 0, that would be wrong.

shishir-a412ed commented 6 years ago

@maartenl945 There was a small issue in the PR. I have fixed it now, and it's working for me. After the reboot, the counts are reset to 0.

https://github.com/projectatomic/docker-lvm-plugin/pull/53 Can you try it once and let me know if it works for you too?

maartenl945 commented 6 years ago

@shishir-a412ed Unfortunately I don’t have a ‘go’ development environment, nor easy access to our target at the moment, since I’m not at work for the next couple of weeks. I’ll see what I can do, but don’t wait for me since I’m sure other people will be helped by this solution too! Thanks, Maarten

shishir-a412ed commented 6 years ago

@maartenl945 No worries. We have merged the PR to master.

Shishir