ClusterHQ / flocker

Container data volume manager for your Dockerized application
https://clusterhq.com
Apache License 2.0

Acceptance test fail with [lsblk: /dev/mapper/mpatheh: not a block device] #2970

Closed. shay-berman closed this issue 7 years ago.

shay-berman commented 7 years ago

The following acceptance test failed while testing the IBM Flocker driver: [flocker.acceptance.endtoend.test_dockerplugin.DockerPluginTests.test_create_sized_volume_with_v2_plugin_api]

Any advice?

Here is the test error:

Failed: flocker.acceptance.endtoend.test_dockerplugin.DockerPluginTests.test_create_sized_volume_with_v2_plugin_api (from (empty))

Stacktrace:

    Traceback (most recent call last):
    testtools.testresult.real._StringException: Traceback (most recent call last):
    testtools.testresult.real._StringException: Empty attachments:
      twisted-eliot-log

    twisted-log: {{{
    2016-12-01 18:30:29+0200 [-] Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb3064c4908>
    2016-12-01 18:30:29+0200 [-] Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb3064c4ef0>
    2016-12-01 18:30:29+0200 [-] Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb3064c4908>
    2016-12-01 18:30:29+0200 [-] Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb3064c7d40>
    2016-12-01 18:30:29+0200 [-] Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb3064c4ef0>
    2016-12-01 18:30:30+0200 [-] Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb3064c7e18>
    2016-12-01 18:30:30+0200 [-] Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb3064c7d40>
    2016-12-01 18:30:30+0200 [-] Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb3064a8050>
    2016-12-01 18:30:30+0200 [-] Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb3064c7e18>
    2016-12-01 18:30:30+0200 [-] Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb305f0e878>
    2016-12-01 18:30:30+0200 [-] Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb3064a8050>
    2016-12-01 18:30:30+0200 [-] Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb30605cb00>
    2016-12-01 18:30:30+0200 [-] Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb305f0e878>
    2016-12-01 18:30:30+0200 [-] Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb30605cb00>
    2016-12-01 18:31:32+0200 [-] Starting factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb30646fd40>
    2016-12-01 18:31:33+0200 [-] Stopping factory <twisted.web.client._HTTP11ClientFactory instance at 0x7fb30646fd40>
    2016-12-01 18:31:33+0200 [-] Main loop terminated.
    }}}

    Traceback (most recent call last):
      File "/root/.virtualenvs/flocker/lib/python2.7/site-packages/twisted/internet/defer.py", line 649, in _runCallbacks
        current.result = callback(current.result, *args, **kw)
      File "/root/.virtualenvs/flocker/lib/python2.7/site-packages/testtools/testcase.py", line 411, in assertEqual
        self.assertThat(observed, matcher, message)
      File "/root/.virtualenvs/flocker/lib/python2.7/site-packages/testtools/testcase.py", line 498, in assertThat
        raise mismatch_error
    testtools.matchers._impl.MismatchError: !=:
    reference = "<class 'main.CalledProcessErrorWithOutput'>: Command '['/bin/lsblk', '--noheadings', '--bytes', '--output', 'SIZE', '/dev/mapper/mpatheh']' returned non-zero exit status 1 and output 'lsblk: /dev/mapper/mpatheh: not a block device\n'"
    actual    = '97710505984'

More details from a persistent container:


> [root@shay-ibm-flocker-node1 ~]# multipath -ll
> mpathkh (36001738cfc9035e80000000000014484) dm-10 IBM     ,2810XIV         
> size=75G features='1 queue_if_no_path' hwhandler='0' wp=rw
> `-+- policy='service-time 0' prio=1 status=active
>   |- 3:0:0:4 sde 8:64  active ready  running
>   |- 4:0:0:4 sdm 8:192 active ready  running
>   `- 5:0:0:4 sdc 8:32  active ready  running
> [root@shay-ibm-node1 flocker-driver]# docker run -it --volume-driver flocker -v a3:/a  ubuntu bash
> 
> root@b0305ac92bc6:/# findmnt -n -m /a -o SOURCE
> /dev/mapper/mpathkh
> 
> root@b0305ac92bc6:/# lsblk /dev/mapper/mpathkh
> lsblk: /dev/mapper/mpathkh: not a block device
> 
> root@b0305ac92bc6:/# ls -l /dev/mapper/mpathkh
> ls: cannot access '/dev/mapper/mpathkh': No such file or directory
> 
> root@b0305ac92bc6:/# ls -l /dev/mapper        
> ls: cannot access '/dev/mapper': No such file or directory
> 
> root@b0305ac92bc6:/# ls -l /dev/dm-10 
> ls: cannot access '/dev/dm-10': No such file or directory
> 
> root@b0305ac92bc6:/# ls /dev/
> console  core  fd  full  fuse  mqueue  null  ptmx  pts  random  shm  stderr  stdin  stdout  tty  urandom  zero
> 
> root@b0305ac92bc6:/# df
> Filesystem                                                                                         1K-blocks    Used Available Use% Mounted on
> /dev/mapper/docker-253:3-92386872-2d718073c889f6c4ae7801be5f2963e812332c087a3a844eba3e204a3af9baf7  10474496  157948  10316548   2% /
> tmpfs                                                                                                 942184       0    942184   0% /dev
> tmpfs                                                                                                 942184       0    942184   0% /sys/fs/cgroup
> /dev/mapper/mpathkh                                                                                 77277956   53256  73276156   1% /a
> /dev/mapper/rhel_temp--host-root                                                                    27457408 7619152  19838256  28% /etc/hosts
> shm                                                                                                    65536       0     65536   0% /dev/shm

BTW, if I run lsblk outside the container it works fine. More details from the Flocker node itself:

    [root@shay-ibm-node1 ~]# lsblk /dev/mapper/mpathkh
    NAME    MAJ:MIN RM SIZE RO TYPE  MOUNTPOINT
    mpathkh 253:10   0  75G  0 mpath /flocker/df7902e3-4ce3-44b7-9549-629c80f6f343

    [root@shay-ibm-node1 ~]# ls -l /dev/mapper/mpathkh
    lrwxrwxrwx. 1 root root 8 Dec 6 22:47 /dev/mapper/mpathkh -> ../dm-10

    [root@shay-ibm-node1 ~]# ls -l /dev/mapper
    total 0
    crw-------. 1 root root 10, 236 Nov 10 14:17 control
    lrwxrwxrwx. 1 root root       7 Dec  6 22:21 docker-253:3-92386872-2d718073c889f6c4ae7801be5f2963e812332c087a3a844eba3e204a3af9baf7 -> ../dm-6
    lrwxrwxrwx. 1 root root       7 Nov 10 14:17 docker-253:3-92386872-pool -> ../dm-7
    lrwxrwxrwx. 1 root root       7 Dec  6 22:43 mpathes -> ../dm-8
    lrwxrwxrwx. 1 root root       7 Dec  6 22:47 mpathkf -> ../dm-9
    lrwxrwxrwx. 1 root root       8 Dec  6 22:47 mpathkg -> ../dm-11
    lrwxrwxrwx. 1 root root       8 Dec  6 22:47 mpathkh -> ../dm-10
    lrwxrwxrwx. 1 root root       7 Nov 10 14:17 rhel_temp--host-pool00 -> ../dm-5
    lrwxrwxrwx. 1 root root       7 Nov 10 14:17 rhel_temp--host-pool00_tdata -> ../dm-1
    lrwxrwxrwx. 1 root root       7 Nov 10 14:17 rhel_temp--host-pool00_tmeta -> ../dm-0
    lrwxrwxrwx. 1 root root       7 Nov 10 14:17 rhel_temp--host-pool00-tpool -> ../dm-2
    lrwxrwxrwx. 1 root root       7 Nov 10 14:17 rhel_temp--host-root -> ../dm-3
    lrwxrwxrwx. 1 root root       7 Nov 10 14:17 rhel_temp--host-swap -> ../dm-4

    [root@shay-ibm-node1 ~]# ls -l /dev/dm-10
    brw-rw----. 1 root disk 253, 10 Dec 6 22:47 /dev/dm-10

wallrj commented 7 years ago

Hey @shay-berman,

Sorry for the delayed reply.

It could be that the version of lsblk inside the container is older than, or different from, the version you're running on the host. The test launches a python:2.7-slim image and runs a Python HTTP server inside it. When an HTTP client connects, the server looks up the device for the supplied mount point, runs lsblk on that device to get its reported size, and returns that size.

It sounds complicated, but it's done that way to give better end-to-end coverage and to avoid having to SSH into the acceptance node.
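A minimal sketch of the kind of in-container server described above, written for Python 3 (the real test image is python:2.7-slim, where the HTTP server module is BaseHTTPServer instead). It reuses the findmnt invocation from this thread and the lsblk command line from the failing test's error message; MOUNTPOINT, SizeHandler and the helper names are hypothetical, and this is not Flocker's actual test code.

```python
import subprocess
from http.server import BaseHTTPRequestHandler

# Hypothetical mount point inside the container; the real test supplies its own.
MOUNTPOINT = "/data"


def device_for_mountpoint(mountpoint):
    """Return the mount source, e.g. '/dev/mapper/mpathah' (same findmnt call as in this thread)."""
    out = subprocess.check_output(
        ["findmnt", "-n", "-m", mountpoint, "-o", "SOURCE"],
        universal_newlines=True,
    )
    return out.strip()


def parse_lsblk_size(output):
    """Parse `lsblk --noheadings --bytes --output SIZE <dev>` output.

    The first value is the device itself; any later lines are its children.
    """
    return int(output.split()[0])


def device_size_bytes(device):
    """Run the same lsblk command line as the failing test and return the size in bytes."""
    out = subprocess.check_output(
        ["lsblk", "--noheadings", "--bytes", "--output", "SIZE", device],
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_lsblk_size(out)


class SizeHandler(BaseHTTPRequestHandler):
    """Answer every GET with the size of the device mounted at MOUNTPOINT."""

    def do_GET(self):
        try:
            body = str(device_size_bytes(device_for_mountpoint(MOUNTPOINT)))
            status = 200
        except subprocess.CalledProcessError as e:
            # e.g. "lsblk: /dev/mapper/mpathah: not a block device" when the
            # device node is missing from the container's /dev.
            body = "%s: %s" % (e, e.output)
            status = 500
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)
```

To serve it, pass the handler to http.server.HTTPServer, e.g. HTTPServer(("", 8080), SizeHandler).serve_forever(); the port is arbitrary. Returning the error text in the body mirrors what the failing assertion above appears to have received; the status code used by the real test is not shown in this thread.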

Some ideas:

shay-berman commented 7 years ago

Hi @wallnerryan

The lsblk command fails with "lsblk: /dev/mapper/mpathah: not a block device" because the multipath device does not exist inside the container (in fact, the whole /dev/mapper directory is missing inside the container). It is not related to the lsblk version (BTW, the lsblk version inside the container is newer than the one outside the container).

Here are my follow-ups to the requests in your previous reply:

  1. You asked me to test with the python:2.7-slim image. The problem remains with python:2.7-slim too. Here is the output of creating a 1 GB volume and running a container with it:

    [root@docker-c1-n1 ~]# docker volume create --driver=flocker --name v1g --opt profile=silver --opt size=1g

    [root@docker-c1-n1 ~]# flockerctl ls
    DATASET                              SIZE  METADATA                                                           STATUS      SERVER
    7ce699ae-ec46-45b8-b61c-72ec3e34e607 1.00G maximum_size=1073741824,name=v1g,clusterhq:flocker:profile=silver attached ✅ aa87d203 (9.151.161.241)

    [root@docker-c1-n1 ~]# docker volume ls
    DRIVER              VOLUME NAME
    flocker             v1g

    [root@docker-c1-n1 ~]# df /flocker/7ce699ae-ec46-45b8-b61c-72ec3e34e607
    Filesystem          1K-blocks Used Available Use% Mounted on
    /dev/mapper/mpathah    999320 2548    927960   1% /flocker/7ce699ae-ec46-45b8-b61c-72ec3e34e607

    [root@docker-c1-n1 ~]# docker run --volume-driver flocker -v v1g:/data -it python:2.7-slim bash
    root@c6f0c3b20eb0:/# df
    Filesystem                                                                                         1K-blocks    Used Available Use% Mounted on
    /dev/mapper/docker-253:3-42041356-416d32f2f4d298c054494bf8a1dc3a134897c1aa355a8cefb8ee99b854bb1959  10474496  231480  10243016   3% /
    tmpfs                                                                                               43193200       0  43193200   0% /dev
    tmpfs                                                                                               43193200       0  43193200   0% /sys/fs/cgroup
    /dev/mapper/mpathah                                                                                   999320    2548    927960   1% /data
    /dev/mapper/rhel_temp--host-root                                                                    27457408 3404592  24052816  13% /etc/hosts
    shm                                                                                                    65536       0     65536   0% /dev/shm
    root@c6f0c3b20eb0:/# findmnt -n -m /data -o SOURCE
    /dev/mapper/mpathah

    root@c6f0c3b20eb0:/# lsblk /dev/mapper/mpathah
    lsblk: /dev/mapper/mpathah: not a block device

    root@c6f0c3b20eb0:/# ls -l /dev/mapper/mpathah
    ls: cannot access /dev/mapper/mpathah: No such file or directory

    root@c6f0c3b20eb0:/# ls -l /dev/dm-7
    ls: cannot access /dev/dm-7: No such file or directory

  2. You asked me to use cat /sys/block/mpathkh/size instead of lsblk. Since /sys/block/mpathkh does not exist inside the container (BTW, nor on the host itself), I cannot get the size that way. I can get the size through the underlying dm-x device (/sys/block/dm-7), which does exist inside the container (BTW, the value in /sys/block/[device]/size is in 512-byte blocks, not bytes). Unfortunately, I don't know how to tell, from inside the container, that /dev/mapper/mpathah corresponds to its dm-x device (/sys/block/dm-7). I can see two possible options (@wallrj, any other ideas?):

     a. Run your container with -v /dev:/dev; then lsblk /dev/mapper/mpathah works.
     b. Change our driver.get_device_path() to return the /dev/dm-[x] device instead of /dev/mpath/[device]. BUT according to this, /dev/dm-[x] should not be used.

     Here is some output from getting the size of the device:

    [root@docker-c1-n1 sys]# multipath -l
    mpathah (36001738cfc9035e80000000000014561) dm-7 IBM ,2810XIV
    size=1.0G features='1 queue_if_no_path' hwhandler='0' wp=rw
    `-+- policy='service-time 0' prio=0 status=active
      |- 3:0:0:2 sdf 8:80 active undef running
      |- 4:0:0:2 sde 8:64 active undef running
      `- 5:0:0:2 sdg 8:96 active undef running

    [root@docker-c1-n1 sys]# docker exec -it c6f0c3b20eb0 bash
    root@c6f0c3b20eb0:/#

    root@c6f0c3b20eb0:~# df /data
    Filesystem          1K-blocks Used Available Use% Mounted on
    /dev/mapper/mpathah    999320 2548    927960   1% /data

    root@c6f0c3b20eb0:~# cat /sys/block/mpathkh/size
    cat: /sys/block/mpathkh/size: No such file or directory

    root@c6f0c3b20eb0:~# ls -l /sys/block/mpathkh
    ls: cannot access /sys/block/mpathkh: No such file or directory

    root@c6f0c3b20eb0:~# ls -l /sys/block
    total 0
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 dm-0 -> ../devices/virtual/block/dm-0
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 dm-1 -> ../devices/virtual/block/dm-1
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 dm-2 -> ../devices/virtual/block/dm-2
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 dm-3 -> ../devices/virtual/block/dm-3
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 dm-4 -> ../devices/virtual/block/dm-4
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 dm-5 -> ../devices/virtual/block/dm-5
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 dm-6 -> ../devices/virtual/block/dm-6
    lrwxrwxrwx. 1 root root 0 Dec 11 15:14 dm-7 -> ../devices/virtual/block/dm-7
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 dm-8 -> ../devices/virtual/block/dm-8
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 fd0 -> ../devices/platform/floppy.0/block/fd0
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 loop0 -> ../devices/virtual/block/loop0
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 loop1 -> ../devices/virtual/block/loop1
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 sda -> ../devices/pci0000:00/0000:00:15.0/0000:03:00.0/host0/target0:0:0/0:0:0:0/block/sda
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 sdb -> ../devices/platform/host3/session1/target3:0:0/3:0:0:1/block/sdb
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 sdc -> ../devices/platform/host4/session2/target4:0:0/4:0:0:1/block/sdc
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 sdd -> ../devices/platform/host5/session3/target5:0:0/5:0:0:1/block/sdd
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 sde -> ../devices/platform/host4/session2/target4:0:0/4:0:0:2/block/sde
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 sdf -> ../devices/platform/host3/session1/target3:0:0/3:0:0:2/block/sdf
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 sdg -> ../devices/platform/host5/session3/target5:0:0/5:0:0:2/block/sdg
    lrwxrwxrwx. 1 root root 0 Dec 11 19:25 sr0 -> ../devices/pci0000:00/0000:00:07.1/ata2/host2/target2:0:0/2:0:0:0/block/sr0

    root@c6f0c3b20eb0:~# cat /sys/block/dm-7/size
    2097152

    root@c6f0c3b20eb0:~# exit

    [root@docker-c1-n1 ~]# docker run --volume-driver flocker -v v1g:/data -v /dev:/dev -it python:2.7-slim bash

    root@04f2d4b1e6a9:/# ls /dev/mapper/mpathah
    /dev/mapper/mpathah

    root@04f2d4b1e6a9:/# lsblk /dev/mapper/mpathah
    NAME    MAJ:MIN RM SIZE RO TYPE  MOUNTPOINT
    mpathah 253:7    0   1G  0 mpath /data

  3. You asked me to "modify that test to use cat /sys/block/[device]/size instead of lsblk", but as I mentioned, there is no /sys/block/[mapper device]/size inside the container.

  4. The lsblk version inside the container is actually newer than the one on the host itself:

    root@c6f0c3b20eb0:/# lsblk --version
    lsblk from util-linux 2.25.2

    [root@docker-c1-n1 ~]# lsblk --version
    lsblk from util-linux 2.23.2

  5. You asked me to test with the ubuntu:16.04 image, but the problem remains with that image too.
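Regarding item 2, one possible way to correlate the two from inside the container, not tried in this thread: device-mapper publishes each device's name in sysfs at /sys/block/dm-N/dm/name, so the mapper name that findmnt still reports (/dev/mapper/mpathah) can be matched against those files and the matching /sys/block/dm-N/size read, without needing /dev/mapper at all. This is a minimal sketch that assumes the container's /sys shows the host's block devices, as the /sys/block listing above suggests; find_dm_node and dm_size_bytes are hypothetical names.

```python
import glob
import os

# The kernel reports /sys/block/<dev>/size in 512-byte sectors,
# regardless of the device's logical block size.
SECTOR_BYTES = 512


def find_dm_node(mapper_path, sys_block="/sys/block"):
    """Map e.g. '/dev/mapper/mpathah' to its kernel name, e.g. 'dm-7'.

    Each device-mapper device exposes its mapper name in
    <sys_block>/dm-N/dm/name, so the /dev/mapper node itself is not needed.
    """
    wanted = os.path.basename(mapper_path)
    pattern = os.path.join(sys_block, "dm-*", "dm", "name")
    for name_file in sorted(glob.glob(pattern)):
        with open(name_file) as f:
            if f.read().strip() == wanted:
                # <sys_block>/dm-N/dm/name -> dm-N
                return os.path.basename(os.path.dirname(os.path.dirname(name_file)))
    return None


def dm_size_bytes(mapper_path, sys_block="/sys/block"):
    """Return the size in bytes of a /dev/mapper device, read from sysfs."""
    node = find_dm_node(mapper_path, sys_block)
    if node is None:
        raise LookupError("no device-mapper device named %r" % mapper_path)
    with open(os.path.join(sys_block, node, "size")) as f:
        return int(f.read().strip()) * SECTOR_BYTES
```

For the 1 GB volume above, this would read 2097152 from /sys/block/dm-7/size and return 2097152 × 512 = 1073741824 bytes, matching maximum_size in the flockerctl output. The sys_block parameter exists only so the lookup can be exercised against a fake sysfs tree.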

In addition, I wonder how other vendors pass this specific test [test_create_sized_volume_with_v2_plugin_api], because the NetApp and XIO Flocker drivers also return /dev/mpath/[device] from get_device_path().

@wallrj any suggestions?

wallrj commented 7 years ago

In https://github.com/ClusterHQ/flocker/issues/2970#issuecomment-266301763 @shay-berman wrote:

Thanks for digging into it, Shay. My answers are below:

> I can see two possible options (@wallrj, any other ideas? ): a. to run your container with -v /dev:/dev, then the lsblk /dev/mapper/mpathah will work.

Yeah, that seems like a good idea.

> b. to change our driver.get_device_path() so instead of returning /dev/mpath/[device], it will return the /dev/dm-[x] device. BUT according to this the /dev/dm-[x] should not be used.

I agree that we should not examine the dm-x device.

The only other option I can think of is to run the container in privileged mode.

[~]$ docker run --rm python:2.7-slim ls -l /dev/mapper
ls: cannot access /dev/mapper: No such file or directory
[~]$ docker run --rm --privileged python:2.7-slim ls -l /dev/mapper
total 0
crw------- 1 root root 10, 236 Dec 12 23:03 control

> I wonder how other vendors pass this specific test [test_create_sized_volume_with_v2_plugin_api], because also netapp flocker driver and xio flocker driver return /dev/mpath/[device] from the get_device_path().

I don't know. I suppose they may have skipped that test.

shay-berman commented 7 years ago

What is the easiest way to apply my suggestion of running the container with -v /dev:/dev?