cloudviz / agentless-system-crawler

A tool to crawl systems like crawlers for the web
Apache License 2.0

crawler cannot crawl containers created after crawler with aufs storage backend #167

Open sastryduri opened 8 years ago

sastryduri commented 8 years ago

Description

Crawler, when deployed as a docker container, cannot crawl containers created after the crawler container if the storage backend is aufs. This is not a crawler problem as such; we think the problem lies with the aufs driver, but I am not sure.

How to Reproduce

btrfs storage backend -- docker version: 1.10.2 (ubuntu 14.04)

docker run --privileged --pid=host --net=host -v /:/hostroot:ro -it ubuntu:latest /bin/bash
[treat the container just created as the crawler container.]
[in crawler]
cd /hostroot/var/lib/docker/btrfs/subvolumes
ls -lrt 
[the last directory points to rootfs of the crawler container.]

[in host, create another container. This represents crawled container.]
docker run -it ubuntu bash

[in the crawler container]
ls -lrt
[the last directory is the rootfs of the crawled container; an ls shows its contents:]

ls 3e812e171d964b3ea81cfd92b85afcf459286e5e0dc07e71beba4223bef33649
bin   dev  home  lib64  mnt  proc  run   srv  tmp  var
boot  etc  lib   media  opt  root  sbin  sys  usr

aufs storage backend -- docker version: 1.12.0 (ubuntu 14.04)

docker run --privileged --pid=host --net=host -v /:/hostroot:ro -it ubuntu:latest /bin/bash
[treat the container just created as the crawler container.]
[in crawler]
cd /hostroot/var/lib/docker/aufs/mnt
ls -lrt
[last directory shows the root of crawler container.]

[in host]
docker run -it ubuntu bash

[in crawler container]
ls -lrt
ls last-dir (shows nothing)

[note the name of the crawled container's directory, and exit the crawler container.]
[while keeping the crawled container active, create another instance of crawler container.]
docker run --privileged --pid=host --net=host -v /:/hostroot:ro -it ubuntu:latest /bin/bash
[treat the container just created as the crawler container.]
[in crawler]
cd /hostroot/var/lib/docker/aufs/mnt
ls last-dir (now shows contents of root directory)

With aufs storage backend we observed the same behavior with docker 1.10.2 as well.
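A likely explanation, stated here as an assumption rather than a confirmed diagnosis: under btrfs every container rootfs is a plain subvolume directory under /var/lib/docker/btrfs/subvolumes, so it is visible through the /hostroot bind mount no matter when the container was created; under aufs every container rootfs is a separate mount under /var/lib/docker/aufs/mnt, and a mount created on the host after the crawler container started is not propagated into the crawler's private mount namespace, leaving an empty directory. A minimal sketch of the per-driver path layout (`rootfs_path` is a hypothetical helper, not crawler code):

```python
import os.path

# Where each storage driver keeps per-container rootfs directories,
# relative to the docker root. Under btrfs the rootfs is a plain subvolume
# directory; under aufs it is a separate mount that may not propagate into
# the crawler's mount namespace.
DRIVER_ROOTFS_DIRS = {
    "btrfs": "btrfs/subvolumes",
    "aufs": "aufs/mnt",
}

def rootfs_path(host_root, driver, layer_id):
    """Build the path where a container's rootfs should appear, given the
    host bind-mount point and a layer directory name (as seen with ls -lrt)."""
    return os.path.join(host_root, "var/lib/docker",
                        DRIVER_ROOTFS_DIRS[driver], layer_id)
```

For aufs the directory exists either way; its contents only appear if the underlying mount is visible inside the crawler's namespace, which matches the empty `ls` seen above.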

sastryduri commented 8 years ago

At this point we know this: crawler, in container mode, works with the btrfs storage backend on docker 1.10.x through 1.12.x.

Crawler, in container mode, does not work with aufs as the storage backend on either docker 1.10.x or 1.12.x.

We also upgraded the linux kernel to version 4.4.0-38 (ubuntu 16.04). The problems with aufs still persist.

sastryduri commented 8 years ago

I tested with ubuntu 16.04 (linux kernel 4.4.0-38-generic) and docker 1.12, mounting /var/lib/docker on a separate file system and making the mount point shared:

 mount --make-shared /var/lib/docker

[and creating crawler container as follows:]
docker run --privileged --pid=host --net=host -v /var/lib/docker:/hostroot:shared -it ubuntu:latest /bin/bash

[contents of root_fs of a container created later can't be accessed.]
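One way to check whether the :shared propagation actually took effect inside the crawler container is to inspect /proc/self/mountinfo: the optional fields of the /hostroot entry should contain a shared:N tag, and their absence means the mount is private, so mounts created later on the host stay invisible. A sketch (the parsing helper is hypothetical, not crawler code):

```python
def propagation_flags(mountinfo_text, mount_point):
    """Return the optional propagation fields (e.g. ['shared:5']) for
    mount_point, parsed from /proc/<pid>/mountinfo content, or None if the
    mount point is not present. An empty list means no shared/master tag,
    i.e. the mount is private."""
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # mountinfo format: id, parent, major:minor, root, mount point,
        # options, optional fields..., '-', fstype, source, super options
        if len(fields) > 4 and fields[4] == mount_point:
            dash = fields.index("-")
            return fields[6:dash]
    return None
```

Running this against the contents of /proc/self/mountinfo inside the crawler container would show whether /hostroot ended up shared or private.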

cmluciano commented 8 years ago

Can you please try with the overlayfs2 backend?

sastryduri commented 8 years ago

@cmluciano looking into it

sastryduri commented 8 years ago

@cmluciano @canturkisci @ricarkol

The overlay storage backend allows access to other containers' files.

However, crawler does not support overlay storage backend.

experiments with the overlay storage driver

  1. I used ubuntu 16.04.
uname -a
Linux over.sl.cloud9.ibm.com 4.4.0-42-generic

On this system, the overlay module needs to be inserted explicitly with the modprobe overlay command.
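The modprobe step can be verified from /proc/modules, which is the same information lsmod prints; a small sketch, with the helper name being an assumption:

```python
def overlay_module_loaded(proc_modules_text):
    """Return True if the overlay module appears in /proc/modules content.
    Each non-empty /proc/modules line starts with the module name."""
    return any(line.split()[0] == "overlay"
               for line in proc_modules_text.splitlines() if line.strip())
```

For example, `overlay_module_loaded(open("/proc/modules").read())` should be True after modprobe overlay succeeds.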

docker info
Containers: 4
 Running: 0
 Paused: 0
 Stopped: 4
Images: 1
Server Version: 1.12.2
Storage Driver: overlay
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: host bridge null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-42-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.828 GiB
Name: over.sl.cloud9.ibm.com
ID: DSK3:DWFM:WEJI:FTBD:QHKK:X3LJ:TZAE:GMHW:XJHI:YWMW:2E3S:IJ3P
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

To make docker use the overlay backend, I needed to edit /etc/systemd/system/multi-user.target.wants/docker.service and change ExecStart=/usr/bin/dockerd -H fd:// to ExecStart=/usr/bin/dockerd -H fd:// --storage-driver=overlay.

After modifying the above file execute systemctl daemon-reload

Then execute systemctl restart docker.service to restart the docker daemon.

While the file /etc/default/docker exists, changes to it had no impact on how docker started.
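The same override can also be expressed as a systemd drop-in instead of editing the unit file in place; this is an alternative to the edit described above, not what was done in the experiment, and the drop-in file name is hypothetical:

```ini
# /etc/systemd/system/docker.service.d/storage-driver.conf (hypothetical path)
[Service]
# Clear the inherited ExecStart, then redefine it with the storage driver.
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --storage-driver=overlay
```

The same systemctl daemon-reload and systemctl restart docker.service steps apply afterwards.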

start a crawler-like container:

docker run --privileged --pid=host --net=host -v /:/hostroot:ro,shared -it ubuntu:16.04 /bin/bash
[this is crawler container]

/hostroot/var/lib/docker/overlay# ls -lrt
total 28
drwx------ 13 13:06 56882321aad9383466bccfdefa7eea56116c0ee8e9b78aec4bd17b5fb0300221
drwx------ 13 13:07 f43663f340c45bb13b54722ff5c3d8589736261d9ed609568a275e864019db87
drwx------ 13 13:07 2857fb0ebf6c279a7e4180fe76eb63dbe2e9c83c3bebf7366f07bc2ac3a14660
drwx------ 13 13:07 182a2d34afae969dc257ffbc0b632b1d6031a03caca8eac65ae57c70247de674
drwx------ 13 13:07 e0e44fc154ecc517b8e936d02826f079668c7619990f51694ee5d2acd713cac5
drwx------ 13 14:24 25c0c62516b16ea870641d586ec24307bb9f97d271de6863fcf365be5cf7b2f0-init
drwx------ 13 14:24 25c0c62516b16ea870641d586ec24307bb9f97d271de6863fcf365be5cf7b2f0

[each line of command output is edited to fit on one line; only the date and time are kept.]

start another container

docker run -it ubuntu bash [this is the crawled container]
[in crawler container]
ls -lrt
total 36
drwx------ 13 13:06 56882321aad9383466bccfdefa7eea56116c0ee8e9b78aec4bd17b5fb0300221
drwx------ 13 13:07 f43663f340c45bb13b54722ff5c3d8589736261d9ed609568a275e864019db87
drwx------ 13 13:07 2857fb0ebf6c279a7e4180fe76eb63dbe2e9c83c3bebf7366f07bc2ac3a14660
drwx------ 13 13:07 182a2d34afae969dc257ffbc0b632b1d6031a03caca8eac65ae57c70247de674
drwx------ 13 13:07 e0e44fc154ecc517b8e936d02826f079668c7619990f51694ee5d2acd713cac5
drwx------ 13 14:24 25c0c62516b16ea870641d586ec24307bb9f97d271de6863fcf365be5cf7b2f0-init
drwx------ 13 14:24 25c0c62516b16ea870641d586ec24307bb9f97d271de6863fcf365be5cf7b2f0
drwx------ 13 14:33 54e6da48a1df26cee4fa50e155abfe561348460a808b87fa384d1c140ac38a70-init
drwx------ 13 14:33 54e6da48a1df26cee4fa50e155abfe561348460a808b87fa384d1c140ac38a70

ls 54e6da48a1df26cee4fa50e155abfe561348460a808b87fa384d1c140ac38a70
lower-id  merged  upper  work

ls  54e6da48a1df26cee4fa50e155abfe561348460a808b87fa384d1c140ac38a70/upper
dev  etc

[create a new file in the crawled container]
touch iam-new-in-crawled-container

[in crawler container]
/hostroot/var/lib/docker/overlay# ls  54e6da48a1df26cee4fa50e155abfe561348460a808b87fa384d1c140ac38a70/upper
dev  etc  iam-new-in-crawled-container
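Based on the lower-id/merged/upper/work layout shown above, a crawler-side helper for overlay v1 might look like the following sketch (the function and its return shape are hypothetical, not crawler API):

```python
import os.path

def overlay_v1_paths(docker_root, layer_id):
    """Build the per-container paths for the overlay (v1) driver, given the
    docker root (e.g. /var/lib/docker) and a layer directory name as seen
    under overlay/ with ls -lrt."""
    layer = os.path.join(docker_root, "overlay", layer_id)
    return {
        "merged": os.path.join(layer, "merged"),          # unified rootfs view
        "upper": os.path.join(layer, "upper"),            # container's own writes
        "work": os.path.join(layer, "work"),              # overlayfs scratch dir
        "lower_id_file": os.path.join(layer, "lower-id"), # id of the image layer below
    }
```

The upper path is the one the experiment above lists to see the newly created file.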

cmluciano commented 8 years ago

Just checking, are you saying that it did or did not work? And also did you try with overlay v1 or overlay v2? We would consider switching to overlay v2 but not v1.

sastryduri commented 8 years ago

@cmluciano

Based on the tests I did above, I think the crawler container can access the files of other containers if it is run in privileged mode.

lsmod | grep over
overlay                49152  1

I think this is overlay, not overlay2.

cmluciano commented 8 years ago

Can you please test with overlay2 also? Overlay v1 works, but it has known issues with inode locks.

sastryduri commented 8 years ago

@cmluciano @canturkisci @ricarkol

Test with overlay2: it works.

root@over:/etc# docker info
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 0
Server Version: 1.12.2
Storage Driver: overlay2
 Backing Filesystem: extfs
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge null host overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Security Options: apparmor seccomp
Kernel Version: 4.4.0-42-generic
Operating System: Ubuntu 16.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.828 GiB
Name: over.sl.cloud9.ibm.com
ID: DSK3:DWFM:WEJI:FTBD:QHKK:X3LJ:TZAE:GMHW:XJHI:YWMW:2E3S:IJ3P
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: No swap limit support
Insecure Registries:
 127.0.0.0/8

start crawler container

docker run --privileged --pid=host --net=host -v /:/hostroot:ro -it ubuntu:16.04 /bin/bash
[this is crawler container]
/hostroot/var/lib/docker/overlay2# ls -lrt
total 32
drwx------ 13 15:42 b61a01945069c602e6fc64578d88137f9062f76bc321f633880dd2f0ad809f8d
drwx------ 13 15:42 b70f12afb982367b4894d4703effd4b87e5442c03b8dc1716d1e44791733c3ca
drwx------ 3 15:42 7896958940c33dc3a5b0cbdcf85391d6f824089a9bd321721e1320e6c7d107f1
drwx------ 13 15:42 0dff9ad8a3b6d882c13026aadba87bb5fb4ef96cc8c144f15cda01993356ab35
drwx------ 13 15:42 b12e588b0327952bb5208c127a24411635c3053507300018cb23a560e626eb9a
drwx------ 13 15:42 c569863c2d859d33855ea2ea61aff412fe526ba04846037add9708e09b921f6f-init
drwx------ 13 15:42 l
drwx------ 13 15:42 c569863c2d859d33855ea2ea61aff412fe526ba04846037add9708e09b921f6f

start crawled container

[in the crawler container]
/hostroot/var/lib/docker/overlay2# ls -lrt
total 48
drwx------ 13 15:42 b61a01945069c602e6fc64578d88137f9062f76bc321f633880dd2f0ad809f8d
drwx------ 13 15:42 b70f12afb982367b4894d4703effd4b87e5442c03b8dc1716d1e44791733c3ca
drwx------ 13 15:42 7896958940c33dc3a5b0cbdcf85391d6f824089a9bd321721e1320e6c7d107f1
drwx------ 13 15:42 0dff9ad8a3b6d882c13026aadba87bb5fb4ef96cc8c144f15cda01993356ab35
drwx------ 13 15:42 b12e588b0327952bb5208c127a24411635c3053507300018cb23a560e626eb9a
drwx------ 13 15:42 c569863c2d859d33855ea2ea61aff412fe526ba04846037add9708e09b921f6f-init
drwx------ 13 15:42 c569863c2d859d33855ea2ea61aff412fe526ba04846037add9708e09b921f6f
drwx------ 13 15:46 58f5a4e1a6924d72edf3c7a534078e5422fdc8e31474823be41ec3b469a49346-init
drwx------ 13 15:46 58f5a4e1a6924d72edf3c7a534078e5422fdc8e31474823be41ec3b469a49346
drwx------ 13 15:46 e8bc29eeea0f1e90a36580a753069e0494389ec294e0a84001a0c4997bdec10a-init
drwx------ 13 15:46 l
drwx------ 13 15:46 e8bc29eeea0f1e90a36580a753069e0494389ec294e0a84001a0c4997bdec10a

the new container layer is e8bc29eeea0f1e90a36580a753069e0494389ec294e0a84001a0c4997bdec10a

[in the crawler container]
ls e8bc29eeea0f1e90a36580a753069e0494389ec294e0a84001a0c4997bdec10a/diff/
[no output: the new container's layer is empty before any files are created]

create a new file on the crawled container

[in the crawled container]
touch i-am-a-new-file

now do ls of the container layer

[in the crawler container]
ls e8bc29eeea0f1e90a36580a753069e0494389ec294e0a84001a0c4997bdec10a/diff/
i-am-a-new-file

[in the crawler container]
cat e8bc29eeea0f1e90a36580a753069e0494389ec294e0a84001a0c4997bdec10a/diff/i-am-a-new-file
Thu Oct 13 15:53:55 UTC 2016
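The "take the last entry of ls -lrt" step used throughout can be expressed programmatically; the sketch below picks the most recently modified layer directory, skipping the l shortcut directory and the -init layers seen in the listings above (a hypothetical helper, not crawler code):

```python
import os

def newest_layer(overlay2_dir):
    """Return the name of the most recently modified layer directory under
    an overlay2 root, mirroring the last entry of `ls -lrt`. Skips the 'l'
    shortcut directory and '-init' layers; returns None if nothing matches."""
    candidates = [e for e in os.scandir(overlay2_dir)
                  if e.is_dir() and e.name != "l"
                  and not e.name.endswith("-init")]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.stat().st_mtime).name
```

In the crawler container this would be called with /hostroot/var/lib/docker/overlay2 to find the layer of the most recently started container.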

cmluciano commented 8 years ago

So everything looks good with overlayfs version 2?

sastryduri commented 8 years ago

@cmluciano @ricarkol @canturkisci

crawler currently does not support overlay or overlay2

cmluciano commented 8 years ago

You just mentioned above that overlay seemed to work.

ricarkol commented 8 years ago

@sastryduri I'm confused. There are two issues, I think:

  1. we might not support the overlay storage driver in general. Specifically, the dockerutils module can't handle that file system.
  2. overlay storage has the issue you mentioned in this issue: new containers won't show up in the mounted /var/lib/docker in the crawler container.

Are you talking about 1 or 2?

sastryduri commented 8 years ago

@ricarkol I documented experiments that show new containers do show up in the mounted /var/lib/docker in the crawler container. This is true for both overlay and overlay2.

We need to think about what storage systems we support.

ricarkol commented 8 years ago

Thanks @sastryduri so:

  1. Nope, just confirmed in the code: SUPPORTED_DRIVERS = ['btrfs', 'devicemapper', 'aufs', 'vfs']
  2. and yes

We need to write the code to handle overlayfs, do you want to give it a try?
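A sketch of the change being discussed: extend the SUPPORTED_DRIVERS whitelist quoted above and resolve the per-container directory for the overlay drivers, using the directory names observed in the experiments (overlay exposes a merged rootfs view; for overlay2 the diff directory held the container's writes). The function itself is illustrative, not actual dockerutils code:

```python
# Whitelist from the code, extended with the two overlay drivers.
SUPPORTED_DRIVERS = ['btrfs', 'devicemapper', 'aufs', 'vfs',
                     'overlay', 'overlay2']

# Per-container subdirectory observed in the experiments above.
_LAYER_SUBDIR = {'overlay': 'merged', 'overlay2': 'diff'}

def overlay_rootfs(docker_root, driver, layer_id):
    """Build the directory a crawler would read for a container's files
    under the overlay drivers. Hypothetical helper, not dockerutils API."""
    if driver not in SUPPORTED_DRIVERS or driver not in _LAYER_SUBDIR:
        raise ValueError('unsupported storage driver: %s' % driver)
    return '%s/%s/%s/%s' % (docker_root, driver, layer_id,
                            _LAYER_SUBDIR[driver])
```

Note the caveat for overlay2: diff contains only the container's own writes, so a full-rootfs crawl would still need the mounted merged view.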

cmluciano commented 8 years ago

The only storage system that we use in production is AUFS. We cannot support BTRFS. If we can perform upgrades to our 14.04 kernel, we can potentially support overlayfs2.

sastryduri commented 8 years ago

@cmluciano @ricarkol I checked that upgrading the 14.04 kernel would work. In a plain ubuntu 14.04 image I ran apt-get update and apt-get upgrade, and got kernel 4.4.0-36-generic.

cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.5 LTS"

cmluciano commented 8 years ago

@sastryduri Is this server in our csf dev account? I was not able to run those commands and get that kernel version.

sastryduri commented 8 years ago

@cmluciano no. This is in iris, basically softlayer.

cmluciano commented 7 years ago

Trying to recall, does crawler work on anything except AUFS? Does it work on devicemapper?

sastryduri commented 7 years ago

@cmluciano devicemapper is supported

canturkisci commented 7 years ago

@nadgowdas we should talk w Alaa and team on this for fr8r.