cloudviz / agentless-system-crawler

A tool to crawl systems like crawlers for the web
Apache License 2.0

The crawler fails to collect package info and produces broken frames #338

Open niltonb opened 7 years ago

niltonb commented 7 years ago

Log Output

2017-10-13 05:56:15,483 MainProcess ERROR    Error crawling packages
Traceback (most recent call last):
  File "/crawler/utils/package_utils.py", line 162, in crawl_packages
    root_dir, dbpath, installed_since):
  File "/crawler/utils/package_utils.py", line 35, in get_dpkg_packages
    shell=False)
  File "/crawler/utils/misc.py", line 65, in subprocess_run
    (cmd, rc, err))
RuntimeError: (['dpkg-query', '-W', '--admindir=/var/lib/docker/overlay2/rootfs/var/lib/dpkg', '-f=${Package}|${Version}|${Architecture}|${Installed-Size}\n']) failed with rc=2: dpkg-query: error: failed to open package info file `/var/lib/docker/overlay2/rootfs/var/lib/dpkg/status' for reading: No such file or directory

tatsuhirochiba commented 7 years ago

I found out when this error happens. The failure occurs when both of the following conditions hold at the same time.

  1. The container is not based on an Ubuntu/Debian image (i.e. it is an app container).
  2. The crawler is run with python crawler/crawler.py --crawlmode OUTCONTAINER ....
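
The first condition matches the traceback: dpkg-query is pointed at an --admindir under the container rootfs that has no status file. As a minimal sketch only (not the crawler's actual code), a guard along these lines would skip dpkg-query when the database is missing. The function names and the default dbpath value are hypothetical; only the dpkg-query arguments are taken from the error message above.

```python
import os


def dpkg_status_path(root_dir, dbpath="var/lib/dpkg"):
    """Return the status file that dpkg-query would read under root_dir.

    The default dbpath is an assumption based on the path in the traceback.
    """
    return os.path.join(root_dir, dbpath, "status")


def has_dpkg_database(root_dir, dbpath="var/lib/dpkg"):
    """True if the rootfs contains a dpkg status file."""
    return os.path.isfile(dpkg_status_path(root_dir, dbpath))


def build_dpkg_query_cmd(root_dir, dbpath="var/lib/dpkg"):
    """Build the dpkg-query command seen in the traceback.

    Returns None when there is no dpkg database, so the caller can skip
    package crawling instead of failing with rc=2.
    """
    if not has_dpkg_database(root_dir, dbpath):
        return None
    return [
        "dpkg-query", "-W",
        "--admindir=" + os.path.join(root_dir, dbpath),
        "-f=${Package}|${Version}|${Architecture}|${Installed-Size}\n",
    ]
```

With a guard like this, a container that is not Debian-based would yield None and no package features, rather than a RuntimeError in the log.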

test scenario

I prepared four test cases.

Containers are:

Crawler run command examples are:

testcase 1: app container and python crawler/crawler.py ...

root@host:/# cat /tmp/testcase1.log
2017-10-17 12:35:08,223 MainProcess INFO     get_docker_container_rootfs_path: long_id=e040e9efc6027acb6ba919ea647b9a9f52c0a6bd46efc8759d6965225747b7a4, deriver=devicemapper, server_version=17.09.0-ce
2017-10-17 12:35:08,232 MainProcess INFO     setup_namespace_and_metadata: long_id=e040e9efc6027acb6ba919ea647b9a9f52c0a6bd46efc8759d6965225747b7a4
2017-10-17 12:35:08,802 MainProcess INFO     get_docker_container_rootfs_path: long_id=e040e9efc6027acb6ba919ea647b9a9f52c0a6bd46efc8759d6965225747b7a4, deriver=devicemapper, server_version=17.09.0-ce
2017-10-17 12:35:08,890 MainProcess ERROR    Error crawling packages
Traceback (most recent call last):
  File "/crawler/utils/package_utils.py", line 162, in crawl_packages
    root_dir, dbpath, installed_since):
  File "/crawler/utils/package_utils.py", line 35, in get_dpkg_packages
    shell=False)
  File "/crawler/utils/misc.py", line 65, in subprocess_run
    (cmd, rc, err))
RuntimeError: (['dpkg-query', '-W', '--admindir=/var/lib/docker/overlay2/rootfs/var/lib/dpkg', '-f=${Package}|${Version}|${Architecture}|${Installed-Size}\n']) failed with rc=2: dpkg-query: error: failed to open package info file `/var/lib/docker/overlay2/rootfs/var/lib/dpkg/status' for reading: No such file or directory

testcase 2: app container and python crawler.py ...

root@host:/# cat /tmp/testcase2.log
2017-10-17 12:35:37,465 MainProcess INFO     get_docker_container_rootfs_path: long_id=e040e9efc6027acb6ba919ea647b9a9f52c0a6bd46efc8759d6965225747b7a4, deriver=devicemapper, server_version=17.09.0-ce
2017-10-17 12:35:37,481 MainProcess INFO     setup_namespace_and_metadata: long_id=e040e9efc6027acb6ba919ea647b9a9f52c0a6bd46efc8759d6965225747b7a4
2017-10-17 12:35:37,674 MainProcess INFO     get_docker_container_rootfs_path: long_id=e040e9efc6027acb6ba919ea647b9a9f52c0a6bd46efc8759d6965225747b7a4, deriver=devicemapper, server_version=17.09.0-ce

testcase 3: Debian-based container and python crawler/crawler.py ...

root@host:/# cat /tmp/testcase3.log
2017-10-17 12:50:48,056 MainProcess INFO     get_docker_container_rootfs_path: long_id=eadb333ab04b837e94b6856fd0e5081ba14374440dd47405d39c856d38810c95, deriver=devicemapper, server_version=17.09.0-ce
2017-10-17 12:50:48,066 MainProcess INFO     setup_namespace_and_metadata: long_id=eadb333ab04b837e94b6856fd0e5081ba14374440dd47405d39c856d38810c95

testcase 4: Debian-based container and python crawler.py ...

root@host:/# cat /tmp/testcase4.log
2017-10-17 12:47:09,104 MainProcess INFO     get_docker_container_rootfs_path: long_id=eadb333ab04b837e94b6856fd0e5081ba14374440dd47405d39c856d38810c95, deriver=devicemapper, server_version=17.09.0-ce
2017-10-17 12:47:09,114 MainProcess INFO     setup_namespace_and_metadata: long_id=eadb333ab04b837e94b6856fd0e5081ba14374440dd47405d39c856d38810c95
2017-10-17 12:48:09,516 MainProcess ERROR    Timed out waiting for process 10053 to exit.

check frames

I compared the two frames from test cases 1 and 2. The only differences are the timestamp and uuid in the metadata line and the uptime in the os line, so the frames are not broken. (Neither frame contains any package-related feature.)

root@host:/# diff /tmp/testcase1.e040e9efc602.0 /tmp/testcase2.e040e9efc602.0
1,2c1,2
< metadata  "metadata"  {"container_long_id":"e040e9efc6027acb6ba919ea647b9a9f52c0a6bd46efc8759d6965225747b7a4","features":"os,package,disk,config,file","emit_shortname":"e040e9efc602","timestamp":"2017-10-17T12:35:08+0000","docker_image_short_name":"heapster:v1.4.0","namespace":"kube-system/heapster-1395572904-8llls/eventer/e040e9efc6027acb6ba919ea647b9a9f52c0a6bd46efc8759d6965225747b7a4","docker_image_registry":"registry.ng.bluemix.net","owner_namespace":"mdelder","docker_image_tag":"v1.4.0","container_short_id":"e040e9efc602","system_type":"container","container_name":"k8s_eventer_heapster-1395572904-8llls_kube-system_80fff4fc-b212-11e7-a280-069d120f1ec2_0","container_image":"sha256:749531a6d2cf322bd8a35c95c25c6ad722ddeb66260ec8c1e03410cc7bd449aa","docker_image_long_name":"registry.ng.bluemix.net/mdelder/heapster:v1.4.0","uuid":"e1f37bd4-cd71-4760-af71-68040fee6a67"}
< os    "linux" {"boottime":1505875457.0,"uptime":2368251.0,"ipaddr":["127.0.0.1","10.184.146.27"],"os":"unknown","os_version":"unknown","os_kernel":"unknown","architecture":"x86_64"}
---
> metadata  "metadata"  {"container_long_id":"e040e9efc6027acb6ba919ea647b9a9f52c0a6bd46efc8759d6965225747b7a4","features":"os,package,disk,config,file","emit_shortname":"e040e9efc602","timestamp":"2017-10-17T12:35:37+0000","docker_image_short_name":"heapster:v1.4.0","namespace":"kube-system/heapster-1395572904-8llls/eventer/e040e9efc6027acb6ba919ea647b9a9f52c0a6bd46efc8759d6965225747b7a4","docker_image_registry":"registry.ng.bluemix.net","owner_namespace":"mdelder","docker_image_tag":"v1.4.0","container_short_id":"e040e9efc602","system_type":"container","container_name":"k8s_eventer_heapster-1395572904-8llls_kube-system_80fff4fc-b212-11e7-a280-069d120f1ec2_0","container_image":"sha256:749531a6d2cf322bd8a35c95c25c6ad722ddeb66260ec8c1e03410cc7bd449aa","docker_image_long_name":"registry.ng.bluemix.net/mdelder/heapster:v1.4.0","uuid":"f059c1f7-9edc-4559-a404-f5130dbf3c69"}
> os    "linux" {"boottime":1505875457.0,"uptime":2368280.0,"ipaddr":["127.0.0.1","10.184.146.27"],"os":"unknown","os_version":"unknown","os_kernel":"unknown","architecture":"x86_64"}
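
The same comparison can be scripted so that fields which change on every run are ignored. This is a rough sketch under assumptions: each frame line is taken to be a feature type, a JSON-quoted key without whitespace, and a JSON object, separated by whitespace (as in the diff above), and the set of volatile keys is simply the three fields that differ here. All function names are made up for illustration.

```python
import json

# Fields that differ between the two frames above on every crawl.
VOLATILE_KEYS = {"timestamp", "uuid", "uptime"}


def parse_frame_line(line):
    """Split one frame line into (feature_type, key, payload dict)."""
    feature_type, key, payload = line.split(None, 2)
    return feature_type, json.loads(key), json.loads(payload)


def normalize(line):
    """Parse a frame line and drop the volatile payload fields."""
    feature_type, key, payload = parse_frame_line(line)
    stable = {k: v for k, v in payload.items() if k not in VOLATILE_KEYS}
    return feature_type, key, stable


def frames_differ(lines_a, lines_b):
    """True if two frames differ in anything other than volatile fields."""
    norm_a = [normalize(l) for l in lines_a if l.strip()]
    norm_b = [normalize(l) for l in lines_b if l.strip()]
    return norm_a != norm_b


def feature_types(lines):
    """Return the set of feature types present in a frame."""
    return {parse_frame_line(l)[0] for l in lines if l.strip()}
```

Reading each frame file with readlines() and passing the two lists to frames_differ would report whether anything besides those fields changed, and feature_types shows whether a package feature was emitted at all.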

sahilsuneja1 commented 7 years ago

@tatsuhirochiba: could you please direct me to the specific images demonstrating this behaviour?