SUSE / machinery

A systems management toolkit for Linux
GNU General Public License v3.0

Filters seem to be ignored during unmanaged-files inspection #2193

Closed. wernerflamme closed this issue 7 years ago.

wernerflamme commented 7 years ago

machinery inspect -x sapdisk

Inspecting sapdisk for os, packages, patterns, repositories, users, groups, services, changed-config-files, changed-managed-files, unmanaged-files...

Note: There are filters being applied during inspection. (Use --verbose option to show the filters)

Inspecting os...
 -> Found operating system 'SUSE Linux Enterprise Server 11' version '11 SP4'.
Inspecting packages...
 -> Found 1879 packages.
Inspecting patterns...
 -> Found 24 patterns.
Inspecting repositories...
 -> Found 79 repositories.
Inspecting users...
 -> Found 77 users.
Inspecting groups...
 -> Found 81 groups.
Inspecting services...
 -> Found 175 services.
Inspecting changed-config-files...
 -> Extracted 158 changed configuration files.
Inspecting changed-managed-files...
 -> Extracted 105 changed managed files.
Inspecting unmanaged-files...
2016/11/07 15:43:34 open /tmp/.security.xl6yCR/: no such file or directory
Machinery experienced an unexpected error. Please file a bug report at: https://github.com/SUSE/machinery/issues/new
Execution of "ssh root@sapdisk -o LogLevel\=ERROR LANGUAGE\= LC_ALL\=en_US.utf8 /root/machinery-helper --extract-metadata" failed with status 1 (error output streamed away).

Backtrace:
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/cheetah-0.4.0/lib/cheetah.rb:555:in `check_errors'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/cheetah-0.4.0/lib/cheetah.rb:364:in `run'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/logged_cheetah.rb:23:in `run'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/remote_system.rb:92:in `run_command'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/machinery_helper.rb:60:in `run_helper'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/plugins/unmanaged_files/unmanaged_files_inspector.rb:85:in `run_helper_inspection'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/plugins/unmanaged_files/unmanaged_files_inspector.rb:59:in `inspect'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:87:in `block in build_description'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:79:in `each'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:79:in `build_description'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:21:in `inspect_system'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/cli.rb:640:in `block (2 levels) in <class:Cli>'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/command_support.rb:126:in `call'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/command_support.rb:126:in `execute'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:296:in `block in call_command'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:309:in `call'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:309:in `call_command'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:83:in `run'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bin/machinery:41:in `<top (required)>'
/usr/bin/machinery:24:in `load'
/usr/bin/machinery:24:in `<main>'

wernerflamme commented 7 years ago

I started this again and got nearly the same error, but this time it is "open /var/spool/cups/c105580: no such file or directory". I guess that temporary files make machinery crash. But how could I possibly inspect a running host without any temporary files being present?

thardeck commented 7 years ago

Thx for your report. So the issue seems to be that the files are present during inspection but get removed before extraction.

For now you could either inspect without extraction or filter the directories using --skip-files, for example by running machinery inspect -x sapdisk --skip-files=/var/spool/cups.
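
Inspecting without extraction means running the same command without the -x (--extract-files) flag, e.g.:

machinery inspect sapdisk --skip-files=/var/spool/cups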

The only thing that puzzles me is how /tmp could have ended up in the list, since we filter it by default. If you want to see the filters that were applied during inspection, you can run machinery show --verbose sapdisk.

Can you check whether /tmp is mentioned in that list, the one under "The following filters were applied during inspection:"?

wernerflamme commented 7 years ago

Hm, when I enter machinery show --verbose sapdisk, I get several sections, but no filters or directory lists. The sections are titled "Operating System", "Packages", "Patterns", "Repositories", "Users", "Groups", "Services", "Changed Configuration Files", "Changed Managed Files" - plus the heading "unmanaged-files", but that one is empty, since machinery always breaks there.

When I enter machinery inspect -x sapdisk --skip-files=/var/spool/,/tmp,/var/tmp --verbose, the output starts with a filter list:

/unmanaged_files/name=/etc/passwd
/unmanaged_files/name=/etc/shadow
/unmanaged_files/name=/etc/group
/unmanaged_files/name=/tmp
/unmanaged_files/name=/var/tmp
/unmanaged_files/name=/lost+found
/unmanaged_files/name=/var/run
/unmanaged_files/name=/var/lock
/unmanaged_files/name=/var/lib/rpm
/unmanaged_files/name=/.snapshots
/unmanaged_files/name=/proc
/unmanaged_files/name=/etc/init.d/boot.d
/unmanaged_files/name=/etc/init.d/rc0.d
/unmanaged_files/name=/etc/init.d/rc1.d
/unmanaged_files/name=/etc/init.d/rc2.d
/unmanaged_files/name=/etc/init.d/rc3.d
/unmanaged_files/name=/etc/init.d/rc4.d
/unmanaged_files/name=/etc/init.d/rc5.d
/unmanaged_files/name=/etc/init.d/rc6.d
/unmanaged_files/name=/etc/init.d/rcS.d
/unmanaged_files/name=/var/lib/dpkg
/unmanaged_files/name=/var/spool
/unmanaged_files/name=/tmp
/unmanaged_files/name=/var/tmp

thardeck commented 7 years ago

Hm, when I enter machinery show --verbose sapdisk, I get several sections, but no filters or directory lists.

The unmanaged-files scope has not been inspected successfully yet; that's why no filters for it were stored in the description or shown with --verbose.

Did the inspection work fine with the --skip-files option or did it crash too?

wernerflamme commented 7 years ago

It is still running... It caused a load of 55+ on the inspected host, which became unresponsive, so I had to kill the machinery processes. I started again and excluded the second (100 GB) and third (1 TB) filesystems, so it should only have to care about the / filesystem (100 GB).

wernerflamme commented 7 years ago

And just at this moment it crashed again:

machinery inspect -x sapdisk --skip-files=/var/spool/,/tmp,/var/tmp,/zdisk/data100,/srv/nfs4/ersatz --verbose
Inspecting sapdisk for os, packages, patterns, repositories, users, groups, services, changed-config-files, changed-managed-files, unmanaged-files...

The following filters are applied during inspection:
/unmanaged_files/name=/etc/passwd
/unmanaged_files/name=/etc/shadow
/unmanaged_files/name=/etc/group
/unmanaged_files/name=/tmp
/unmanaged_files/name=/var/tmp
/unmanaged_files/name=/lost+found
/unmanaged_files/name=/var/run
/unmanaged_files/name=/var/lock
/unmanaged_files/name=/var/lib/rpm
/unmanaged_files/name=/.snapshots
/unmanaged_files/name=/proc
/unmanaged_files/name=/etc/init.d/boot.d
/unmanaged_files/name=/etc/init.d/rc0.d
/unmanaged_files/name=/etc/init.d/rc1.d
/unmanaged_files/name=/etc/init.d/rc2.d
/unmanaged_files/name=/etc/init.d/rc3.d
/unmanaged_files/name=/etc/init.d/rc4.d
/unmanaged_files/name=/etc/init.d/rc5.d
/unmanaged_files/name=/etc/init.d/rc6.d
/unmanaged_files/name=/etc/init.d/rcS.d
/unmanaged_files/name=/var/lib/dpkg
/unmanaged_files/name=/var/spool
/unmanaged_files/name=/tmp
/unmanaged_files/name=/var/tmp
/unmanaged_files/name=/zdisk/data100
/unmanaged_files/name=/srv/nfs4/ersatz

Inspecting os...
 -> Found operating system 'SUSE Linux Enterprise Server 11' version '11 SP4'.
Inspecting packages...
 -> Found 1873 packages.
Inspecting patterns...
 -> Found 24 patterns.
Inspecting repositories...
 -> Found 79 repositories.
Inspecting users...
 -> Found 77 users.
Inspecting groups...
 -> Found 81 groups.
Inspecting services...
 -> Found 175 services.
Inspecting changed-config-files...
 -> Extracted 158 changed configuration files.
Inspecting changed-managed-files...
 -> Extracted 105 changed managed files.
Inspecting unmanaged-files...
2016/11/09 16:09:10 open /var/spool/cups/c105638: no such file or directory
Machinery experienced an unexpected error. Please file a bug report at: https://github.com/SUSE/machinery/issues/new
Execution of "ssh root@sapdisk -o LogLevel\=ERROR LANGUAGE\= LC_ALL\=en_US.utf8 /root/machinery-helper --extract-metadata" failed with status 1 (error output streamed away).

Backtrace:
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/cheetah-0.4.0/lib/cheetah.rb:555:in `check_errors'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/cheetah-0.4.0/lib/cheetah.rb:364:in `run'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/logged_cheetah.rb:23:in `run'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/remote_system.rb:92:in `run_command'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/machinery_helper.rb:60:in `run_helper'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/plugins/unmanaged_files/unmanaged_files_inspector.rb:85:in `run_helper_inspection'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/plugins/unmanaged_files/unmanaged_files_inspector.rb:59:in `inspect'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:87:in `block in build_description'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:79:in `each'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:79:in `build_description'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:21:in `inspect_system'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/cli.rb:640:in `block (2 levels) in <class:Cli>'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/command_support.rb:126:in `call'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/command_support.rb:126:in `execute'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:296:in `block in call_command'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:309:in `call'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:309:in `call_command'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:83:in `run'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bin/machinery:41:in `<top (required)>'
/usr/bin/machinery:24:in `load'
/usr/bin/machinery:24:in `<main>'

So, I passed /var/spool in the --skip-files parameter, and it still crashes because of a vanished file, /var/spool/cups/c105638. Now I consider this a bug as well.

thardeck commented 7 years ago

Thx for the update. Yes, this is a second issue. For the first one we already have an entry under #2188, which is why I have changed the title of this issue accordingly.

thardeck commented 7 years ago

I was able to reproduce the issue and created a workaround for it. I will let you know as soon as it is released.

To reproduce the issue, you can run the following command in a directory on the inspected system while inspecting unmanaged-files with extraction:

for i in `seq 1 600`; do touch "$i"; sleep 1; rm "$i"; done
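
(The loop creates and deletes a file every second, so some of the files the inspector lists are already gone by the time the --extract-metadata step tries to read them, which triggers the "no such file or directory" error shown above.)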

thardeck commented 7 years ago

We have released version 1.22.2, which should fix your issue, at least when you use filtering. Could you verify it?

wernerflamme commented 7 years ago

It does not take as long to break machinery now :(

$ machinery inspect -x sapdisk --skip-files=/var/spool/,/tmp,/var/tmp,/zdisk/data100,/srv/nfs4/ersatz --verbose
Inspecting sapdisk for os, packages, patterns, repositories, users, groups, services, changed-config-files, changed-managed-files, unmanaged-files...

The following filters are applied during inspection:
/unmanaged_files/name=/etc/passwd
/unmanaged_files/name=/etc/shadow
/unmanaged_files/name=/etc/group
/unmanaged_files/name=/tmp
/unmanaged_files/name=/var/tmp
/unmanaged_files/name=/lost+found
/unmanaged_files/name=/var/run
/unmanaged_files/name=/var/lock
/unmanaged_files/name=/var/lib/rpm
/unmanaged_files/name=/.snapshots
/unmanaged_files/name=/proc
/unmanaged_files/name=/etc/init.d/boot.d
/unmanaged_files/name=/etc/init.d/rc0.d
/unmanaged_files/name=/etc/init.d/rc1.d
/unmanaged_files/name=/etc/init.d/rc2.d
/unmanaged_files/name=/etc/init.d/rc3.d
/unmanaged_files/name=/etc/init.d/rc4.d
/unmanaged_files/name=/etc/init.d/rc5.d
/unmanaged_files/name=/etc/init.d/rc6.d
/unmanaged_files/name=/etc/init.d/rcS.d
/unmanaged_files/name=/var/lib/dpkg
/unmanaged_files/name=/var/spool
/unmanaged_files/name=/tmp
/unmanaged_files/name=/var/tmp
/unmanaged_files/name=/zdisk/data100
/unmanaged_files/name=/srv/nfs4/ersatz

Inspecting os...
 -> Found operating system 'SUSE Linux Enterprise Server 11' version '11 SP4'.
Inspecting packages...
 -> Found 1872 packages.
Inspecting patterns...
 -> Found 24 patterns.
Inspecting repositories...
Machinery experienced an unexpected error. Please file a bug report at: https://github.com/SUSE/machinery/issues/new
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/plugins/repositories/repositories_inspector.rb:158:in `block in get_credentials_from_system': undefined method `[]' for nil:NilClass (NoMethodError)
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/plugins/repositories/repositories_inspector.rb:152:in `each'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/plugins/repositories/repositories_inspector.rb:152:in `get_credentials_from_system'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/plugins/repositories/repositories_inspector.rb:73:in `inspect_zypp_repositories'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/plugins/repositories/repositories_inspector.rb:28:in `inspect'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/lib/inspect_task.rb:87:in `block in build_description'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/lib/inspect_task.rb:79:in `each'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/lib/inspect_task.rb:79:in `build_description'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/lib/inspect_task.rb:21:in `inspect_system'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/lib/cli.rb:768:in `block (2 levels) in <class:Cli>'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/command_support.rb:126:in `call'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/command_support.rb:126:in `execute'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:296:in `block in call_command'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:309:in `call'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:309:in `call_command'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:83:in `run'
        from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bin/machinery:41:in `<top (required)>'
        from /usr/bin/machinery:24:in `load'
        from /usr/bin/machinery:24:in `<main>'
$ rpm -q machinery
machinery-1.22.2-16.1.x86_64

thardeck commented 7 years ago

Thx for the report. I have not been able to reproduce the issue yet but will look into it.

Until then, can you check whether the unmanaged-files issue is fixed by running:

machinery inspect -x -s unmanaged-files sapdisk --skip-files=/var/spool/,/zdisk/data100,/srv/nfs4/ersatz --verbose

You could also try removing some of the filters, but there might still be an issue during extraction if you do not filter the volatile directories. With filtering, however, it should not crash anymore.

To inspect everything but repositories you can use the -e parameter.

machinery inspect -x -e repositories sapdisk --skip-files=/var/spool/,/zdisk/data100,/srv/nfs4/ersatz --verbose
thardeck commented 7 years ago

Is it possible that you added new repositories in the meantime? I do not think that we touched this area of the code, at least not directly. But I know which area breaks, so I can create a workaround.

Do you have the same issue with an older version?

wernerflamme commented 7 years ago

I switched from the machinery rpm in the Leap 42.1 Update repo to the systemsmanagement:machinery repo, since that was the only place I found an rpm for 1.22.2. Alas, this version again ignores the filters: with lsof /srv/nfs4/ersatz I can see machinery accessing files on this filesystem:

machinery 17264 root    3r   DIR   8,17       48 2262328561 /srv/nfs4/ersatz/vmbackup/sapxufq/backup.1/var/lib/autoinstall/repository

It doesn't crash (yet), but it still ignores the filters.

BTW, is there any way to start machinery on the inspected host with an ionice -c3 prefix? My poor sapdisk host is suffering from machinery scanning the 1 TB filesystem containing bazillions of small files...

thardeck commented 7 years ago

Yes, the helper binary does not support filters yet; we only fixed the crashes caused by removed or inaccessible files. The result is filtered later on.

Starting with a lower priority probably does make sense, but you should be able to change the priority manually if machinery-helper takes a long time to run.

wernerflamme commented 7 years ago

Oh, I see. Bad luck then, I guess.

If there is no way to exclude directories from scanning, machinery is unusable for me. The 1 TB volume is shared via NFS among all of my hosts; sapdisk is the NFS server (hence its hostname). The volume takes hours to scan, a backup may come in during inspection, and hundreds of files are removed and brought back (all in the vmbackup tree). It definitely makes no sense to scan files here.

I know about the renice command, but how can I apply ionice -c3 to a running program? AFAIK renice only affects the "normal" nice value, which does no harm here - but the I/O load is very high. Do I have to patch machinery on my work host instead of intervening on the inspected host?

In the comparable case of the seccheck script, I patch it manually: first, the device is removed from the device list, and then the cron job gets an ionice -c3 instead of the nice command that is prepended by default.

thardeck commented 7 years ago

Thx for your debugging.

Can't you get the process id of the machinery-helper on the inspected host with ps and run ionice -c 3 -p <pid>?
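
For example (a minimal sketch; it assumes the helper shows up as machinery-helper in the process list):

# on the inspected host, as root
pid=$(pgrep -f machinery-helper)   # find the running helper
ionice -c 3 -p "$pid"              # move it to the "idle" I/O scheduling class
ionice -p "$pid"                   # verify: should now report "idle"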

Anyway, I will create a new issue for machinery-helper filter support. The crash, at least, should be fixed by the patch release.

Regarding the repository problem, can you create a new issue and mention there whether you added new repositories, and if not, whether the inspection works with the current Leap version?

Thx in advance.

thardeck commented 7 years ago

If there is no way to exclude directories from scanning, machinery is unusable for me. The 1 TB volume is shared via NFS among all of my hosts; sapdisk is the NFS server (hence its hostname). The volume takes hours to scan, a backup may come in during inspection, and hundreds of files are removed and brought back (all in the vmbackup tree). It definitely makes no sense to scan files here.

One thing that puzzles me is that, by default, machinery-helper should not inspect remote mount points. How is the NFS disk mounted, and what does mount report as the filesystem type?
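
For example (assuming the share is mounted at /srv/nfs4/ersatz, as mentioned above; findmnt requires a reasonably recent util-linux):

findmnt -T /srv/nfs4/ersatz   # shows the source device and the FSTYPE column
mount | grep ersatz           # the type should show up as nfs or nfs4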

wernerflamme commented 7 years ago

Ah, OK, I didn't test that; I just assumed that "the helper binary does not support filters yet" meant it scans all available filesystems, but it does so on local filesystems only. The filesystem is mounted via automount/NFSv3 on ~20 hosts, so the problem with the gigantic unneeded file scan exists on my central host sapdisk only. Sounds better now :)

I'll try ionice -c3 on an existing pid tomorrow; I have to go to a meeting now. Also tomorrow, I'll create a new issue regarding the versions I use(d).

thardeck commented 7 years ago

This issue can be closed, I guess. I have already created a new issue for machinery-helper filter support.