Closed wernerflamme closed 7 years ago
I started this again and get nearly the same error again, but this time it is "open /var/spool/cups/c105580: no such file or directory". I guess that temporary files make machinery crash. But how could I possibly inspect a running host without having temporary files?
Thx for your report. So the issue seems to be that the files are there during inspection but get removed before extraction.
For now you could either inspect without extraction or filter the directories by using `--skip-files`, for example by running `machinery inspect -x sapdisk --skip-files=/var/spool/cups`.
The only thing that puzzles me is how `/tmp` could have ended up in the list, since we filter it by default.
If you want to see the filters which were applied during inspection, you can run `machinery show --verbose sapdisk`.
Can you check if `/tmp` is mentioned in this list, the one under "The following filters were applied during inspection:"?
Hm, when I enter `machinery show --verbose sapdisk`, I get several sections, but no filters or directory lists. The sections are titled "Operating System", "Packages", "Patterns", "Repositories", "Users", "Groups", "Services", "Changed Configuration Files", "Changed Managed Files" - plus the headline "unmanaged-files", but this is empty, since machinery always breaks here.
When I enter `machinery inspect -x sapdisk --skip-files=/var/spool/,/tmp,/var/tmp --verbose`, the output starts with a filter list:
/unmanaged_files/name=/etc/passwd
/unmanaged_files/name=/etc/shadow
/unmanaged_files/name=/etc/group
/unmanaged_files/name=/tmp
/unmanaged_files/name=/var/tmp
/unmanaged_files/name=/lost+found
/unmanaged_files/name=/var/run
/unmanaged_files/name=/var/lock
/unmanaged_files/name=/var/lib/rpm
/unmanaged_files/name=/.snapshots
/unmanaged_files/name=/proc
/unmanaged_files/name=/etc/init.d/boot.d
/unmanaged_files/name=/etc/init.d/rc0.d
/unmanaged_files/name=/etc/init.d/rc1.d
/unmanaged_files/name=/etc/init.d/rc2.d
/unmanaged_files/name=/etc/init.d/rc3.d
/unmanaged_files/name=/etc/init.d/rc4.d
/unmanaged_files/name=/etc/init.d/rc5.d
/unmanaged_files/name=/etc/init.d/rc6.d
/unmanaged_files/name=/etc/init.d/rcS.d
/unmanaged_files/name=/var/lib/dpkg
/unmanaged_files/name=/var/spool
/unmanaged_files/name=/tmp
/unmanaged_files/name=/var/tmp
> Hm, when I enter machinery show --verbose sapdisk, I get several sections, but no filters or directory lists.
The unmanaged-files were not inspected successfully yet, that's why no filters for them were stored in the description and shown with `--verbose`.
Did the inspection work fine with the `--skip-files` option or did it crash too?
It is still running... It caused a load of 55+ on the inspected host, which became unresponsive, so I had to kill the machinery processes. I started again, this time excluding the second (100 GB) and third (1 TB) file systems, so it only has to care about the / filesystem (100 GB).
And just in this moment it crashes again:
machinery inspect -x sapdisk --skip-files=/var/spool/,/tmp,/var/tmp,/zdisk/data100,/srv/nfs4/ersatz --verbose
Inspecting sapdisk for os, packages, patterns, repositories, users, groups, services, changed-config-files, changed-managed-files, unmanaged-files...
The following filters are applied during inspection:
/unmanaged_files/name=/etc/passwd
/unmanaged_files/name=/etc/shadow
/unmanaged_files/name=/etc/group
/unmanaged_files/name=/tmp
/unmanaged_files/name=/var/tmp
/unmanaged_files/name=/lost+found
/unmanaged_files/name=/var/run
/unmanaged_files/name=/var/lock
/unmanaged_files/name=/var/lib/rpm
/unmanaged_files/name=/.snapshots
/unmanaged_files/name=/proc
/unmanaged_files/name=/etc/init.d/boot.d
/unmanaged_files/name=/etc/init.d/rc0.d
/unmanaged_files/name=/etc/init.d/rc1.d
/unmanaged_files/name=/etc/init.d/rc2.d
/unmanaged_files/name=/etc/init.d/rc3.d
/unmanaged_files/name=/etc/init.d/rc4.d
/unmanaged_files/name=/etc/init.d/rc5.d
/unmanaged_files/name=/etc/init.d/rc6.d
/unmanaged_files/name=/etc/init.d/rcS.d
/unmanaged_files/name=/var/lib/dpkg
/unmanaged_files/name=/var/spool
/unmanaged_files/name=/tmp
/unmanaged_files/name=/var/tmp
/unmanaged_files/name=/zdisk/data100
/unmanaged_files/name=/srv/nfs4/ersatz
Inspecting os...
-> Found operating system 'SUSE Linux Enterprise Server 11' version '11 SP4'.
Inspecting packages...
-> Found 1873 packages.
Inspecting patterns...
-> Found 24 patterns.
Inspecting repositories...
-> Found 79 repositories.
Inspecting users...
-> Found 77 users.
Inspecting groups...
-> Found 81 groups.
Inspecting services...
-> Found 175 services.
Inspecting changed-config-files...
-> Extracted 158 changed configuration files.
Inspecting changed-managed-files...
-> Extracted 105 changed managed files.
Inspecting unmanaged-files...
2016/11/09 16:09:10 open /var/spool/cups/c105638: no such file or directory
Machinery experienced an unexpected error. Please file a bug report at: https://github.com/SUSE/machinery/issues/new
Execution of "ssh root@sapdisk -o LogLevel\=ERROR LANGUAGE\= LC_ALL\=en_US.utf8 /root/machinery-helper --extract-metadata" failed with status 1 (error output streamed away).
Backtrace:
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/cheetah-0.4.0/lib/cheetah.rb:555:in `check_errors'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/cheetah-0.4.0/lib/cheetah.rb:364:in `run'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/logged_cheetah.rb:23:in `run'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/remote_system.rb:92:in `run_command'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/machinery_helper.rb:60:in `run_helper'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/plugins/unmanaged_files/unmanaged_files_inspector.rb:85:in `run_helper_inspection'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/plugins/unmanaged_files/unmanaged_files_inspector.rb:59:in `inspect'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:87:in `block in build_description'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:79:in `each'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:79:in `build_description'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:21:in `inspect_system'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/cli.rb:640:in `block (2 levels) in <class:Cli>'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/command_support.rb:126:in `call'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/command_support.rb:126:in `execute'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:296:in `block in call_command'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:309:in `call'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:309:in `call_command'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:83:in `run'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bin/machinery:41:in `<top (required)>'
/usr/bin/machinery:24:in `load'
/usr/bin/machinery:24:in `<main>'
So, I gave `/var/spool` in the `--skip-files` parameter, and it still crashes because of a vanished file, `/var/spool/cups/c105638`. Now I consider this a bug as well.
Thx for the update. Yes, this is a second issue. For the first one we already have an entry under #2188, which is why I have changed the title of this issue accordingly.
I was able to reproduce the issue and created a work around for it. I will let you know as soon as this is released.
To reproduce the issue you can run the following command inside a managed repository while inspecting unmanaged-files with extraction:
for i in `seq 1 600`; do touch "$i"; sleep 1; rm "$i"; done
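The same race can be seen with plain shell tools, independent of machinery: a file that is visible when a directory is listed can be gone by the time it is read. This is only an illustration (the paths are made up for the demo, not taken from machinery):

```shell
# Illustration of the race: a file exists while the directory is listed,
# but vanishes before it can be read -- the situation that crashed the helper.
dir=$(mktemp -d)
touch "$dir/victim"
ls "$dir"                        # "victim" shows up in the listing
rm "$dir/victim"
cat "$dir/victim" 2>&1 || true   # "No such file or directory", like the machinery error
rmdir "$dir"
```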
We have released version 1.22.2, which should fix your issue, at least when you use filtering. Could you verify it?
It does not take so long to break machinery now :(
$ machinery inspect -x sapdisk --skip-files=/var/spool/,/tmp,/var/tmp,/zdisk/data100,/srv/nfs4/ersatz --verbose
Inspecting sapdisk for os, packages, patterns, repositories, users, groups, services, changed-config-files, changed-managed-files, unmanaged-files...
The following filters are applied during inspection:
/unmanaged_files/name=/etc/passwd
/unmanaged_files/name=/etc/shadow
/unmanaged_files/name=/etc/group
/unmanaged_files/name=/tmp
/unmanaged_files/name=/var/tmp
/unmanaged_files/name=/lost+found
/unmanaged_files/name=/var/run
/unmanaged_files/name=/var/lock
/unmanaged_files/name=/var/lib/rpm
/unmanaged_files/name=/.snapshots
/unmanaged_files/name=/proc
/unmanaged_files/name=/etc/init.d/boot.d
/unmanaged_files/name=/etc/init.d/rc0.d
/unmanaged_files/name=/etc/init.d/rc1.d
/unmanaged_files/name=/etc/init.d/rc2.d
/unmanaged_files/name=/etc/init.d/rc3.d
/unmanaged_files/name=/etc/init.d/rc4.d
/unmanaged_files/name=/etc/init.d/rc5.d
/unmanaged_files/name=/etc/init.d/rc6.d
/unmanaged_files/name=/etc/init.d/rcS.d
/unmanaged_files/name=/var/lib/dpkg
/unmanaged_files/name=/var/spool
/unmanaged_files/name=/tmp
/unmanaged_files/name=/var/tmp
/unmanaged_files/name=/zdisk/data100
/unmanaged_files/name=/srv/nfs4/ersatz
Inspecting os...
-> Found operating system 'SUSE Linux Enterprise Server 11' version '11 SP4'.
Inspecting packages...
-> Found 1872 packages.
Inspecting patterns...
-> Found 24 patterns.
Inspecting repositories...
Machinery experienced an unexpected error. Please file a bug report at: https://github.com/SUSE/machinery/issues/new
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/plugins/repositories/repositories_inspector.rb:158:in `block in get_credentials_from_system': undefined method `[]' for nil:NilClass (NoMethodError)
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/plugins/repositories/repositories_inspector.rb:152:in `each'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/plugins/repositories/repositories_inspector.rb:152:in `get_credentials_from_system'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/plugins/repositories/repositories_inspector.rb:73:in `inspect_zypp_repositories'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/plugins/repositories/repositories_inspector.rb:28:in `inspect'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/lib/inspect_task.rb:87:in `block in build_description'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/lib/inspect_task.rb:79:in `each'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/lib/inspect_task.rb:79:in `build_description'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/lib/inspect_task.rb:21:in `inspect_system'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/lib/cli.rb:768:in `block (2 levels) in <class:Cli>'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/command_support.rb:126:in `call'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/command_support.rb:126:in `execute'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:296:in `block in call_command'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:309:in `call'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:309:in `call_command'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:83:in `run'
from /usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.2/bin/machinery:41:in `<top (required)>'
from /usr/bin/machinery:24:in `load'
from /usr/bin/machinery:24:in `<main>'
$ rpm -q machinery
machinery-1.22.2-16.1.x86_64
Thx for the report. I could not reproduce the issue yet but will look into it.
Until then, can you check whether the unmanaged-files issue is fixed by running:
machinery inspect -x -s unmanaged-files sapdisk --skip-files=/var/spool/,/zdisk/data100,/srv/nfs4/ersatz --verbose
You could also try to remove some of the filters, but there still might be an issue during extraction if you do not filter the volatile directories. But with filtering it should not crash anymore.
To inspect everything but repositories you can use the `-e` parameter.
machinery inspect -x -e repositories sapdisk --skip-files=/var/spool/,/zdisk/data100,/srv/nfs4/ersatz --verbose
Is it possible that you added new repositories in the meantime? I do not think that we touched this area of the code, at least not directly. But I know which area breaks, so I can create a workaround.
Do you have the same issue with an older version?
I switched from the machinery rpm in the Leap Update 42.1 repo to the systemsmanagement:machinery repo, since this was the only way I found an rpm file for 1.22.2. Alas, this version again ignores the filters - with `lsof /srv/nfs4/ersatz` I see machinery accessing files on this filesystem:
machinery 17264 root 3r DIR 8,17 48 2262328561 /srv/nfs4/ersatz/vmbackup/sapxufq/backup.1/var/lib/autoinstall/repository
It doesn't crash (yet), but it still ignores the filters.
BTW, is there any chance to start machinery on the inspected host with an `ionice -c3` prefix? My poor sapdisk host is suffering from machinery scanning the 1 TB filesystem containing bazillions of small files...
Yes, the helper binary does not support filters yet, we just fixed the crashes in case of removed/inaccessible files. The result is then filtered later on.
Using a lower priority from the start does make sense, I guess, but you should be able to change it manually if `machinery-helper` takes a long time to run.
Oh, I see. Bad luck then, I guess.
If there is no chance to filter directories from the scan, machinery is unusable for me. The 1 TB volume is shared via NFS among all of my hosts; sapdisk is the NFS server (hence its hostname). The volume takes hours to scan, maybe a backup comes in during inspection, and hundreds of files are removed and brought back (all in the `vmbackup` tree). It definitely makes no sense to scan files here.
I know about the `renice` command, but how can I apply `ionice -c3` to a running program? AFAIK renice only cares about the "normal" nice value, which does no harm - but the I/O load is very high. Do I have to patch machinery on my work host instead of intervening on the inspected host?
In the comparable case of the seccheck script, I patch it manually: first, the device is removed from the device list, and then the cron job gets an `ionice -c3` instead of the `nice` command that is prepended by default.
Thx for your debugging.
Can't you get the process id of the machinery-helper on the inspected host with `ps` and run `ionice -c 3 -p <pid>`?
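For reference, `ionice` (util-linux) can change the I/O scheduling class of an already running process via its `-p` option; the process name below is taken from this thread, and `pgrep` is just one way to find the PID:

```shell
# Move an already running machinery-helper into the idle I/O class (-c 3),
# so it only gets disk time when no other process wants it.
pid=$(pgrep -f machinery-helper)   # find the helper's PID on the inspected host
ionice -c 3 -p "$pid"              # set scheduling class "idle" for that PID
ionice -p "$pid"                   # verify: should report the idle class
```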
Anyway, I will create a new issue for the machinery-helper filter support. The crash, at least, should be fixed by the patch release.
Regarding the repository issue, can you create a new issue and mention there whether you added new repositories, and if not, whether the inspection works with the current Leap version?
Thx in advance.
> If there is no chance to filter directories from the scan, machinery is unusable for me. The 1 TB volume is shared via NFS among all of my hosts; sapdisk is the NFS server (hence its hostname). The volume takes hours to scan, maybe a backup comes in during inspection, and hundreds of files are removed and brought back (all in the vmbackup tree). It definitely makes no sense to scan files here.
One thing that puzzles me is that by default `machinery-helper` should not inspect remote mount points.
How is the NFS disk mounted and what does mount report as filesystem?
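One way to check (`findmnt` is part of util-linux; the path is the one from this thread):

```shell
# Show how the suspect path is mounted; for an NFS mount, FSTYPE
# reads "nfs" or "nfs4", i.e. a remote filesystem.
findmnt -T /srv/nfs4/ersatz -o TARGET,SOURCE,FSTYPE,OPTIONS
# Equivalent check with plain mount:
mount | grep ersatz
```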
Ah, OK, I didn't test that; I just thought that "since the helper binary does not support filters yet" meant it scans all available filesystems, but it does so with local filesystems only. The fs is mounted via automount/NFSv3 on ~20 hosts. So the problem with the gigantic unneeded file scan exists on my central host sapdisk only. Sounds better now :)
I'll try `ionice -c3` on an existing pid tomorrow, have to go to a meeting now. Also tomorrow, I'll create a new issue regarding the versions I use(d).
This issue can be closed I guess. I have already created a new issue for the filter support for machinery-helper.
machinery inspect -x sapdisk
Inspecting sapdisk for os, packages, patterns, repositories, users, groups, services, changed-config-files, changed-managed-files, unmanaged-files...
Note: There are filters being applied during inspection. (Use --verbose option to show the filters)
Inspecting os...
-> Found operating system 'SUSE Linux Enterprise Server 11' version '11 SP4'.
Inspecting packages...
-> Found 1879 packages.
Inspecting patterns...
-> Found 24 patterns.
Inspecting repositories...
-> Found 79 repositories.
Inspecting users...
-> Found 77 users.
Inspecting groups...
-> Found 81 groups.
Inspecting services...
-> Found 175 services.
Inspecting changed-config-files...
-> Extracted 158 changed configuration files.
Inspecting changed-managed-files...
-> Extracted 105 changed managed files.
Inspecting unmanaged-files...
2016/11/07 15:43:34 open /tmp/.security.xl6yCR/: no such file or directory
Machinery experienced an unexpected error. Please file a bug report at: https://github.com/SUSE/machinery/issues/new
Execution of "ssh root@sapdisk -o LogLevel\=ERROR LANGUAGE\= LC_ALL\=en_US.utf8 /root/machinery-helper --extract-metadata" failed with status 1 (error output streamed away).
Backtrace:
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/cheetah-0.4.0/lib/cheetah.rb:555:in `check_errors'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/cheetah-0.4.0/lib/cheetah.rb:364:in `run'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/logged_cheetah.rb:23:in `run'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/remote_system.rb:92:in `run_command'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/machinery_helper.rb:60:in `run_helper'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/plugins/unmanaged_files/unmanaged_files_inspector.rb:85:in `run_helper_inspection'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/plugins/unmanaged_files/unmanaged_files_inspector.rb:59:in `inspect'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:87:in `block in build_description'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:79:in `each'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:79:in `build_description'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/inspect_task.rb:21:in `inspect_system'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/lib/cli.rb:640:in `block (2 levels) in <class:Cli>'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/command_support.rb:126:in `call'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/command_support.rb:126:in `execute'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:296:in `block in call_command'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:309:in `call'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:309:in `call_command'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bundle/ruby/2.1.0/gems/gli-2.13.1/lib/gli/app_support.rb:83:in `run'
/usr/lib64/ruby/gems/2.1.0/gems/machinery-tool-1.22.0/bin/machinery:41:in `<top (required)>'
/usr/bin/machinery:24:in `load'
/usr/bin/machinery:24:in `<main>'