RexOps / Rex

Rex, the friendly automation framework
https://www.rexify.org
716 stars 224 forks source link

`Rex::Commands::Fs::df` fails on `tcsh` #1529

Open rwp0 opened 2 years ago

rwp0 commented 2 years ago

Describe the bug

df actually returns empty hash reference with or without a device argument in FreeBSD 13.1 (latest) release

This function returns a hash reference which reflects the output of df.

https://metacpan.org/pod/Rex::Commands::Fs#df([$device])

Expected behavior

hashref reflecting df(1) command's output

How to reproduce it

task 'disk', group => 'testing', sub {
    say Dumper df;
};

$VAR1 = {};

Code example

No response

Additional context

No response

Rex version

1.13.4

Perl version

5.36.0

Operating system running rex

Debian 11

Operating system managed by rex

FreeBSD

How rex was installed?

cpan client

ferki commented 2 years ago

Thanks for the report, @rwp0! Could you attach a debug output from rex -d as a file, please?

In case you'd like to take a closer look at what might be going wrong meanwhile: Rex runs df $dev, and takes the lines from STDOUT, then parses the lines.

rwp0 commented 2 years ago

@ferki thanks for mentioning the -d option.

It seems df (without arguments) looks for a device resulting in df: 2: No such file or directory under FreeBSD using tcsh.

Short:

[2022-06-12 17:34:41] DEBUG - ========= ERR ============
[2022-06-12 17:34:41] DEBUG - df: 2: No such file or directory

[2022-06-12 17:34:41] DEBUG - ========= ERR ============
$VAR1 = {};

Long:

/usr/bin/perlbrew exec -q --with perl-5.36.0 /home/regular/perl5/perlbrew/perls/perl-5.36.0/bin/perl /usr/bin/rex -d -H HOST df
[2022-06-12 17:34:31] DEBUG - This is Rex version: 1.13.4
[2022-06-12 17:34:31] DEBUG - Command Line Parameters
[2022-06-12 17:34:31] DEBUG -   H = HOST
[2022-06-12 17:34:31] DEBUG -   d = 1
[2022-06-12 17:34:31] DEBUG - Creating lock-file (Rexfile.lock)
[2022-06-12 17:34:31] DEBUG - Loading Rexfile
[2022-06-12 17:34:31] DEBUG - Enabling task_chaining_cmdline_args feature
[2022-06-12 17:34:31] DEBUG - Activating new template engine.
[2022-06-12 17:34:31] DEBUG - Disabling usage of a tty
[2022-06-12 17:34:31] DEBUG - Activating autodie.
[2022-06-12 17:34:31] DEBUG - Using Net::OpenSSH if present.
[2022-06-12 17:34:31] DEBUG - Add service check.
[2022-06-12 17:34:31] DEBUG - Setting set() to not append data.
[2022-06-12 17:34:31] DEBUG - Registering CMDB as template variables.
[2022-06-12 17:34:31] DEBUG - activating featureset >= 0.51
[2022-06-12 17:34:31] DEBUG - activating featureset >= 0.40
[2022-06-12 17:34:31] DEBUG - activating featureset >= 0.35
[2022-06-12 17:34:31] DEBUG - activating featureset >= 0.31
[2022-06-12 17:34:31] DEBUG - Creating new distribution class of type: Base
[2022-06-12 17:34:31] DEBUG - new distribution class of type Rex::TaskList::Base created.
[2022-06-12 17:34:31] DEBUG - Creating task: local-df
[2022-06-12 17:34:31] DEBUG -   server: HOST
[2022-06-12 17:34:31] DEBUG - Returning existing distribution class of type: Rex::TaskList::Base
[2022-06-12 17:34:31] DEBUG - Creating task: df
[2022-06-12 17:34:31] DEBUG -   server: HOST
[2022-06-12 17:34:31] DEBUG - Initializing Logger from parameters found in Rexfile
[2022-06-12 17:34:31] DEBUG - Returning existing distribution class of type: Rex::TaskList::Base
[2022-06-12 17:34:31] DEBUG - Returning existing distribution class of type: Rex::TaskList::Base
[2022-06-12 17:34:31] INFO - Running task df on HOST
[2022-06-12 17:34:31] DEBUG - Rex::Group::Entry::Server (private_key): returning
[2022-06-12 17:34:31] DEBUG - Rex::Group::Entry::Server (public_key): returning
[2022-06-12 17:34:31] DEBUG - $VAR1 = '';

[2022-06-12 17:34:31] DEBUG - Auth-Information inside Task:
[2022-06-12 17:34:31] DEBUG - private_key => []
[2022-06-12 17:34:31] DEBUG - sudo_password => [[**********]]
[2022-06-12 17:34:31] DEBUG - port => [[22]]
[2022-06-12 17:34:31] DEBUG - sudo => [[]]
[2022-06-12 17:34:31] DEBUG - password => [[%s]]
[2022-06-12 17:34:31] DEBUG - public_key => [[]]
[2022-06-12 17:34:31] DEBUG - user => [[root]]
[2022-06-12 17:34:31] DEBUG - auth_type => [[key]]
[2022-06-12 17:34:31] DEBUG - Using Net::OpenSSH for connection
[2022-06-12 17:34:31] DEBUG - Using user: root
[2022-06-12 17:34:31] DEBUG - Connecting to HOST:22 (root)
[2022-06-12 17:34:31] DEBUG - get_openssh_opt()
[2022-06-12 17:34:31] DEBUG - $VAR1 = {
          'StrictHostKeyChecking' => 'no'
        };

[2022-06-12 17:34:31] DEBUG - OpenSSH: key_auth or not defined: HOST:22 - root
[2022-06-12 17:34:31] DEBUG - OpenSSH options:
[2022-06-12 17:34:31] DEBUG - $VAR1 = [
          'HOST',
          'user',
          'root',
          'port',
          22,
          'master_opts',
          [
            '-o',
            'ConnectTimeout=2',
            '-o',
            'LogLevel=QUIET',
            '-o',
            'StrictHostKeyChecking=no'
          ],
          'default_ssh_opts',
          $VAR1->[6]
        ];

[2022-06-12 17:34:31] DEBUG - OpenSSH constructor options:
[2022-06-12 17:34:31] DEBUG - $VAR1 = {};

[2022-06-12 17:34:31] DEBUG - Trying following auth types:
[2022-06-12 17:34:31] DEBUG - $VAR1 = [
          'key',
          'pass'
        ];

[2022-06-12 17:34:37] DEBUG - Current Error-Code: 0
[2022-06-12 17:34:37] DEBUG - Connected and authenticated to HOST.
[2022-06-12 17:34:37] DEBUG - Successfully authenticated on HOST.
[2022-06-12 17:34:37] DEBUG - Executing: perl -MFile::Spec -le 'print File::Spec->tmpdir'
[2022-06-12 17:34:37] DEBUG - Detecting shell...
[2022-06-12 17:34:37] DEBUG - Searching for shell: zsh
[2022-06-12 17:34:38] DEBUG - Searching for shell: ksh
[2022-06-12 17:34:38] DEBUG - Searching for shell: bash
[2022-06-12 17:34:38] DEBUG - Searching for shell: idrac
[2022-06-12 17:34:38] DEBUG - Rex::Interface::Exec::OpenSSH/executing: version
[2022-06-12 17:34:39] DEBUG - ========= ERR ============
[2022-06-12 17:34:39] DEBUG - version: Command not found.

[2022-06-12 17:34:39] DEBUG - ========= ERR ============
[2022-06-12 17:34:39] DEBUG - Searching for shell: ash
[2022-06-12 17:34:39] DEBUG - Searching for shell: csh
[2022-06-12 17:34:40] DEBUG - Searching for shell: sh
[2022-06-12 17:34:40] DEBUG - Searching for shell: tcsh
[2022-06-12 17:34:40] DEBUG - Found shell and using: tcsh
[2022-06-12 17:34:40] DEBUG - SSH/executing: set LC_ALL=C ; perl -MFile::Spec -le 'print File::Spec->tmpdir'
[2022-06-12 17:34:41] DEBUG - /tmp

[2022-06-12 17:34:41] DEBUG - Executing: which perl
[2022-06-12 17:34:41] DEBUG - Detecting shell...
[2022-06-12 17:34:41] DEBUG - Found shell in cache: tcsh
[2022-06-12 17:34:41] DEBUG - SSH/executing: set LC_ALL=C ; set PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin:/usr/pkg/bin:/usr/pkg/sbin; which perl
[2022-06-12 17:34:41] DEBUG - /usr/bin/perl

[2022-06-12 17:34:41] DEBUG - Executing df
[2022-06-12 17:34:41] DEBUG - Executing: df  2>/dev/null
[2022-06-12 17:34:41] DEBUG - Detecting shell...
[2022-06-12 17:34:41] DEBUG - Found shell in cache: tcsh
[2022-06-12 17:34:41] DEBUG - SSH/executing: set LC_ALL=C ; df  2>/dev/null
[2022-06-12 17:34:41] DEBUG - ========= ERR ============
[2022-06-12 17:34:41] DEBUG - df: 2: No such file or directory

[2022-06-12 17:34:41] DEBUG - ========= ERR ============
$VAR1 = {};

[2022-06-12 17:34:41] DEBUG - Running postponed notifications.
[2022-06-12 17:34:41] DEBUG - Connections in queue: 1
[2022-06-12 17:34:41] DEBUG - Destroying all cached os information
[2022-06-12 17:34:41] DEBUG - Waiting for children to finish
[2022-06-12 17:34:41] DEBUG - Need to reinitialize connections.
[2022-06-12 17:34:41] DEBUG - Returning existing distribution class of type: Rex::TaskList::Base
[2022-06-12 17:34:41] INFO - All tasks successful on all hosts
[2022-06-12 17:34:41] DEBUG - Removing lockfile
[2022-06-12 17:34:42] DEBUG - Returning existing distribution class of type: Rex::TaskList::Base

Process finished with exit code 0
ferki commented 2 years ago

Thanks for the debug logs, and for nicely formatting them as code! :heart: The full output can get quite long, so that might be easier to attach as file next time :wink:

Debug log analysis

As you found in the debug log correctly, the part leading to the error is:

The debug log also shows that rex found tcsh as a shell, and the problem seems to be that 2>/dev/null doesn't work on tcsh to redirect STDERR to /dev/null, but 2 is being passed to the df command as first argument.

In that sense it's related more to tcsh than to FreeBSD. I installed tcsh on Gentoo for a quick test and I can reproduce the error :tada: It might also worth to clarify in the issue title later for future readers.

Possible workarounds

Possible fix

There are a few code smells around the current execution logic:

It's probably best to simplify that, and might also be possible to either:

Risks

rwp0 commented 2 years ago

AFAIK GitHub doesn't support attaching files natively, so pastebin services could be used/suggested for that purpose

(t)csh is (and always was IIRC) the default shell in FreeBSD so the Rex df command won't work on any version/flavor of FreeBSD as well because of that

I completely agree with all your points, and I appreciate your very much detailed response, prompt reaction, and the label being added 🙂

The debug log also shows that rex found tcsh as a shell, and the problem seems to be that 2>/dev/null doesn't work on tcsh to redirect STDERR to /dev/null, but 2 is being passed to the df command as first argument.

Unfortunately tcsh doesn't support 2> operator, and that's how standard error can be redirected^1 in tcsh:

$ (ls -l /bin/bash /bing/other  > out) > & err
(direct stdout to 'out' and stderr to 'err')
ferki commented 2 years ago

Pasting/attaching

AFAIK GitHub doesn't support attaching files naively, so pastebin services could be used/suggested for that purpose

Pastebin services work too, though there's usually less control over their long-time availability, and using them also means scattering information-that-belongs-together across services. A GitHub gist might be a good choice to "stay close", but in general I prefer Attaching files directly to the issue to preserve the information long-term in a single place. (And then take an offline backup with something like git-bug or similar, but I digress).

tcsh

(t)csh is (and always was IIRC) the default shell in FreeBSD so the Rex df command won't work on any version/flavor of FreeBSD as well because of that

Yeah, I don't think I've encountered tcsh before elsewhere :) There's quite a few FreeBSD users and testers of Rex, so I'm curious why this issue didn't surface earlier, though :thinking:

We can think about introducing more test coverage perhaps (at least via CPAN testers, there's no native BSD option for GitHub Actions as far as I know, but there might some workarounds for that).

Details

I completely agree with all your points, and I appreciate your very much detailed response, prompt reaction, and the label being added

I'm glad you appreciate the detail. I tend to dump my thoughts as I go as reference for future readers (often myself :rofl:), and that often ends up highly detailed. I guess it's a case of "I didn't have time to write a shorter answer" :upside_down_face:

Redirect with tcsh

Unfortunately tcsh doesn't support 2> operator, and that's how standard error can be redirected1 in tcsh:

$ (ls -l /bin/bash /bing/other  > out) > & err
(direct stdout to 'out' and stderr to 'err')
  1. https://code-maven.com/slides/linux/redirection-tcsh leftwards_arrow_with_hook

Yeah, that's the info I found too; thanks for confirming it and for the tips, and nice to see a code-maven link about it :+1:

Big picture

For the big picture I have these in mind currently:

  1. Since other parts of rex might be affected by the same issue, I'd like to go over these cases from the "redirect on tcsh" perspective, and see if there's a need and chance for a general fix.
  2. To avoid introducing potential regressions, I'd also check git history of df() about what situations made previous authors to end up with this exact approach here.
  3. I wonder if there's a good way to test the related behavior somehow, especially without having to actually run something on (Free)BSD, and prevent similar issues entering the code :thinking:

It might take some time until someone goes through all that, though.

Small picture :)

For this specific code part, I'd probably go for just dropping the STDERR handling, since that is ignored in the context of df() anyway. That is, no redirect, and drop the $err part completely if possible.

If df on tcsh is a bottleneck for your use case, it could be relatively easy to create an intermediate patch to apply to this limited scope (or perhaps even merge it, depending how hacky the approach is).