$ssh->cmd("cat -> file", $data) hangs on large amounts of data

briandfoy commented 1 year ago

This ticket was imported from rt.cpan.org 7910

Requestor: bwalz@paradigm-healthcare.com
Date: Wed Oct 06 17:41:45 2004
Original subject: $shh("cat -> file",$data) hangs on large amount of data
Original link: rt.cpan.org 7910
Attachments:
- tst14.pl.txt

This probably is not a bug but a problem with my older version of OpenSSH on the client machine, but:

$ssh->cmd("cat ->/tmp/file.tmp", $alot_of_data);

will cause the process to hang when sending more than about 6K of data.

The following code sample from pscp works around the problem (file contents in $c):

# chop up file, seems to lock up on large chunks
my @chunks = $c =~ /.{1,6144}/gs;
my $f = 0;
foreach my $chunk (@chunks) {
  # first chunk truncates the file, later chunks append to it
  my $cmd = $f++ ? "cat - >>$tfile" : "cat - >$tfile";
  my($out, $err, $exit) = $ssh->cmd($cmd, $chunk);
  die "Can't write file $tfile: $err" if $err;
}
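The same workaround can be wrapped in a helper that only builds the (command, chunk) pairs, which keeps the splitting logic testable without a live connection. This is a sketch: `chunked_cat_commands` is a hypothetical name, the 6144-byte default is taken from the pscp snippet, and the helper deliberately emits one truncating command for empty input so the remote file is still created (the loop above writes nothing in that case).

```perl
use strict;
use warnings;

# Hypothetical helper: split $data into pieces of at most $size bytes and pair
# each piece with the shell command that writes it. The first command truncates
# $file (">"), every later one appends (">>"). $file is not shell-quoted,
# matching the original snippet.
sub chunked_cat_commands {
    my ($data, $file, $size) = @_;
    $size ||= 6144;
    # /s lets "." match newlines; empty input still yields one empty chunk
    my @chunks = length($data) ? ($data =~ /.{1,$size}/gs) : ('');
    my @cmds;
    for my $i (0 .. $#chunks) {
        my $redirect = $i == 0 ? '>' : '>>';
        push @cmds, [ "cat - $redirect$file", $chunks[$i] ];
    }
    return @cmds;
}

# Usage with an existing Net::SSH::Perl session ($ssh), as in the snippet above:
#   for my $pair (chunked_cat_commands($c, $tfile)) {
#       my ($out, $err, $exit) = $ssh->cmd(@$pair);
#       die "Can't write file $tfile: $err" if $err;
#   }
```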

Debug output, including version numbers:

: Reading configuration data /home/bill/.ssh/config
: Reading configuration data /etc/ssh_config
: Connecting to xxxx.xxxx.com, port 22.
: Remote protocol version 1.99, remote software version OpenSSH_3.1p1
: Net::SSH::Perl Version 1.25, protocol version 2.0.
: No compat match: OpenSSH_3.1p1.
: Connection established.

briandfoy commented 1 year ago

from @flanneljeans

I've found the bug, and have fixed it as described below. I don't think this is the best way to fix it, but it did prove my theory on how the bug was occurring.

The problem occurs due to the interaction between drain_outgoing (in Channel.pm) and client_loop (in SSH2.pm).

If your $stdout is bigger than remote_maxpacket, drain_outgoing repeatedly calls client_loop until the remaining length is reduced to zero. The problem is that, once client_loop is called from inside that while loop a second time, it never returns to drain_outgoing.

MY FIX: I reasoned that if I could force client_loop to execute exactly once each time it was called from the drain_outgoing while loop, there would be no problem. To test this, I did the following:

1.) drain_outgoing: added the following line before the while loop

$c->{ssh}->{DoOneLoop}=1;

2.) drain_outgoing: added the following line after the while loop

undef($c->{ssh}->{DoOneLoop});

3.) client_loop: added the following line just before the end of the first while loop

    last if($ssh->{DoOneLoop});
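The control flow of the DoOneLoop patch can be modelled in a few lines. In this sketch, `client_loop` and `drain_outgoing` are simplified, hypothetical stand-ins for the Net::SSH::Perl methods of the same names (no real I/O happens), and each pass removes at most `remote_maxpacket` bytes from the outgoing buffer. One design difference from steps 1 and 2: `local` restores the flag automatically when `drain_outgoing` returns, including when `client_loop` dies, whereas a separate `undef` after the loop is skipped if an exception is thrown.

```perl
use strict;
use warnings;

# Simplified stand-in for SSH2.pm's client_loop: without DoOneLoop it keeps
# iterating until the session closes, which is where drain_outgoing got stuck.
sub client_loop {
    my ($ssh) = @_;
    while (1) {
        $ssh->{iterations}++;            # one round of select()/packet I/O
        last if $ssh->{DoOneLoop};       # step 3: return after a single pass
        last if $ssh->{session_closed};  # normal exit when the remote side is done
    }
}

# Simplified stand-in for Channel.pm's drain_outgoing.
sub drain_outgoing {
    my ($c) = @_;
    # Steps 1 and 2 in one line: the flag is set for this scope only and is
    # restored on return, even if client_loop dies.
    local $c->{ssh}{DoOneLoop} = 1;
    while (length $c->{outgoing}) {
        # pretend one packet of at most remote_maxpacket bytes was sent
        substr($c->{outgoing}, 0, $c->{remote_maxpacket}, '');
        client_loop($c->{ssh});
    }
}

my $ssh  = { iterations => 0 };
my $chan = { ssh => $ssh, outgoing => 'x' x 20000, remote_maxpacket => 8192 };
drain_outgoing($chan);
printf "passes=%d flag=%s\n", $ssh->{iterations}, $ssh->{DoOneLoop} ? 'set' : 'clear';
```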

This seems to have fixed it for all scenarios that I've tested.

If you know of a better solution, or know how to get this into the official distribution, please email me: craig_at_lucent_dot_co

briandfoy commented 1 year ago

from dbrobins@davidrobins.net

On Monday August 15, 2005 07:14, Guest via RT wrote:

Full context and any attached attachments can be found at: <URL: https://rt.cpan.org/Ticket/Display.html?id=7910 >

...

If you know of a better solution, or know how to get this into the official distribution, please email me: craig_at_lucent_dot_com

Patches can be submitted to the list or to me (DBROBINS at cpan.org) directly.
Please include:

- a summary of the problem being fixed (test cases, what goes wrong, what you expect to happen)
- a gzipped unified diff of the changed file(s)

I'll review it, and either apply it (possibly with changes) or get back to you with suggested changes.

Thanks,

-- Dave Isa. 40:31

briandfoy commented 1 year ago

from jgilbert

[guest - Mon Aug 15 10:14:33 2005]:

I've found the bug, and have fixed it as described below. I don't think this is the best way to fix it, but it did prove my theory on how the bug was occurring. [excellent fix snip]

This seems to have fixed it for all scenarios that I've tested.

There are two scenarios where this isn't completely fixed:

- when the remote peer is running OpenSSH 3.7.1p3
- when the remote peer is running Sun's deployed SSH, identified by 'SSH-1.99-Sun_SSH_1.1'

In these cases, even with the DoOneLoop fix, the select() call inside client_loop hangs indefinitely when the $stdin passed to cmd() is larger than 32768 bytes. More interestingly, it appears that data on STDIN is put onto the resulting channel and then written to the fd on the remote peer.

We've implemented the above, with the following additional changes in SSH2.pm, Constants.pm, and Channel.pm:

diff /opt/rcs/lib/Net/SSH/Perl/Channel.pm /opt/rcs/os_deployment/lib/Net/SSH/Perl/Channel.pm
191a192,193
>     ## COVD FIX:
>     $c->{ssh}->{DoOneLoop} = 10;
195a198,200
>     ## COVD FIX:
>     undef( $c->{ssh}->{DoOneLoop} );
>     delete $c->{ssh}->{DoOneLoop};

diff /opt/rcs/lib/Net/SSH/Perl/Constants.pm /opt/rcs/os_deployment/lib/Net/SSH/Perl/Constants.pm
138c138
<     'MAX_PACKET_SIZE' => 256000,
---
>     'MAX_PACKET_SIZE' => 8192,

diff /opt/rcs/lib/Net/SSH/Perl/SSH2.pm /opt/rcs/os_deployment/lib/Net/SSH/Perl/SSH2.pm
302c303,306
<         my($rready, $wready) = $select_class->select($rb, $wb);
---
>         ## COVD FIX:
>         $ssh->debug("Instantiating a select with $ssh->{DoOneLoop} second timeout.")
>             if (exists $ssh->{DoOneLoop} && $ssh->{DoOneLoop} > 0);
>         my($rready, $wready) = $select_class->select($rb, $wb, undef,($ssh->{DoOneLoop} || undef));
313a318,319
>         ## COVD FIX:
>         last if ($ssh->{DoOneLoop});

This workaround appears to function correctly with any file size for SSH implementations identifying themselves as SSH-1.99-OpenSSH_3.7.1p2, SSH-1.99-Sun_SSH_1.0.1, and SSH-1.99-Sun_SSH_1.1.
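The part of the patch above that stops the indefinite hang is the fourth (timeout) argument to select(). The standalone sketch below uses a local pipe instead of an SSH socket to show the behaviour the patch relies on: `IO::Select->select` returns an empty list when the timeout expires, while an undef timeout (what the unpatched two-argument call amounts to) waits until a handle is ready. The `wait_readable` helper and the 0.2-second timeout are illustrative assumptions, not values from the patch, and the sketch assumes a Unix-like system, since select() on pipes is not supported on Windows.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: returns 1 if $fh becomes readable within $timeout
# seconds, else 0. IO::Select->select returns an empty list on timeout;
# with $timeout = undef it would block until the handle is ready.
sub wait_readable {
    my ($fh, $timeout) = @_;
    my $sel = IO::Select->new($fh);
    my ($rready) = IO::Select->select($sel, undef, undef, $timeout);
    return $rready ? scalar @$rready : 0;
}

pipe(my $reader, my $writer) or die "pipe: $!";

my $before = wait_readable($reader, 0.2);  # nothing written yet: times out
syswrite($writer, 'x') or die "write: $!";
my $after  = wait_readable($reader, 0.2);  # one byte pending: readable

print "before=$before after=$after\n";     # prints "before=0 after=1"
```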

I'm pretty sure this shouldn't make it into an official patch; I'm just documenting this for anyone else bitten by this. ;)

-jgilbert.

briandfoy commented 1 year ago

from @flanneljeans

tst14.pl.txt

Folks-

Not only is this bug appearing on many different platforms, but the fix I proposed earlier (and the subsequent jgilbert fix) doesn't work on some of them with v1.30 (ppc-linux in particular).

I've written a Perl script that triggers the bug and determines exactly how many characters your system needs to trigger it. Could you run it to see whether you have the bug, and post your results here? I've also sent it to Dave Robins to see if he can figure this bug out once and for all.

My output is below. I'm attaching the perl test script.

Thanks -Craig

Net::SSH::Perl bug test...
host=nwsgpb user=watchmrk remoteCmd=cat
xferChars=932779|-Error, down to 466390 chars
        Received disconnect message: Corrupted MAC on input.
xferChars=466390|-Error, down to 233195 chars
        Received disconnect message: Corrupted MAC on input.
xferChars=233195|-Error, down to 116598 chars
        Received disconnect message: Corrupted MAC on input.
xferChars=116598|-Error, down to 58299 chars
        Received disconnect message: Corrupted MAC on input.
xferChars=58299|-Error, down to 29150 chars
        alarm handler 10 second timeout
xferChars=29150|-Error, down to 14575 chars
        alarm handler 10 second timeout
xferChars=14575|-Error, down to 7288 chars
        alarm handler 10 second timeout
xferChars=7288|-OK, back up to 10931 chars
xferChars=10931|-OK, back up to 12753 chars
xferChars=12753|-Error, down to 11842 chars
        alarm handler 10 second timeout
xferChars=11842|-Error, down to 11387 chars
        alarm handler 10 second timeout
xferChars=11387|-OK, back up to 11614 chars
xferChars=11614|-Error, down to 11501 chars
        alarm handler 10 second timeout
xferChars=11501|-OK, back up to 11557 chars
xferChars=11557|-Error, down to 11529 chars
        Received disconnect message: Corrupted MAC on input.
xferChars=11529|-OK, back up to 11543 chars
xferChars=11543|-Error, down to 11536 chars
        Received disconnect message: Corrupted MAC on input.
xferChars=11536|-Error, down to 11533 chars
        alarm handler 10 second timeout
xferChars=11533|-OK, back up to 11534 chars
xferChars=11534|-OK, back up to 11535 chars
xferChars=11535|-OK, back up to 11535 chars

Report for remote host: nwsgpb...
xferChars=11535 works, xferChars=11536 hangs on timeout=10.
exiting...
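The detection script's search for the failing size can be expressed as a bisection. This is not the attached tst14.pl (its midpoints differ from the ones in the run above); it is a sketch in which `find_threshold` is a hypothetical name and `$works` is any callback that attempts a transfer of `$n` characters and reports success. It assumes every size below the threshold succeeds, which the run above does not fully support: 11533 failed while 11534 and 11535 succeeded, so a single bisection reports one failure boundary, not necessarily the smallest failing size.

```perl
use strict;
use warnings;

# Largest $n in [$lo, $hi] for which $works->($n) is true, assuming success is
# monotonic (all sizes at or below the threshold work). Returns undef if even
# $lo fails. Each probe would be one cmd() call in a real test.
sub find_threshold {
    my ($lo, $hi, $works) = @_;
    return undef unless $works->($lo);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);   # round up so $lo always advances
        if ($works->($mid)) { $lo = $mid }
        else                { $hi = $mid - 1 }
    }
    return $lo;
}

# Example with a simulated predicate standing in for a real transfer.
my $limit = find_threshold(1, 932779, sub { $_[0] <= 11535 });
print "largest working size: $limit\n";   # prints "largest working size: 11535"
```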

briandfoy commented 1 year ago

from @flanneljeans

This ticket needs to be re-opened, as I've encountered the bug again in version 1.34. I've verified both the bug and the DoOneLoop fix under Windows (XP), Mac OS X, and Solaris. However, the included detection script (as-is) no longer catches the bug. If you lower its high value to 98304 or below, it catches the bug when you connect to a Solaris box running SSH-2.0-Sun_SSH_1.1.3 (and possibly others).

briandfoy/net-ssh-perl, issue #3