IO::Socket async connect() logic is broken on Solaris

p5pRT commented 12 years ago

Migrated from rt.perl.org#107962 (status was 'open')

Searchable as RT107962$

p5pRT commented 12 years ago

From carson@taltos.org

Created by carson@taltos.org

The async connect() logic in IO::Socket is broken for Solaris (and almost certainly other OS variants). It assumes you're allowed to call connect() a second time\, which is not at all guaranteed. Solaris correctly returns EINVAL when IO::Socket does so (as does Windows\, but they spell it WSAEINVAL\, and there is code that handles that already).

There was a recent commit to change the select() logic\, but it's still wrong.

Below is some very lightly tested code that tries to do the right thing on all platforms.

sub connect { @_ == 2 or croak 'usage: $sock->connect(NAME)'; my $sock = shift; my $addr = shift; my $timeout = ${*$sock}{'io_socket_timeout'}; my $err; my $blocking;

$blocking = $sock->blocking(0) if $timeout; if (!connect($sock\, $addr)) { if (defined $timeout && ($!{EINPROGRESS} || $!{EWOULDBLOCK})) { require IO::Select;

my $sel = new IO::Select $sock;

undef $!; my($r\,$w\,$e) = IO::Select::select($sel\,$sel\,$sel\,$timeout); if (!defined($r) && !defined($w) && !defined($e)) { # select returns an empty list on timeout or other error $err = $!; if ($err == 0) { # timeout $err = (exists &Errno::ETIMEDOUT ? &Errno::ETIMEDOUT : 1); $@ = "connect: timeout"; } } elsif(@$w[0] || @$r[0]) { # select() returns writable when connect() completes # select() returns readable if there's an error # if both\, we either received data very quickly\, or it's # an odd OS specific error behaviour. if (@$r[0]) { if (exists(&Socket::SO_ERROR)) { # The best way to get the error state\, if supported $err = $sock->getsockopt(SOL_SOCKET\,SO_ERROR); if ($err == 0) { undef $err; } } else { # If SO_ERROR isn't supported\, the options to get # the error code are all bad. sysread($fd\, $junk\, 0) # works on some systems. Calling connect() again # works on others. But neither is reliable in # portable code. The only portable option is # getpeername()\, which returns a generic error. if (! getpeername($sock)) { $err = $!; } } } } elsif(@$e[0]) { # We really shouldn't ever get here. Exception means # OOB data is waiting on most UNIX platforms\, and # should never be set without read or write. # Windows return from select after the timeout in case of # WSAECONNREFUSED(10061) if exception set is not used. # This behavior is different from Linux. # Using the exception # set we now emulate the behavior in Linux # - Karthik Rajagopalan $err = $sock->getsockopt(SOL_SOCKET\,SO_ERROR); $@ = "connect: $err"; } } elsif ($blocking || !($!{EINPROGRESS} || $!{EWOULDBLOCK})) { $err = $!; $@ = "connect: $!"; } }

$sock->blocking(1) if $blocking;

$! = $err if $err;

$err ? undef : $sock; }

Perl Info

``` Flags: category=library severity=high module=IO::Socket Site configuration information for perl 5.14.2: Configured by carson at Tue Jan 10 19:32:20 PST 2012. Summary of my perl5 (revision 5 version 14 subversion 2) configuration: Platform: osname=solaris, osvers=2.11, archname=i86pc-solaris-64 uname='sunos gandalf.local.taltos.org 5.11 11.0 i86pc i386 i86pc ' config_args='-Dprefix=/Tools/SunOS_5.11_i86pc_amd64/perl-5.14.2 -Doptimize=-xO5 -xchip=sandybridge -xarch=sse4_2 -xarch=avx -xarch=aes -Dcf_email=carson@taltos.org -Dinstallsitelib=/var/tmp/site_perl/5.14.2 -Dotherlibdirs=/usr/local/lib/site_perl/5.14.2:/usr/local/lib/site_perl/5.14 -Dccflags=-m64 -d' hint=recommended, useposix=true, d_sigaction=define useithreads=undef, usemultiplicity=undef useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef use64bitint=define, use64bitall=define, uselongdouble=undef usemymalloc=n, bincompat5005=undef Compiler: cc='cc', ccflags ='-m64 -I/usr/local/include -m64 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -DPERL_USE_SAFE_PUTENV', optimize='-xO5 -xchip=sandybridge -xarch=sse4_2 -xarch=avx -xarch=aes', cppflags='-m64 -I/usr/local/include' ccversion='Sun C 5.12 SunOS_i386 Spica 2011/07/21', gccversion='', gccosandvers='' intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678 d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8 alignbytes=8, prototype=define Linker and Libraries: ld='cc', ldflags =' -L/usr/lib -L/usr/ccs/lib -L/opt/solstudiodev/prod/lib -L/lib -L/usr/local/lib -L/usr/gnu/lib -m64 ' libpth=/usr/lib /usr/ccs/lib /opt/solstudiodev/prod/lib /lib /usr/local/lib /usr/gnu/lib libs=-lsocket -lnsl -lgdbm -ldb -ldl -lm -lc perllibs=-lsocket -lnsl -ldl -lm -lc libc=/lib/libc.so, so=so, useshrplib=false, libperl=libperl.a gnulibc_version='' Dynamic Linking: dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' ' cccdlflags='-KPIC', lddlflags=' -G -m64 -L/usr/lib -L/usr/ccs/lib -L/opt/solstudiodev/prod/lib -L/lib -L/usr/local/lib -L/usr/gnu/lib' Locally applied patches: @INC for perl 5.14.2: /Tools/SunOS_5.11_i86pc_amd64/perl-5.14.2/lib/site_perl/5.14.2/i86pc-solaris-64 /Tools/SunOS_5.11_i86pc_amd64/perl-5.14.2/lib/site_perl/5.14.2 /Tools/SunOS_5.11_i86pc_amd64/perl-5.14.2/lib/5.14.2/i86pc-solaris-64 /Tools/SunOS_5.11_i86pc_amd64/perl-5.14.2/lib/5.14.2 /usr/local/lib/site_perl/5.14.2/i86pc-solaris-64 /usr/local/lib/site_perl/5.14.2 /usr/local/lib/site_perl/5.14 . Environment for perl 5.14.2: HOME=/home/carson LANG=en_US.UTF-8 LANGUAGE (unset) LC_ALL= LC_COLLATE= LC_CTYPE= LC_MESSAGES= LC_MONETARY= LC_NUMERIC= LC_TIME= LD_LIBRARY_PATH (unset) LOGDIR (unset) PATH=/tools/perl/bin:/tools/python/bin:/opt/solstudiodev/bin:/usr/ccs/bin:/bin:/usr/bin:/sbin:/usr/sbin:/usr/openwin/bin:/usr/dt/bin:/usr/local/bin PERL_BADLANG (unset) SHELL=/bin/bash ```

p5pRT commented 12 years ago

From @jkeenan

On Wed Jan 11 03:39:04 2012\, carson@taltos.org wrote:

This is a bug report for perl from carson@taltos.org\, generated with the help of perlbug 1.39 running under perl 5.14.2.

----------------------------------------------------------------- [Please describe your issue here] The async connect() logic in IO::Socket is broken for Solaris (and almost certainly other OS variants). It assumes you're allowed to call connect() a second time\, which is not at all guaranteed. Solaris correctly returns EINVAL when IO::Socket does so (as does Windows\, but they spell it WSAEINVAL\, and there is code that handles that already).

There was a recent commit to change the select() logic\, but it's still wrong.

Below is some very lightly tested code that tries to do the right thing on all platforms.

Thank you for the report. Two requests: Can you provide an example of how your subroutine should be called? And what do you mean by "tries to do the right thing"\, i.e.\, how will I know whether it's "working" on the platforms I have available?

Thank you very much. Jim Keenan

p5pRT commented 12 years ago

The RT System itself - Status changed from 'new' to 'open'

p5pRT commented 12 years ago

From carson@taltos.org

On 1/11/12 4:33 PM\, James E Keenan via RT wrote:

Thank you for the report. Two requests: Can you provide an example of how your subroutine should be called? And what do you mean by "tries to do the right thing"\, i.e.\, how will I know whether it's "working" on the platforms I have available?

This is a replacement for the existing connect method in IO/Socket.pm. I do _not_ suggest it be checked in as-is. It's a 3am "why is this !@#$% code not working!" rough hack that Works for Me(tm). It needs test cases\, and especially testing on Windows (I'm confident I understand the UNIX semantics).

And "tries to do the right thing" means: - It will always succeed or fail correctly (no more false negatives) - It will return the correct error from an async connect() IFF the platform supports SO_ERROR - If the platform does not support SO_ERROR\, it will return a generic error (whatever getpeername() returns on that platform). It does not attempt the 2 non-portable ways to try and get the correct error code (zero byte read() and calling connect() again)\, as their behaviour is unpredictable.

This whole mess is caused by async connect() not having well defined semantics. When I implemented support for async TCP socket operations in the NEC SOCKS5 code in the 90s (to get squid to work)\, I asked W. Richard Stevens about the corner cases\, and he responded "that's a good question\, let me know what you find out". And I knew I was in trouble ;-)

-- Carson

p5pRT commented 12 years ago

From @nwc10

On Wed\, Jan 11\, 2012 at 04:33:51PM -0800\, James E Keenan via RT wrote:

On Wed Jan 11 03:39:04 2012\, carson@taltos.org wrote:

This is a bug report for perl from carson@taltos.org\, generated with the help of perlbug 1.39 running under perl 5.14.2.

----------------------------------------------------------------- [Please describe your issue here] The async connect() logic in IO::Socket is broken for Solaris (and almost certainly other OS variants). It assumes you're allowed to call connect() a second time\, which is not at all guaranteed. Solaris correctly returns EINVAL when IO::Socket does so (as does Windows\, but they spell it WSAEINVAL\, and there is code that handles that already).

There was a recent commit to change the select() logic\, but it's still wrong.

Below is some very lightly tested code that tries to do the right thing on all platforms.

Thank you for the report. Two requests: Can you provide an example of how your subroutine should be called? And what do you mean by "tries to do the right thing"\, i.e.\, how will I know whether it's "working" on the platforms I have available?

By "tries to do the right thing" he means "not call connect a second time".

I'm not sure if we have any regression tests to check non-blocking connect. It's also rather hard to check - it's all going to be timing related. For example\, (and I was working with software that was bitten by this):

Solaris 2.7 refactored the IP stack to be multithreaded. 2.6 was not\, hence non-blocking connect to a socket on localhost would always return EWOULDBLOCK. This code was written to assume this\, and so failed on 2.7\, where the connect would return 0 (for success)\, because the new multithreaded IP stack *was* able to complete the connect fast enough.

I suspect that to really test the logic here\, one would have to mock the socket functions such as connect(). Which is do-able at a language level\, but is going to get messy because it won't test platform specific differences\, such as the EINVAL vs WSAEINVAL codes which the bug reporter mentions (and I wasn't fully aware of).

I think in summary\, if it's not currently "broken" on the platforms you have access to\, all you can verify is that it remains working.

Nicholas Clark

p5pRT commented 12 years ago

From @leonerd

On Thu\, Jan 12\, 2012 at 08:35:54AM +0000\, Nicholas Clark wrote:

I suspect that to really test the logic here\, one would have to mock the socket functions such as connect(). Which is do-able at a language level\, but is going to get messy because it won't test platform specific differences\, such as the EINVAL vs WSAEINVAL codes which the bug reporter mentions (and I wasn't fully aware of).

I think in summary\, if it's not currently "broken" on the platforms you have access to\, all you can verify is that it remains working.

If it's any help\, I observe that IO::Socket::IP manages to do blocking and non-blocking connects OK on both Solaris and MSWin32 in particular:

http://matrix.cpantesters.org/?dist=IO-Socket-IP%200.08;reports=1;os=solaris (no MSWin32 smoke results on 0.08\, but 0.07_003 worked fine)

Admittedly some of the logic in IO::Socket::IP has to end-run around IO::Socket's methods\, to get the right behavioural semantics out of it. Most notably:

# It seems that IO::Socket hides EINPROGRESS errors\, making them look # like a success. This is annoying here. # Instead of putting up with its frankly-irritating intentional # breakage of useful APIs I'm just going to end-run around it and # call CORE::connect() directly if( CORE::connect( $self\, $addr ) ) { $! = 0; return 1; }

-- Paul "LeoNerd" Evans

leonerd@leonerd.org.uk ICQ# 4135350 | Registered Linux# 179460 http://www.leonerd.org.uk/

Perl / perl5