L.S.,
We have encountered a very interesting problem for which you are really our last resort:
Exiting a child in a forking server (the example on page 194 of 'Advanced Perl Programming', O'Reilly) seems to clean up the parent's server socket on newer levels of Solaris. The parent exits with 'EBADF (Bad file number)' after having served one client request.
We have tried almost everything within our power, e.g.:
* compiling Perl on a working OS level and copying the binaries to the non-working OS level,
* compiling the current development version (5.005_61),
* different GNU compilers (2.8.1 and 2.95.1),
* SUN Workshop Compiler C/C++ 4.2,
* hacking in 'config.sh' (e.g. 'usevfork=false/true', multithreaded/non-multithreaded).
Nothing works.
After we filed a bug report, SUN responded with the following: \<\<FW: Bug ID# 4146098>>, but this is too low-level for us to understand what is really going on. The troublesome patch from SUN seems to be 105210-17 or above.
In more understandable language, they claimed that older versions of Solaris had a bug which is fixed in newer releases, and that Perl has probably been working around that bug. Now that the bug has been removed from the OS, Perl is still working around it, but this time unsuccessfully.
Does this make sense to you?
Can you help?
P.S.: Below you can find all mail communication with SUN. If you need additional information, please let us know.
Met vriendelijke groet/Kind regards,
Richard Hensgens
ORIGIN B.V. - Managed Services - Distributed Systems
Building VA-171
E-Mail: Richard.Hensgens@nl.origin-it.com
Phone: (+31:4027)87097, Fax: (+31:4027)83962
The unix guru's view on sex: # unzip; strip; touch; finger; mount; fsck; more; yes; umount; sleep
-----Original Message-----
From: Zuijdwijk, Pieter
Sent: Thursday, September 30, 1999 6:44 PM
To: 'dispatch@holland.sun.com'
Cc: Zuijdwijk, Pieter; Hensgens, Richard
Subject: Call Nr. 6583771
Attached is the "truss -aef" output from 2 SUN systems running 2 different OS levels:
OK files: SunOS ... 5.6 Generic_105181-06 sun4u sparc SUNW,Ultra-4
NOK files: SunOS ... 5.6 Generic_105181-15 sun4u sparc SUNW,Ultra-Enterprise
\<\<Client.truss.out.NOK>> \<\<Client.truss.out.OK>> \<\<Client.pl>> \<\<Server.pl>> \<\<Server.truss.out.NOK>> \<\<Server.truss.out.OK>>
As you can see, we also have problems on 5.6 Generic_105181-15 on an Ultra-Enterprise 3000, so this is not a specific Solaris 7 issue after all.
Thanks in advance.
Pieter Zuijdwijk
Origin TIS-DS-UNIX-SUN
Groenewoudseweg 1
5621 BA Eindhoven, The Netherlands
Building VA-169
Phone +31 (0)40 27 89605
Fax +31 (0)40 27 89362
-----Original Message-----
From: Hensgens, Richard
Sent: Tuesday, September 28, 1999 1:09 PM
To: Zuijdwijk, Pieter
Subject: Bug Solaris 2.7
Pieter,
Before we start downgrading the SUN box, should we perhaps first file a bug report with SUN?
The standard examples from the O'Reilly Perl books behave differently on Solaris 2.6 and Solaris 2.7 with exactly the same Perl version (5.005_03):
Server.pl:
    #!/usr/bin/perl
    use IO::Socket;
    $SIG{CHLD} = sub { wait() };
    $Sock = new IO::Socket::INET( LocalPort => 9000, Proto => 'tcp', Listen => SOMAXCONN, Reuse => 1 ) or die "SOCKET() error [$!]";
    while ( $NewSock = $Sock->accept() ) {
        $Pid = fork();
        if ( $Pid == 0 ) {
            while ( defined( $Buffer = <$NewSock> ) ) {
                print( $Buffer );
            }
            exit( 0 );
        }
    }
    close( $Sock );
    exit( 0 );
Client.pl:
    #!/usr/bin/perl
    use IO::Socket;
    $Sock = new IO::Socket::INET( PeerAddr => 'tsesun01', PeerPort => 9000, Proto => 'tcp' ) or die "SOCKET() error [$!]";
    foreach ( 1..10 ) {
        print( $Sock "Msg $_: How are you ?\n" );
    }
    close( $Sock );
    exit( 0 );
Output on Solaris 2.6:
    nl1sahd1:root> ./Server.pl
    nl1sahd1:root> jobs
    [1] + Running ./Server.pl &
    nl1sahd1:root> ./Client.pl
    nl1sahd1:root> Msg 1: How are you ?
    Msg 2: How are you ?
    Msg 3: How are you ?
    Msg 4: How are you ?
    Msg 5: How are you ?
    Msg 6: How are you ?
    Msg 7: How are you ?
    Msg 8: How are you ?
    Msg 9: How are you ?
    Msg 10: How are you ?
    nl1sahd1:root> jobs
    [1] + Running ./Server.pl &
The server serves as many requests as it should.
Output on Solaris 2.7:
    tsesun01:root> ./Server.pl &
    [1] 12331
    tsesun01:root> jobs
    [1] + Running ./Server.pl &
    tsesun01:root> ./Client.pl
    Msg 1: How are you ?
    Msg 2: How are you ?
    tsesun01:root> Msg 3: How are you ?
    Msg 4: How are you ?
    Msg 5: How are you ?
    Msg 6: How are you ?
    Msg 7: How are you ?
    Msg 8: How are you ?
    Msg 9: How are you ?
    Msg 10: How are you ?
    [1] + Done ./Server.pl &
    tsesun01:root> jobs
The server serves only one request and then exits!
Message RFC822:
Message-ID: 986AEA765305D311AA7B0008C75D97AFBB8A23@NLEHX020.origimail.origin-it.com
From: "Hensgens, Richard" Richard.Hensgens@nl.origin-it.com
To: "Hensgens, Richard" Richard.Hensgens@nl.origin-it.com
Subject: FW: Bug ID# 4146098
Date: Mon, 4 Oct 1999 17:19:14 +0200
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain; charset="iso-8859-1"
Bug Id: 4146098
Product: sunos
Category: network
Subcategory: socket
Bug/Rfe/Eou: bug
State: integrated
Development Status: INT
Synopsis: connect() and accept() can RESTART instead of returning EINTR
Keywords: esc#514623
Severity: 2
Severity Impact: 1
Severity Functionality: 0
Priority: 2
Description: When SA_RESTART is passed to sigaction(), connect() and accept() restart instead of returning with errno EINTR.
CONNECT(2) SYSTEM CALLS CONNECT(2)
EINTR The connection attempt was interrupted
before any data arrived by the delivery
of a signal.
Sun Release 4.1 Last change: 21 January 1990 3
SVR4 example sigaction() is needed to set SA_RESTART.
c_test_sys5
This bug still seems to be present in 5.7.0@8221, only on Solaris.
-spp
Perhaps on Solaris 2.7 a shutdown is being performed on the socket when the child closes. As an experiment or workaround, try explicitly closing the listening socket in the child, as shown below.
"Stephen P. Potter" wrote:
Server.pl:
    #!/usr/bin/perl
    use IO::Socket;
    $SIG{CHLD} = sub { wait() };
    $Sock = new IO::Socket::INET( LocalPort => 9000, Proto => 'tcp', Listen => SOMAXCONN, Reuse => 1 ) or die "SOCKET() error [$!]";
    while ( $NewSock = $Sock->accept() ) {
        $Pid = fork();
        if ( $Pid == 0 ) {
            # Child: explicitly close its copy of the listening socket.
            # 'and' is used because 'close $Sock && $sockClosed=1' does not parse as intended.
            close( $Sock ) and $sockClosed = 1;
            while ( defined( $Buffer = <$NewSock> ) ) {
                print( $Buffer );
            }
            exit( 0 );
        }
    }
    close( $Sock );
    close $Sock unless $sockClosed;
Lightning flashed, thunder crashed and ___cliff rayman___ cliff@genwax.com whispered:
| perhaps on solaris 2.7 a shutdown is being performed on the socket when the
| child closes. as an 'experiment/work around', try specifically close the listening socket
| in the child as per below.
I think the point being made in this report is that the script functions differently between Solaris versions. Sun claims to have fixed a bug that we may have been working around, and that the workaround may no longer be needed and may be causing the problem.
-spp
Right - I've read the bug report, played with the example code, and here is the story. Prior to the fix, accept() and connect() were erroneously being restarted when a signal was caught. The correct behaviour according to the SVR4 spec is for them to return with EINTR, even if SA_RESTART has been passed to sigaction().
The bugfix changed the behaviour so that if a signal is caught while either accept() or connect() is in progress, the call fails with EINTR instead of being restarted.
There are two ways to fix the example script. The first is to redo the accept() if EINTR is returned. The problem with this approach is that the IO::Socket library doesn't check the return value of the accept() call, and then tries to do some I/O ops [llseek()] on the invalid file handle. This means that by the time your script can get hold of errno, it is set to EBADF instead of EINTR.
The quick and easy fix is to ignore SIGCHLD rather than catching it - this way no zombie child processes are created and no signals are generated to screw up the accept() call. Change the line
    $SIG{CHLD} = sub { wait() };
to
    $SIG{CHLD} = 'IGNORE';
and the script then works as expected.
Hope that helps,
Alan Burlison
Solaris Kernel Development, Sun Microsystems
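The first approach mentioned above (redo the accept() when it fails with EINTR) can be sketched roughly as follows. This is only an illustration: `retry_on_eintr` is a hypothetical helper name, not something from IO::Socket or from the original script, and it assumes the wrapped call reports EINTR faithfully in `$!`, which according to this thread IO::Socket's accept() did not do at the time.

```perl
use strict;
use warnings;
use Errno qw(EINTR);

# Hypothetical helper (not part of IO::Socket): keep calling $try->()
# until it returns a defined value or fails with an error other than EINTR.
sub retry_on_eintr {
    my ($try) = @_;
    while (1) {
        my $result = $try->();
        return $result if defined $result;   # success
        return unless $! == EINTR;           # genuine failure: give up
        # EINTR: a signal handler (e.g. a SIGCHLD reaper) ran; try again.
    }
}
```

With the built-in accept(), a call site might look like `retry_on_eintr(sub { accept(my $client, $listener) ? $client : undef })`, where `$listener` is assumed to be an already listening socket handle. Taking a code reference keeps the retry logic independent of how the socket was created.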
Can I still fix IO::Socket? :-)
Jarkko Hietaniemi wrote:
Can I still fix IO::Socket? :-)
Hey, you're the main man...
:-)
Actually I was surmising from the truss output that the problem was in IO::Socket. I had a quick look and it doesn't seem to be doing anything naughty. I've had a look at pp_sys.c as well, and I can't see it there either. Hmmm, wonder what is doing it?
Alan Burlison
On Fri, Dec 22, 2000 at 12:09:40AM +0000, Alan Burlison wrote:
Actually I was surmising from the truss output that the problem was in IO::Socket. I had a quick look and it doesn't seem to be doing anything naughty. I've had a look at pp_sys.c as well, and I can't see it there either. Hmmm, wonder what is doing it?
It may be something along the lines that IO::Socket::accept creates a new object which gets destroyed when the method exits with an error. And during that destroy process various calls may be made I suppose.
If this is the case, changing the return to something like the following may help
    $peer = accept($new,$sock) or do { local $!; undef $new; return };
Graham.
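The effect of `local $!` in the suggestion above can be shown in isolation. The sketch below is not IO::Socket code: the `Noisy` class and both functions are made up for illustration, and the destructor simply overwrites errno to stand in for whatever cleanup happens when the half-built socket object is destroyed.

```perl
use strict;
use warnings;
use Errno qw(EINTR EBADF);

package Noisy;
sub new     { return bless {}, shift }
# Stand-in for cleanup code whose system calls overwrite errno.
sub DESTROY { $! = Errno::EBADF() }

package main;

# Simulated failure path without protection: the destructor's errno wins.
sub fail_without_local {
    my $obj = Noisy->new;
    $! = EINTR;          # pretend accept() just failed with EINTR
    undef $obj;          # DESTROY runs here and sets EBADF
    return;
}

# Same path with errno localized around the destruction.
sub fail_with_local {
    my $obj = Noisy->new;
    $! = EINTR;
    { local $!; undef $obj; }   # EINTR is restored when the block exits
    return;
}
```

After `fail_without_local()` the caller sees EBADF in `$!`, while after `fail_with_local()` it still sees EINTR, which is the value a retry loop would need to test for.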
Lightning flashed, thunder crashed and Alan Burlison Alan.Burlison@uk.sun.com whispered:
| There are two ways to fix the example script. The first is to redo the
| accept() if EINTR is returned. The problem with this approach is that
| the IO::Socket library doesn't check the return value of the accept()
| call, and then tries to do some I/O ops [llseek()] on the invalid file
| handle. This then means that by the time your script can get hold of
| errno it is set to EBADF instead of EINTR.
What I'm getting from all this is that there isn't a perceived bug in perl, so I should go ahead and close the ticket. Is that correct? How do I explain that the script works as the user expects on other OSes (and earlier versions of Solaris)? A bug in those other OSes?
-spp
Alan Burlison Alan.Burlison@uk.sun.com writes:
The bugfix changed the behaviour so that if a signal was caught when either accept() or connect() are in progress they fail with EINTR instead of being restarted.
There are two ways to fix the example script. The first is to redo the accept() if EINTR is returned. The problem with this approach is that the IO::Socket library doesn't check the return value of the accept() call,
So we can consider this a bug in IO::Socket.
What I'm getting from all this is that there isn't a perceived bug in perl, so I should go ahead and close the ticket. Is that correct? How do I explain that the script works as the user expects on other OSes (and earlier versions of Solaris)? A bug in those other OSes?
Correct - there is no bug in perl (well, perhaps it should return EINTR instead of EBADF...). I've tried to track down exactly which standard mandates this behaviour, but without a lot of success. Signals are one of the areas where different Unixes tend to differ wildly, and this particular problem is a manifestation of those differences rather than a bug per se - the behaviour will depend on which standards a particular Unix is based on, and how closely it adheres to those standards.
The sigaction manpage for Solaris says this:
SA_RESTART If set and the signal is caught, functions that are interrupted by the execution of this signal's handler are transparently restarted by the system, namely fcntl(2), ioctl(2), wait(2), waitid(2), and the following functions on slow devices like terminals: getmsg() and getpmsg() (see getmsg(2)); putmsg() and putpmsg() (see putmsg(2)); pread(), read(), and readv() (see read(2)); pwrite(), write(), and writev() (see write(2)); recv(), recvfrom(), and recvmsg() (see recv(3SOCKET)); and send(), sendto(), and sendmsg() (see send(3SOCKET)). Otherwise, the function returns an EINTR error.
So in fact the behaviour seen is as documented on Solaris.
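For reference, the SA_RESTART flag discussed in the manpage excerpt can be requested from Perl through the POSIX module. The following is a minimal sketch, assuming a Perl whose POSIX module provides `sigaction`, `POSIX::SigAction` and `SA_RESTART`; the helper names `install_restarting_handler` and `reap_children` are made up for this example. Note that, going by the excerpt, accept() is not among the calls Solaris restarts, so this flag alone would not be expected to prevent the EINTR seen in this report.

```perl
use strict;
use warnings;
use POSIX qw(sigaction SIGCHLD SA_RESTART WNOHANG);

# Hypothetical helper: install $handler for signal number $signo and ask
# the system to restart interrupted calls where it supports doing so.
sub install_restarting_handler {
    my ($signo, $handler) = @_;
    my $action = POSIX::SigAction->new($handler, POSIX::SigSet->new, SA_RESTART);
    return sigaction($signo, $action);   # true on success, undef on failure
}

# Reap every exited child without blocking, leaving the interrupted
# code's $! and $? untouched.
sub reap_children {
    local ($!, $?);
    1 while waitpid(-1, WNOHANG) > 0;
}

install_restarting_handler(SIGCHLD, \&reap_children)
    or die "sigaction failed: $!";
```

Looping on waitpid() with WNOHANG, rather than calling wait() once, is a common choice because several children may exit before the handler gets to run.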
@gbarr - Status changed from 'open' to 'resolved'
Migrated from rt.perl.org#1564 (status was 'resolved')
Searchable as RT1564$