Perl / perl5

🐪 The Perl programming language
https://dev.perl.org/perl5/

FW: Call Nr. 6583771 (!!! HELP !!!) #675

Closed p5pRT closed 21 years ago

p5pRT commented 24 years ago

Migrated from rt.perl.org#1564 (status was 'resolved')

Searchable as RT1564$

p5pRT commented 24 years ago

From Richard.Hensgens@nl.origin-it.com

L.S.,

We have encountered a very interesting problem for which you are really our last resort:

Exiting a child in a forking server (example on page 194 of 'Advanced Perl Programming', O'Reilly) seems to clean up the server socket of the parent on newer levels of Solaris. The parent exits with 'EBADF (Bad file number)' after having served one client request.

We have tried almost everything within our power, e.g.:

* compiling Perl on a working OS level and copying the binaries to the non-working OS level,
* compiling the current development version (5.005_61),
* different GNU compilers (2.8.1 and 2.95.1),
* SUN Workshop Compiler C/C++ 4.2,
* hacking in 'config.sh' (e.g. 'usevfork=false/true', multithreaded/non-multithreaded).

None of this helped.

After we filed a bug report, SUN responded with the following: \<\<FW: Bug ID# 4146098>>, but this is too low-level for us to understand what is really going on. The troublesome patch from SUN seems to be 105210-17 or above.

In more understandable language, they claimed that older versions of Solaris had a bug which is fixed in newer releases, and that Perl has probably been working around that bug. Now that the bug has been removed from the OS, Perl is still working around it, but this time unsuccessfully.

Does this make sense to you?

Can you help?

P.S.: Below you can find all mail communications with SUN. If you need additional information, please let us know.

Met vriendelijke groet/Kind regards\, Richard Hensgens ORIGIN B.V. - Managed Services - Distributed Systems Building VA-171\, E-Mail​: Richard.Hensgens@​nl.origin-it.com Phone​: (+31​:4027)87097\, Fax​: (+31​:4027)83962

The unix guru's view on sex​: # unzip; strip; touch; finger; mount; fsck; more; yes; umount; sleep

-----Original Message-----
From: Zuijdwijk, Pieter
Sent: Thursday, September 30, 1999 6:44 PM
To: 'dispatch@holland.sun.com'
Cc: Zuijdwijk, Pieter; Hensgens, Richard
Subject: Call Nr. 6583771

Below is the "truss -aef" output of 2 SUN systems running 2 different OS levels:

OK files: SunOS ... 5.6 Generic_105181-06 sun4u sparc SUNW,Ultra-4
NOK files: SunOS ... 5.6 Generic_105181-15 sun4u sparc SUNW,Ultra-Enterprise

\<\<Client.truss.out.NOK>> \<\<Client.truss.out.OK>> \<\<Client.pl>> \<\<Server.pl>> \<\<Server.truss.out.NOK>> \<\<Server.truss.out.OK>>

As you can see, we also have problems on 5.6 Generic_105181-15 on an Ultra-Enterprise 3000, so this is not a specific Solaris 7 issue after all.

Thanks in advance.

Pieter Zuijdwijk Origin TIS-DS-UNIX-SUN Groenewoudseweg 1 5621 BA Eindhoven\, The Netherlands Building VA-169 Phone +31 (0)40 27 89605 Fax +31 (0)40 27 89362

-----Original Message-----
From: Hensgens, Richard
Sent: Tuesday, September 28, 1999 1:09 PM
To: Zuijdwijk, Pieter
Subject: Bug Solaris 2.7

Pieter,

Before we start downgrading the SUN box, should we first file a bug report with SUN?

Standard examples from the O'Reilly Perl books behave differently on Solaris 2.6 and Solaris 2.7 with exactly the same Perl version (5.005_03):

Server.pl:

    #!/usr/bin/perl

    use IO::Socket;

    $SIG{CHLD} = sub { wait() };

    $Sock = new IO::Socket::INET( LocalPort => 9000,
                                  Proto     => 'tcp',
                                  Listen    => SOMAXCONN,
                                  Reuse     => 1 )
        or die "SOCKET() error [$!]";

    while ( $NewSock = $Sock->accept() )
    {
        $Pid = fork();

        if ( $Pid == 0 )
        {
            while ( defined( $Buffer = <$NewSock> ) )
            {
                print( $Buffer );
            }

            exit( 0 );
        }
    }

    close( $Sock );

    exit( 0 );

Client.pl:

    #!/usr/bin/perl

    use IO::Socket;

    $Sock = new IO::Socket::INET( PeerAddr => 'tsesun01',
                                  PeerPort => 9000,
                                  Proto    => 'tcp' )
        or die "SOCKET() error [$!]";

    foreach ( 1..10 )
    {
        print( $Sock "Msg $_: How are you ?\n" );
    }

    close( $Sock );

    exit( 0 );

Output on Solaris 2.6:

    nl1sahd1:root> ./Server.pl
    nl1sahd1:root> jobs
    [1] + Running ./Server.pl &
    nl1sahd1:root> ./Client.pl
    nl1sahd1:root> Msg 1: How are you ?
    Msg 2: How are you ?
    Msg 3: How are you ?
    Msg 4: How are you ?
    Msg 5: How are you ?
    Msg 6: How are you ?
    Msg 7: How are you ?
    Msg 8: How are you ?
    Msg 9: How are you ?
    Msg 10: How are you ?

    nl1sahd1:root> jobs
    [1] + Running ./Server.pl &

The server serves as many requests as it should.

Output on Solaris 2.7:

    tsesun01:root> ./Server.pl &
    [1] 12331
    tsesun01:root> jobs
    [1] + Running ./Server.pl &
    tsesun01:root> ./Client.pl
    Msg 1: How are you ?
    Msg 2: How are you ?
    tsesun01:root> Msg 3: How are you ?
    Msg 4: How are you ?
    Msg 5: How are you ?
    Msg 6: How are you ?
    Msg 7: How are you ?
    Msg 8: How are you ?
    Msg 9: How are you ?
    Msg 10: How are you ?

    [1] + Done ./Server.pl &
    tsesun01:root> jobs

The server serves only one request and then exits!

p5pRT commented 24 years ago

From Richard.Hensgens@nl.origin-it.com

Message RFC822:
Message-ID: 986AEA765305D311AA7B0008C75D97AFBB8A23@NLEHX020.origimail.origin-it.com
From: "Hensgens, Richard" Richard.Hensgens@nl.origin-it.com
To: "Hensgens, Richard" Richard.Hensgens@nl.origin-it.com
Subject: FW: Bug ID# 4146098
Date: Mon, 4 Oct 1999 17:19:14 +0200
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain; charset="iso-8859-1"

Bug Id: 4146098
Product: sunos
Category: network
Subcategory: socket
Bug/Rfe/Eou: bug
State: integrated
Development Status: INT
Synopsis: connect() and accept() can RESTART instead of returning EINTR
Keywords: esc#514623
Severity: 2
Severity Impact: 1
Severity Functionality: 0
Priority: 2
Description: When SA_RESTART is passed to sigaction(), connect() and accept() restart instead of returning with errno EINTR.

CONNECT(2) SYSTEM CALLS CONNECT(2)

EINTR               The connection attempt  was  interrupted
                     before  any data arrived by the delivery
                     of a signal.

Sun Release 4.1 Last change: 21 January 1990 3

============================================================================

SVR4 example. sigaction() is needed to set SA_RESTART.

c_test_sys5 is run against a host that will not respond to connect, and kill -ALRM is sent to the process while it is waiting.

    /*
     * cc -o c_test_sys5 c_test_sys5.c -lsocket -lnsl
     * Usage: c_test_sys5
     */

    /* The header names were lost from the report text; the headers
       below are the ones this code needs, not a verbatim restoration. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <signal.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netdb.h>

    void handler(sig)
    int sig;
    {
        printf("SIGNAL CATCHED\n");
    }

    main(argc, argv)
    int argc;
    char **argv;
    {
        struct sockaddr_in ser;
        struct hostent *serhost;
        int sock;
        int n;
        char buf[256];
        struct sigaction sa;

        if(argc != 3){
            fprintf(stderr, "Usage: client \n");
            exit(1);
        }

        /* Install the SIGALRM handler with SA_RESTART set. */
        sa.sa_flags = SA_RESTART;
        sa.sa_handler = handler;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGALRM, &sa, NULL);

        serhost = gethostbyname(argv[1]);
        if(serhost == NULL){
            fprintf(stderr, "bad hostname\n");
            exit(1);
        }

        memset((char *)&ser, 0, sizeof(ser));
        ser.sin_family = AF_INET;
        ser.sin_port = atoi(argv[2]);
        memcpy(&ser.sin_addr, serhost->h_addr, serhost->h_length);

        sock = socket(AF_INET, SOCK_STREAM, 0);
        if(sock == -1){
            fprintf(stderr, "socket failed\n");
            exit(1);
        }

        /* The call under test: should fail with EINTR when signalled. */
        if(connect(sock, (struct sockaddr *)&ser, sizeof(ser)) == -1){
            perror("CONNECT");
            exit(1);
        }

        while(1){
            n = read(sock, buf, sizeof(buf));
            if(n == 0)
                break;
            if(n < 0){
                fprintf(stderr, "file read error\n");
                exit(1);
            }
            write(1, buf, n);
        }

        close(sock);
        exit(0);
    }

Justification: This is the root cause of Escalation # 514623 bug# 4132657, Customer needs a patch for 5.6.

Work around:

Suggested fix: Diffs are shown below for sparc and x86 (diffs are identical for sparc and sparcv9).
The entire set of files changed are:

    usr/src/lib/libc/i386/sys/_so_accept.s
    usr/src/lib/libc/i386/sys/_so_connect.s
    usr/src/lib/libc/sparc/sys/_so_accept.s
    usr/src/lib/libc/sparc/sys/_so_connect.s
    usr/src/lib/libc/sparcv9/sys/_so_accept.s
    usr/src/lib/libc/sparcv9/sys/_so_connect.s

note _cerror maps() ERESTART to EINTR

    ####### usr/src/lib/libc/sparc/sys ######
    % diff -c _so_connect.s.1.2 _so_connect.s
    *** _so_connect.s.1.2 Thu May 22 14:38:48 1997
    --- _so_connect.s Fri Jun 5 08:28:00 1998
    ***************
    *** 18,24 ****
      #include "SYS.h"
    ! SYSCALL2_RESTART(_so_connect,connect)
      RET
      SET_SIZE(_so_connect)
    --- 18,24 ----
      #include "SYS.h"
    ! SYSCALL2(_so_connect,connect)
      RET
      SET_SIZE(_so_connect)

    % diff -c _so_accept.s.1.2 _so_accept.s
    *** _so_accept.s.1.2 Thu May 22 14:38:48 1997
    --- _so_accept.s Fri Jun 5 08:27:12 1998
    ***************
    *** 19,25 ****
      #include "SYS.h"
    ! SYSCALL2_RESTART(_so_accept,accept)
      RET
      SET_SIZE(_so_accept)
    --- 19,25 ----
      #include "SYS.h"
    ! SYSCALL2(_so_accept,accept)
      RET
      SET_SIZE(_so_accept)

    sctesrv 54: ##################### usr/src/lib/libc/i386/sys
    % diff -c _so_connect.s.1.5 _so_connect.s
    *** _so_connect.s.1.5 Fri Jun 5 08:59:52 1998
    --- _so_connect.s Fri Jun 5 09:01:48 1998
    ***************
    *** 18,25 ****
      movl $CONNECT,%eax
      lcall $SYSCALL_TRAPNUM,$0
      jae noerror
    - cmpb $ERESTART,%al
    - je _so_connect
      _prologue_
      _m4_ifdef_(`DSHLIB',
      `pushl %eax',
    --- 18,23 ----

    sctesrv 43: diff -c _so_accept.s.1.5
    diff: two filename arguments required
    sctesrv 44: diff -c _so_accept.s.1.5 _so_accept.s
    *** _so_accept.s.1.5 Fri Jun 5 09:02:12 1998
    --- _so_accept.s Fri Jun 5 09:02:53 1998
    ***************
    *** 18,25 ****
      movl $ACCEPT,%eax
      lcall $SYSCALL_TRAPNUM,$0
      jae noerror
    - cmpb $ERESTART,%al
    - je _so_accept
      _prologue_
      _m4_ifdef_(`DSHLIB',
      `pushl %eax',
    --- 18,23 ----

State triggers:
Accepted: yes
Evaluated: yes
Evaluation: 4132657 covers the binary compatibility problem.
When the sample program from 4132657 is compiled and tested on 5.6, the result is:

    $ /ws/on297-tools/SUNWspro/SC4.2/bin/cc x.c -lsocket -lnsl
    $ ./x fade 15000 &
    18519
    $ kill -ALRM 18519
    $ SIGNAL CATCHED
    CONNECT: Interrupted system call
    $ wait

That is, the behavior is correct. Thus the only problem appears to be the BCP one and this one isn't reproducible.

=================================
updated description with reproducible example
1998-06-16
=================================
1998-07-20
----------------------------
I thought this fix was being done as part of the escalation process (4132567 and this are essentially the same bug for pre-kernel socket and post-kernel socket source bases... not sure why they got split into two bugs. The fixes are different because of different sources, bugs are not). Will try to fix and test this. The code in Suggested Fix should work.
1998-07-23
----------------------------
My guess is that this bug got split into 2.5.1 and 2.6-and-later versions since this might be not-quite-easily fixable for 2.5.1 since that would involve changing the restartable nature of getmsg()/putmsg() system calls.

The fix here is to make the system calls underlying the connect() and accept() interfaces NOT restartable, as they currently (and erroneously) are. This makes the behavior compatible with SunOS4.x and also fixes it for SunOS5.x [ The BCP interfaces are implemented using the native OS interfaces, a BCP program just happens to have uncovered this bug ]. The emails in the "Comments" section further clarify some of the technical background behind this fix.

The program in the description section tests only the connect() interface. Test programs with slight modifications were used to test both the connect() and accept() interfaces and those test programs have been added to the attachments.
WITHOUT THE FIX, the observed behavior is as follows with output slightly edited for clarity:

===

    % ./accept_test 1234 &
    [1] 668
    Process id is 668
    % truss -v all -p 668
    accept(3, 0xEFFFF9D4, 0xEFFFF9C0, 1) (sleeping...)
    ^C% kill -ALRM 668
    SIGNAL CATCHED
    % truss -v all -p 668
    accept(3, 0xEFFFF9D4, 0xEFFFF9C0, 1) (sleeping...)

Thus accept() call is restarted and continues sleeping.

    % ./connect_test bobo 1234 &
    [1] 671
    Process id is 671
    % kill -ALRM 671
    % SIGNAL CATCHED
    CONNECT: Operation already in progress
    [1] Exit 1 ./connect_test bobo 1234

The connect() call is restarted and fails with EINPROGRESS

====

WITH THE FIX, the observed behavior is as follows with output slightly edited for clarity:

===

    % ./accept_test 1234 &
    [1] 4523
    Process id is 4523
    % kill -ALRM 4523
    % SIGNAL CATCHED
    ACCEPT: Interrupted system call
    [1] Exit 1 ./accept_test 1234

The accept() call now fails with EINTR even when SA_RESTART is set.

    % ./connect_test bobo 1234 &
    [1] 4525
    Process id is 4525
    % kill -ALRM 4525
    % SIGNAL CATCHED
    CONNECT: Interrupted system call
    [1] Exit 1 ./connect_test bobo 1234

The connect() call now fails with EINTR even when SA_RESTART is set.

====

Commit to fix in releases: generic, s998_20
Fixed in releases: s998_20
Integrated in releases: s998_20
Verified in releases:
Closed because:
Incomplete because:
Duplicate of:
Introduced in Release:
Root cause:
Program management:
Fix affects documentation: no
Exempt from dev rel: no
Fix affects L10N: no
Patch id:
Comments:
==============================
added sys5 example to description and reopened bug.
1998-06-16
==============================
1998-07-20
------------------------------
An archive of two emails which are part of discussions relevant to this bug which also point to a man page deficiency.

=========

>
> Roger,
>
> Jim seems to claim that all system calls except connect() were automatically
> restarted after a signal in SunOS 4.X. Is this really true i.e. did 4.X
> have different restart semantics for different system calls?
> (I figured asking you would be quicker than reading the 4.x source.)
>
> My assumption is that in 5.X SA_RESTART should/must apply to all
> interruptible system calls i.e. that we should not treat connect()
> differently. Correct?
>
> Note that connect() is odd because it can fail with EINTR/ERESTART after
> having started the connect attempt. Thus when connect() is restarted
> the 2nd one might fail with EALREADY or EISCONN even though connect
> was successful. I don't know of any other syscalls that modify "state"
> before returning EINTR.

4.x never restarted anything other than what 5.x does with SA_RESTART passed to sigaction(). SA_RESTART does not mean that all interruptible system calls are restarted. Only a subset. This is what the man page for sigaction(2) says. This is also true of 4.x:

    SA_RESTART  If set and the signal is caught, certain
                functions that are interrupted by the
                execution of this signal's handler are
                transparently restarted by the system;
                namely, read(2) or write(2) on slow devices
                like terminals, ioctl(2), fcntl(2), wait(2),
                and waitid(2). Otherwise, that function
                returns an EINTR error.

Roger

====================

MIME-Version: 1.0

Thanks for the info.

> 4.x never restarted anything other than what 5.x does with
> SA_RESTART passed to sigaction(). SA_RESTART does not mean
> that all interruptible system calls are restarted. Only a subset.
> This is what the man page for sigaction(2) says.
> This is also true of 4.x:
>
>     SA_RESTART  If set and the signal is caught, certain
>                 functions that are interrupted by the
>                 execution of this signal's handler are
>                 transparently restarted by the system;
>                 namely, read(2) or write(2) on slow devices
>                 like terminals, ioctl(2), fcntl(2), wait(2),
>                 and waitid(2). Otherwise, that function
>                 returns an EINTR error.

The above man page doesn't take sockets into account. In 4.X the source code tells me that restart also applies to send, sendto, sendmsg, recv, recvmsg, recvfrom. Thus we clearly need to fix the man page to say that for SA_RESTART. But what about getmsg and putmsg on slow devices? Shouldn't they get the same treatment as read/write/send*/recv*?

The SunOS 5.6 source code shows the following ERESTARTs:

    fcntl
    getmsg getpmsg         NOT in man page
    putmsg putpmsg         NOT in man page
    ioctl
    read
    pread readv            NOT in man page
    write
    pwrite writev          NOT in man page
    wait
    waitid
    connect accept         THIS is a bug
    recv recvfrom recvmsg  NOT in man page
    send sendto sendmsg    NOT in man page

Thus the man page had it right 6 out of 14 prior to kernel sockets and 6 out of 20 with kernel sockets!!!

Tim, assuming Roger doesn't have an issue with documenting all 20, can you file a man page bug to have the 14 missing calls added to the SA_RESTART description. Also, fix the connect and accept wrappers in libc (sparc and x86) to not use the restart macro/code. That will fix this "BCP problem".

Erik

================

See also: 4132657

History:
Submitter: wadej Date: Jun 5 1998 10:02AM
Dispatch operator: bugtraq Date: Jun 5 1998 10:02AM
Acceptor: cs Date: Jun 11 1998 1:19PM
Evaluator: cs Date: Jun 11 1998 1:19PM
Commit operator: mukesh Date: Jul 27 1998 5:14PM
Fix operator: mukesh Date: Jul 27 1998 5:14PM
Integrating operator: bmc Date: Jul 28 1998 12:24PM
Verify operator: Date:
Closeout operator: Date:
Called in by:

p5pRT commented 24 years ago

From Richard.Hensgens@nl.origin-it.com

Client.truss.out.NOK

p5pRT commented 24 years ago

From Richard.Hensgens@nl.origin-it.com

Client.truss.out.OK

p5pRT commented 24 years ago

From Richard.Hensgens@nl.origin-it.com

Client.pl

p5pRT commented 24 years ago

From Richard.Hensgens@nl.origin-it.com

Server.pl

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

This bug still seems to be present in 5.7.0@​8221\, only on Solaris.

-spp

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

Perhaps on Solaris 2.7 a shutdown is being performed on the socket when the child exits. As an experiment/workaround, try explicitly closing the listening socket in the child, as shown below.

"Stephen P. Potter" wrote:

Server.pl:

    #!/usr/bin/perl

    use IO::Socket;

    $SIG{CHLD} = sub { wait() };

    $Sock = new IO::Socket::INET( LocalPort => 9000,
                                  Proto     => 'tcp',
                                  Listen    => SOMAXCONN,
                                  Reuse     => 1 )
        or die "SOCKET() error [$!]";

    while ( $NewSock = $Sock->accept() )
    {
        $Pid = fork();

        if ( $Pid == 0 )
        {
            # added: close the child's copy of the listening socket
            close( $Sock ) and $sockClosed = 1;

            while ( defined( $Buffer = <$NewSock> ) )
            {
                print( $Buffer );
            }

            exit( 0 );
        }
    }

    close( $Sock );

    # added:
    close $Sock unless $sockClosed;

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

Lightning flashed, thunder crashed and ___cliff rayman___ <cliff@genwax.com> whispered:
| perhaps on solaris 2.7 a shutdown is being performed on the socket when the
| child closes. as an 'experiment/work around', try specifically close the
| listening socket in the child as per below.

I think the point being made in this report is that the script functions differently between Solaris versions. Sun claims to have fixed a bug, that we may have been working around, and that the work around may no longer be needed and may be causing the problem.

-spp

p5pRT commented 23 years ago

From @AlanBurlison

"Stephen P. Potter" wrote​:

> We have encountered a very interesting problem on which you are really our last resort:
>
> Exiting a child in a forking server (example on page 194 of 'Advanced Perl Programming' O'Reilly) seems to clean-up the server socket of the parent on newer levels of Solaris. The parent exits with a 'EBADF (Bad file number)' after having served one client request.
>
> We have tried almost everything within our power, e.g.:
>
> * compiling Perl on a working OS level and copying the binaries to the non-working OS level,
> * compiling the current development version (5.005_61),
> * different GNU compilers (2.8.1 and 2.95.1),
> * SUN Workshop Compiler C/C++ 4.2,
> * hacking in 'config.sh' (e.g. 'usevfork=false/true', multithreaded/non-multithreaded).
>
> Nothing works out.

Right - I've read the bug report, played with the example code, and here is the story. Prior to the fix, accept() and connect() were erroneously being restarted when a signal was caught. The correct behaviour according to the SVR4 spec is for them to return with EINTR, even if SA_RESTART has been passed to sigaction().

The bugfix changed the behaviour so that if a signal is caught while either accept() or connect() is in progress, the call fails with EINTR instead of being restarted.

There are two ways to fix the example script. The first is to redo the accept() if EINTR is returned. The problem with this approach is that the IO::Socket library doesn't check the return value of the accept() call, and then tries to do some I/O ops [llseek()] on the invalid file handle. This means that by the time your script can get hold of errno, it is set to EBADF instead of EINTR.

The quick and easy fix is to ignore SIGCHLD rather than catching it - this way no zombie child processes are created and no signals are generated to screw up the accept() call. Change the line

    $SIG{CHLD} = sub { wait() };

to

    $SIG{CHLD} = 'IGNORE';

and the script then works as expected.

Hope that helps\,

Alan Burlison Solaris Kernel Development\, Sun Microsystems
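
The first approach described above (retrying accept() when it fails with EINTR) might be sketched as follows. This is only an illustration under stated assumptions: `accept_retrying` and `serve` are hypothetical helpers, and the sketch calls Perl's built-in accept() rather than IO::Socket's accept method so that `$!` is inspected immediately after the system call. The non-blocking waitpid() loop and the localised `$!` in the handler are additions that are not in the original Server.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX qw(WNOHANG);

# Reap all finished children without blocking. $! and $? are localised so
# the handler cannot overwrite the errno of the call it interrupted.
$SIG{CHLD} = sub {
    local ( $!, $? );
    1 while waitpid( -1, WNOHANG ) > 0;
};

# Hypothetical helper (not part of IO::Socket): accept one connection,
# retrying whenever the call is interrupted by a signal.
sub accept_retrying {
    my ($listener) = @_;
    while (1) {
        my $peer = accept( my $client, $listener );
        return $client if $peer;
        next if $!{EINTR};    # interrupted, e.g. by SIGCHLD: try again
        return;               # any other error: give up, $! says why
    }
}

# Forking print-what-you-receive loop in the shape of the original Server.pl.
sub serve {
    my ($listener) = @_;
    while ( my $client = accept_retrying($listener) ) {
        my $pid = fork();
        die "fork() error [$!]" unless defined $pid;
        if ( $pid == 0 ) {
            close($listener);    # the child does not need the listener
            print while <$client>;
            exit(0);
        }
        close($client);          # the parent keeps only the listener
    }
}
```

Passing `serve` a listening socket built with the same IO::Socket::INET arguments as Server.pl would be the intended use; whether the retry is needed at all depends on the platform's restart behaviour.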

p5pRT commented 23 years ago

From @jhi

> There are two ways to fix the example script. The first is to redo the accept() if EINTR is returned. The problem with this approach is that the IO::Socket library doesn't check the return value of the accept() call, and then tries to do some I/O ops [llseek()] on the invalid file handle. This then means that by the time your script can get hold of errno it is set to EBADF instead of EINTR.
>
> The quick and easy fix is to ignore SIGCHLD rather than catching it -

Can I still fix IO::Socket? :-)

> this way no zombie child processes are created and no signals are generated to screw up the accept() call. Change the line $SIG{CHLD} = sub { wait() }; to $SIG{CHLD} = 'IGNORE'; And the script then works as expected.
>
> Hope that helps,
>
> Alan Burlison Solaris Kernel Development, Sun Microsystems

p5pRT commented 23 years ago

From @AlanBurlison

Jarkko Hietaniemi wrote:

> > There are two ways to fix the example script. The first is to redo the accept() if EINTR is returned. The problem with this approach is that the IO::Socket library doesn't check the return value of the accept() call, and then tries to do some I/O ops [llseek()] on the invalid file handle. This then means that by the time your script can get hold of errno it is set to EBADF instead of EINTR.
> >
> > The quick and easy fix is to ignore SIGCHLD rather than catching it -
>
> Can I still fix IO::Socket? :-)

Hey, you're the main man...

:-)

Actually I was surmising from the truss output that the problem was in IO::Socket. I had a quick look and it doesn't seem to be doing anything naughty. I've had a look at pp_sys.c as well, and I can't see it there either. Hmmm, wonder what is doing it?

Alan Burlison

p5pRT commented 23 years ago

From @gbarr

On Fri, Dec 22, 2000 at 12:09:40AM +0000, Alan Burlison wrote:

> Jarkko Hietaniemi wrote:
>
> > > There are two ways to fix the example script. The first is to redo the accept() if EINTR is returned. The problem with this approach is that the IO::Socket library doesn't check the return value of the accept() call, and then tries to do some I/O ops [llseek()] on the invalid file handle. This then means that by the time your script can get hold of errno it is set to EBADF instead of EINTR.
> > >
> > > The quick and easy fix is to ignore SIGCHLD rather than catching it -
> >
> > Can I still fix IO::Socket? :-)
>
> Hey, you're the main man...
>
> :-)
>
> Actually I was surmising from the truss output that the problem was in IO::Socket. I had a quick look and it doesn't seem to be doing anything naughty. I've had a look at pp_sys.c as well, and I can't see it there either. Hmmm, wonder what is doing it?

It may be something along the lines that IO::Socket::accept creates a new object which gets destroyed when the method exits with an error. And during that destroy process various calls may be made, I suppose.

If this is the case, changing the return to something like the following may help:

    $peer = accept($new,$sock)
        or do { local $!; undef $new; return };

Graham.
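
To see why the `local $!` in the suggestion above matters, here is a small self-contained sketch. `noisy_cleanup`, `fail_unprotected` and `fail_protected` are hypothetical names; `noisy_cleanup` is only a stand-in for whatever the destroyed object does during cleanup, and it overwrites errno the way a failed operation on an invalid handle would. It is not the actual IO::Socket code.

```perl
use strict;
use warnings;
use Errno qw(EINTR EBADF);

# Hypothetical stand-in for cleanup code that fails internally and
# leaves EBADF in errno as a side effect.
sub noisy_cleanup { $! = EBADF; return }

# Simulates accept() failing with EINTR, followed by unprotected cleanup:
# the caller ends up seeing the cleanup's errno.
sub fail_unprotected {
    $! = EINTR;
    noisy_cleanup();
    return;
}

# Same failure, but errno is localised around the cleanup, so the
# original EINTR is restored when the inner block is left.
sub fail_protected {
    $! = EINTR;
    { local $!; noisy_cleanup(); }
    return;
}

fail_unprotected();
my $seen_unprotected = $! + 0;

fail_protected();
my $seen_protected = $! + 0;

printf "unprotected: %s\n", $seen_unprotected == EBADF ? 'EBADF' : 'other';
printf "protected:   %s\n", $seen_protected == EINTR   ? 'EINTR' : 'other';
```

With the errno preserved this way, a caller can reliably tell an interrupted accept() from a genuinely bad file handle.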

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

Lightning flashed, thunder crashed and Alan Burlison <Alan.Burlison@uk.sun.com> whispered:
| There are two ways to fix the example script. The first is to redo the
| accept() if EINTR is returned. The problem with this approach is that
| the IO::Socket library doesn't check the return value of the accept()
| call, and then tries to do some I/O ops [llseek()] on the invalid file
| handle. This then means that by the time your script can get hold of
| errno it is set to EBADF instead of EINTR.

What I'm getting from all this is that there isn't a perceived bug in perl, so I should go ahead and close the ticket. Is that correct? How do I explain that the script works as the user expects on other OSes (and earlier versions of Solaris)? A bug in those other OSes?

-spp

p5pRT commented 23 years ago

From [Unknown Contact. See original ticket]

Alan Burlison <Alan.Burlison@uk.sun.com> writes:

> The bugfix changed the behaviour so that if a signal was caught when either accept() or connect() are in progress they fail with EINTR instead of being restarted.
>
> There are two ways to fix the example script. The first is to redo the accept() if EINTR is returned. The problem with this approach is that the IO::Socket library doesn't check the return value of the accept() call,

So we can consider this a bug in IO::Socket.

p5pRT commented 23 years ago

From @AlanBurlison

> What I'm getting from all this is that there isn't a perceived bug in perl, so I should go ahead and close the ticket. Is that correct? How do I explain that the script works as the user expects on other OSes (and earlier versions of Solaris)? A bug in those other OSes?

Correct - there is no bug in perl (well, perhaps it should return EINTR instead of EBADF...). I've tried to track down exactly which standard mandates this behaviour, but without much success. Signals are one of the areas where different Unixes tend to differ wildly, and this particular problem is a manifestation of those differences rather than a bug per se - the behaviour will depend on which standards a particular Unix is based on, and how closely it adheres to those standards.

The sigaction manpage for Solaris says this:

    SA_RESTART
        If set and the signal is caught, functions that are
        interrupted by the execution of this signal's handler
        are transparently restarted by the system, namely
        fcntl(2), ioctl(2), wait(2), waitid(2), and the
        following functions on slow devices like terminals:
        getmsg() and getpmsg() (see getmsg(2)); putmsg() and
        putpmsg() (see putmsg(2)); pread(), read(), and
        readv() (see read(2)); pwrite(), write(), and
        writev() (see write(2)); recv(), recvfrom(), and
        recvmsg() (see recv(3SOCKET)); and send(), sendto(),
        and sendmsg() (see send(3SOCKET)). Otherwise, the
        function returns an EINTR error.

So in fact the behaviour seen is as documented on Solaris.
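
For reference, a handler can be installed with SA_RESTART from Perl through the POSIX module, roughly as sketched below. Going by the man page text above, the flag would not cause accept() or connect() to be restarted on the patched Solaris, so a script there still needs either a retry on EINTR or `$SIG{CHLD} = 'IGNORE'`. The use of SIGUSR1 and the `$caught` counter are illustrative assumptions, chosen only so the sketch can signal itself.

```perl
use strict;
use warnings;
use POSIX qw(SIGUSR1 SA_RESTART);

my $caught = 0;

# Build a sigaction with an empty signal mask and the SA_RESTART flag.
my $action = POSIX::SigAction->new(
    sub { $caught++ },        # handler
    POSIX::SigSet->new(),     # no extra signals blocked while it runs
    SA_RESTART,               # restart the calls the OS documents as restartable
);

POSIX::sigaction( SIGUSR1, $action )
    or die "sigaction() error [$!]";

# Send ourselves the signal to show that the handler is in place.
kill 'USR1', $$;
print "handler ran $caught time(s)\n";
```

Which calls are actually restarted remains up to the operating system, which is the point of the man page excerpt.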

p5pRT commented 21 years ago

@gbarr - Status changed from 'open' to 'resolved'