att / ast

AST - AT&T Software Technology
Eclipse Public License 1.0
559 stars 154 forks source link

Crash when attempting to open a TCP socket. #1438

Open 0branch opened 5 years ago

0branch commented 5 years ago

Description of problem: ksh crashes when attempting to establish a TCP connection with exec under certain situations.

Ksh version: Version A 2020.0.0-beta1-166-g1025e369.

How reproducible: Consistently, for certain inputs.

Steps to reproduce:

  1. Start an interactive ksh session
  2. exec 3<>/dev/tcp/ntp-b.nist.gov/13
  3. Observe crash

Actual results:

$ : print $KSH_VERSION; exec 3<>/dev/tcp/ntp-b.nist.gov/13
Version A 2020.0.0-beta1-166-g1025e369
Killed: 9

Expected results:

$ print $KSH_VERSION; exec 3<>/dev/tcp/ntp-b.nist.gov/13
Version AJM 93u+ 2012-08-01
$

Additional info: Compare the above with the following, which seems to work as intended (at least, there's no immediate crash):

$ exec 3<>/dev/tcp/irc.freenode.net/6667

So, not all host/port pairs are triggering the problem. UDP not tested.

krader1961 commented 5 years ago

Please remember to include the platform on which you are observing the issue :smile: I can reproduce this on macOS but not Linux.

In this case the issue is that fd 3 is special on macOS due to something known as a "NECP descriptor": http://newosxbook.com/bonus/vol1ch16.html. You can see this by inserting a run_lsof(); statement between these two statements (the close(fd) is causing the OS to terminate the process with a SIGKILL):

https://github.com/att/ast/blob/ffbfaf6ff4fd0cb37505b78b02cb1af9eac04eb2/src/cmd/ksh93/sh/io.c#L467-L468

The same failure occurs with macOS /bin/ksh (ksh93u+) and ksh93v-. Thus I'm marking this as an enhancement for improved compatibility with macOS rather than a bug since, at this time, there isn't an obvious bug in the usual sense.

0branch commented 5 years ago

Apologies for omitting platform information; yes, macOS (10.14.6).

The failure isn't the same for me under /bin/ksh (referenced in the original issue, perhaps it wasn't clear that this was output from an actual session); indeed, everything works as expected. With 93u+ I can successfully open fd 3 for r/w and read data from NIST:

$ exec 3<>/dev/tcp/ntp-b.nist.gov/13
$ read -ru3; read -ru3; print "$REPLY"
58802 19-11-15 04:39:55 00 0 0 543.6 UTC(NIST) *

whereas 2020.0.0-beta1-166-g1025e369 crashes on exec.

0branch commented 5 years ago

Here are two more datapoints:

  1. In a fresh ksh (2020.0.0-beta1-166-g) session, I can establish a connection to NIST on fd 3 if it supplants an existing connection to Freenode (which as outlined above, works):

    $ exec 3<>/dev/tcp/irc.freenode.net/6667  # no crash
    $ read -ru3; print "$REPLY"                         
    :niven.freenode.net NOTICE * :*** Looking up your hostname...
    $ exec 3<>/dev/tcp/ntp-b.nist.gov/13      # now this works   
    $ read -ru3; read -ru3; print "$REPLY"
    58802 19-11-15 04:48:48 00 0 0   7.9 UTC(NIST) *
  2. The NIST connection crashes if it occurs first (again, a fresh 2020.0.0-beta1-166-g session):

    $ exec 3<>/dev/tcp/ntp-b.nist.gov/13      # crash
    Killed: 9

where (2) is essentially the originally reported issue.

krader1961 commented 5 years ago

I tested this on macOS 10.15.1 using it's ksh93u+ binary and saw your problem:

$ /bin/ksh -c 'exec 3<>/dev/tcp/ntp-b.nist.gov/13'                                                             8 ms
fish: '/bin/ksh -c 'exec 3<>/dev/tcp/n…' terminated by signal SIGKILL (Forced quit)

However, it only happens when running that specific command. Starting an interactive /bin/ksh session and doing the exec based redirect does not result in a SIGKILL. Whereas I see the same failure mode for both scenarios (i.e., -c and interactive) using ksh built from the current git master branch and the ksh93v- (ast.beta) branch. So this is a bug even with the ksh93u+ release but is also an unwanted change in behavior introduced by the ksh93v- release.

0branch commented 5 years ago

Thanks for testing.

Just to clarify: do you agree that this is a bug? (I see that the label was removed just before your first reply.)

0branch commented 5 years ago

Additional datapoint: crash in 93u+ with

$ echo $KSH_VERSION
Version AJM 93u+ 2012-08-01
$ exec 4<>/dev/tcp/ntp-b.nist.gov/13
Killed: 9

while this doesn't crash g1025e369. (Note fd 4 rather than 3.)

krader1961 commented 5 years ago

Just to clarify: do you agree that this is a bug?

Whether this issue documents a bug or long standing behavior that should be changed is unclear. Is the behavior a ksh bug or a platform behavior that needs to be accommodated by ksh? The unwanted, unexpected, behavior occurs even with a 7+ year old ksh version. Not just the current source.

After sleeping on the issue I've decided this should be classified as a bug. Primarily because, even if this is due to a macOS quirk, a straightforward C program that performs the same action would "just work". This may be an implementation bug but I won't be surprised if it turns out to be a design bug. Specifically, with how fd numbers are dynamically remapped using dup2() when exec 3<>/some-path is handled.

0branch commented 5 years ago

Just to clarify: do you agree that this is a bug?

Whether this issue documents a bug or long standing behavior that should be changed is unclear.

ksh crashes with a legal (syntactic, documented) statement—so that strikes me as a bug. (Printing an error would be another matter.)

Is the behavior a ksh bug or a platform behavior that needs to be accommodated by ksh? The unwanted, unexpected, behavior occurs even with a 7+ year old ksh version. Not just the current source.

Should that make a difference? I thought the intent of this repository was to continue KornShell development, which would include fixing historic bugs as well as newly introduced ones?

After sleeping on the issue I've decided this should be classified as a bug. Primarily because, even if this is due to a macOS quirk, a straightforward C program that performs the same action would "just work".

Alternative reading: From a user perspective, crashing unexpectedly is buggy behaviour.

krader1961 commented 5 years ago

Should that make a difference? I thought the intent of this repository was to continue KornShell development, which would include fixing historic bugs as well as newly introduced ones?

No to #1 and yes to #2, with a huge caveat. There are currently only two people who have contributed more than a single change in the past two years. Therefore, we have to characterize bugs that have existed for at least 7 years (back to the ksh93u+ release) as lower priority than bugs introduced since that release. If and when we have at least a handful of people regularly contributing or reviewing changes I'll be happy to prioritize old and new bugs based solely on the impact of the bug rather than its age. For now we have to care more about any bug people will notice that was not present in the ksh93u+ release.

0branch commented 5 years ago

Thanks for the explanation; this approach to prioritisation wasn't clear to me.