att / ast

AST - AT&T Software Technology
Eclipse Public License 1.0
559 stars 154 forks source link

Pressing 'b' to go to beginning of command in vi mode results in crash #1467

Open matvore opened 4 years ago

matvore commented 4 years ago

Description of problem: Pressing b to go to beginning of command in vi mode results in crash

Ksh version:

$ echo $KSH_VERSION
Version A 2020.0.0

(installed via Homebrew on macOS a few days ago)

How reproducible: 100% of the time.

Steps to reproduce:

  1. set -o vi
  2. Press: "asdf" then Esc then "b"

Actual results: Crash with stack trace:

asd### 25384 Function backtrace:
1   handle_sigsegv (in ksh) + 36
2   _sigtramp (in libsystem_platform.dylib) + 29
3   0x0000ffff (in ksh)
4   mvcursor (in ksh) + 1115
5   cntlmode (in ksh) + 603
6   vigetline (in ksh) + 1036
7   ed_viread (in ksh) + 1506
8   slowread (in ksh) + 566
9   sfrd (in ksh) + 2983
10  _sffilbuf (in ksh) + 1118
11  sfreserve (in ksh) + 1673
12  exfile (in ksh) + 2389
13  sh_main (in ksh) + 4689
14  main (in ksh) + 149
15  start (in libdyld.dylib) + 1
Abort

Expected results: go to beginning of line.

Additional info: Does not repro if I press B rather than b

I suspect this is caused by this line: https://github.com/att/ast/blob/c99e9ff13e8fc1bf46127e6f28d0f0bde16ffda0/src/cmd/ksh93/edit/vi.c#L319 which is in the b codepath. It checks vi_isalph(tcur_virt) before checking if tcur_virt is in range. These two clauses should be reversed. Note that line 316 is a similar check for pressing B, and there the tcur_virt value is checked first.

I don't have an environment to build and run the automated tests. If I have a chance to set one up, I'll try to send a patch.

jghub commented 4 years ago

I would advice, for the time being, to go back to using /bin/ksh on OSX which is the last stable release of ksh93 (i.e. ksh93u+). vi mode works just fine with it and the segfault you report here for ksh2020 is not seen with it.

ksh2020 ( which is what homebrew is providing via this repo to you) has known problems and the project is currently in a state of readjustment, see #1466

matvore commented 4 years ago

@jghub Thank you for the tip. /bin/ksh is working fine for me now.

I am new to ksh and I assumed the latest version would be the best. Reading some of the issues here, it sounds like that is not necessarily the case.

jghub commented 4 years ago

@matvore:

I am new to ksh and I assumed the latest version would be the best. Reading some of the issues here, it

hopefully you stay with it. benchmarking some busy loops against bash or zsh might be helpful to understand that ksh93 is in a different league for scripting/programming (and the only of the Bourne shell descendants supporting floating point arithmetic).

if you want to dive deeper into ksh this book

https://www.amazon.com/exec/obidos/ASIN/0131827006/qid=1005942232/sr=1-2/ref=sr_1_14_2/103-1397887-1135830

still is highly recommendable.

sounds like that is not necessarily the case.

I agree, obviously.

for now, I presume OSX /bin/ksh (which basically is stock ksh93u+) will work just fine for you overall. vi mode (which I use, too) is working OK but partly a little rough around the edges. and file name completion has some hiccups in certain situations (they were also described here in some issue I seem to recall). so, as an interactive shell, ksh is not "stellar" or superior to others, but descent enough I'd say.

if you encounter strange behaviour or bugs, please report them (for now, here, in the future hopefully in a new "ksh93 legacy" repo.).

cschuber commented 4 years ago

I'm using the FreeBSD ksh93-devel port, updated a day prior to the rollback (8cf92b28). I have no problems with "b". You might want to try the latest ksh2020 branch.

matvore commented 4 years ago

I'm using the FreeBSD ksh93-devel port, updated a day prior to the rollback (8cf92b2). I have no problems with "b". You might want to try the latest ksh2020 branch.

I tried building at that commit and am still getting the same problem. I assume there are a lot of other factors about whether an illegal access will be detected, like the compiler used and the OS, which would explain why we're seeing different behavior.

Switching the two clauses in the while loop condition did fix it, as I suspected.

hopefully you stay with it. benchmarking some busy loops against bash or zsh might be helpful to understand that ksh93 is in a different league for scripting/programming (and the only of the Bourne shell descendants supporting floating point arithmetic).

I have been trying out ksh for this exact reason - it sounds like a better programming language than other shells while still feeling a lot like a normal POSIX shell for interactive use.

https://www.amazon.com/exec/obidos/ASIN/0131827006/qid=1005942232/sr=1-2/ref=sr_1_14_2/103-1397887-1135830

still is highly recommendable.

Thank you, I'll keep a note of this.

HansH111 commented 4 years ago

I'm using the FreeBSD ksh93-devel port, updated a day prior to the rollback (8cf92b2). I have no problems with "b". You might want to try the latest ksh2020 branch.

Same with me, no problems, in ksh2020 version : 2020.0.0-beta1-208-g2f06a34e-dirty

krader1961 commented 4 years ago

This is a heisenbug. This bug is present in ksh93u+.

I had determined the root cause of this issue a few hours after this issue was opened. But I waited till now to see if anyone would take the issue seriously. I was not surprised in seeing the new people in control of the project assuming the bug was introduced by me. That is, since the ksh93v- or ksh93u+ release.

The bug is in the following if expression because tcur_virt == -1 and isblank(v) is a macro that expands to _isblank(virtual[v]). That dereferences the byte before the start of the dynamically allocated buffer referred to by virtual (which is an alias for editb.e_inbuf) points to:

https://github.com/att/ast/blob/27e72175bde58da31c02b90e4ca179bf84310521/src/cmd/ksh93/edit/vi.c#L688-L689

This bug will rarely be observed because it requires that

a) the buffer pointed to by virtual begin on a page boundary, and

b) the preceding page of virtual memory cannot be read.

The easiest way to reproduce this problem on the 2020.0.0 branch is to build it with ASAN enabled:

$ meson -DASAN=true
$ ninja
$ src/cmd/ksh93/ksh
KSH PROMPT:1: set -o vi
KSH PROMPT:2: asd=================================================================
==50054==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6310000007fc at pc 0x00010bfd7a74 bp 0x7ffee3cd0f00 sp 0x7ffee3cd0ef8
READ of size 4 at 0x6310000007fc thread T0
    #0 0x10bfd7a73 in backword vi.c:319
    #1 0x10bfe053d in mvcursor vi.c:1155
    #2 0x10bfd8deb in cntlmode vi.c:380
    #3 0x10bfd2dc8 in vigetline vi.c:914
    #4 0x10bfcb64a in ed_viread vi.c:254
    #5 0x10c054094 in slowread io.c:2017
    #6 0x10c2da9ce in sfrd sfrd.c:236
    #7 0x10c2b3a04 in _sffilbuf sffilbuf.c:100
    #8 0x10c2df0b2 in sfreserve sfreserve.c:137
    #9 0x10c0cc15a in exfile main.c:471
    #10 0x10c0d0f5e in sh_main main.c:313
    #11 0x10bf2375b in main pmain.c:39
    #12 0x7fff73a537fc in start (libdyld.dylib:x86_64+0x1a7fc)

0x6310000007fc is located 4 bytes to the left of 65537-byte region [0x631000000800,0x631000010801)
allocated by thread T0 here:
    #0 0x10c5a59f3 in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x5e9f3)
    #1 0x10c24068b in ast_malloc vmbusy.c:25
    #2 0x10c051511 in sh_iostream io.c:324
    #3 0x10c0502e9 in sh_ioinit io.c:236
    #4 0x10c03bf8e in sh_init init.c:1137
    #5 0x10c0cdb2d in sh_main main.c:114
    #6 0x10bf2375b in main pmain.c:39
    #7 0x7fff73a537fc in start (libdyld.dylib:x86_64+0x1a7fc)

That sort of reproduction is immensely harder now that the code is once again using the AST vmalloc subsystem rather than the platform malloc subsystem. This means that using tools like ASAN are, once again, impossible. Not to mention that the project no longer includes any ability to run interactive tests. Something @siteshwar and I addressed in the work leading up to the 2020.0.0 release by introducing expect based tests of interactive behavior.

The first big change I made was replacing the AST vmalloc subystem with the OS malloc subsystem. That resulted in several unit test failures that were not previously seen. All of those test failures were due to the same type of bug as this issue. That is, dereferencing an address outside the bounds of dynamically allocated arrays. We found many more such bugs after enabling ASAN and Valgrind.

DavidMorano commented 4 years ago

@krader1961

That sort of reproduction is immensely harder now that the code is once again using the AST vmalloc subsystem rather than the platform malloc subsystem.

You did not replace the system MALLOC w/ AST VMALLOC in ksh2020, did you? I hope not.

gordonwoodhull commented 4 years ago

@DavidMorano, I think @krader1961 is referring to #1466, the rewinding of this repo to ksh93u. I don't see why ksh2020 can't live on elsewhere. It is available on the ksh2020 branch.

This is just a repo. Forks happen.

jghub commented 4 years ago

This is a heisenbug. This bug is present in ksh93u+.

in the so-called "stable" ksh2020 release it is 100% reproducible in OSX (like for the OP). nothing of a heisenbug to it at all (on that platform, anyway).

OTOH, I was not able to make ks93u+ segfault ONCE under OSX due to vi-editor misbehaviour in >5 years using this editor mode.

if krader were technically right to claim that it is a bug in ksh93u+, he should provide an example how to trigger that bug in ksh93u+ I'd say. if that is not possible it is not a real bug but a perceived or theoretical one (which might or might not be worth to fix at some point).

I had determined the root cause of this issue a few hours after this issue was opened. But I waited till now to see if anyone would take the issue seriously. I was not surprised in seeing the new people in control of the project assuming the bug was introduced by me. That is, since the ksh93v- or ksh93u+ release.

for the record and in order to prevent krader's misinformation to take hold:

1. there are no "new people in control of the project". @gordonwoodhull and the ATT/AST team came to a (good) decision due to the situation explained in #1464 and #1466 and revoked commit rights to krader (and everybody else external to ATT) and rewound to ksh93u+ while moving ksh2020 to the side on a branch. "not taking the issue seriously": a) this issue concerns ksh2020 and is not observed with ksh93u+. b) future work on ksh93u+ and ksh2020 will have to happen in (separate) forks of the present repo.

2. nobody assumed anything. the factual statement is "it is a 100% reproducible bug in the ksh2020 release under OSX and the segfault is not observed in ksh93u+ at all with the same OS".

3. there is no official ksh93v- release. ksh93v- was/is beta. ksh2020 is a descendant of this beta-software and did never manage to make the new, still buggy ksh93v- features work or get otherwise out of beta-state. rather those features were deleted. AFAIK, ksh2020 does provide no desirable functionality beyond what ksh93u+ provides but is way more buggy due to its ksh93v- heritage plus its own new problems than ksh93u+. and slower. "buggy" as in "does not work properly, misbehaves, segfaults".

This bug will rarely be observed because it requires that

4. this bug is always observed with the ksh2020 release under OSX.

regarding the future of ksh2020: as @gordonwoodhull already pointed out, krader is free to fork and go ahead with it. other people will go ahead starting from ksh93u+ it seems.