att / ast

AST - AT&T Software Technology
Eclipse Public License 1.0

build process -- Food for thought #42

Closed marcastel closed 6 years ago

marcastel commented 7 years ago

Should we change/adapt the build process?

(with the objective to make it easily maintainable and accessible to a greater number)

Small knowledgeable community

I know nobody among my fellow developers who has in-depth knowledge of the build process beyond bin/package make, let alone enough to maintain it or even simply to customise it. Neither do I.

That said, they know how to maintain and tweak GNU Autotools and CMake toolchains.

Note: Do not jump to conclusions here, I am not trying to promote Autotools or CMake; quite the contrary. I simply want to emphasise the lack of openness of the build process' logic and toolchain.

Scarce documentation

Embedded usage documentation in every C program or Korn shell script is a fantastic asset of the AST developments, and certainly immensely underused among users of AST packages... except by the AST developers themselves, who have consistently added usage information to all their utilities.

Nonetheless, that documentation does not suffice for a newbie to get their head around the build toolchain and gain enough insight to act alone without calling out for help.

Today I am confronted with a failing build on a platform which is certainly not exotic (macOS), and I find myself spending hours trying to understand where the errors occur.

Unless told otherwise, I have no supporting information to help me get through my build failures. And calling out for help won't be of great use because (I presume) only a few have invested significant time in understanding the guts of the toolchain. Questions will take time to be answered, if they are answered at all.

Build tool

When all goes well, the AST build toolchain seems to beat the other tools mentioned above flat out. It has (apparently) no dependencies, allows for all the GNU Autoconf probing without the M4 hell, and nicely lays out its build products.

Opinion: GNU Autotools are a fantastic suite. But they have a major inconvenience: M4. Opaque and to a certain extent clumsy. Probably a good compromise for portability 30 to 40 years ago, but no longer the right tool for today; pre-processing could be done the AST way :-)

Could the AST build toolchain be system agnostic and a possible replacement for GNU Autotools or CMake on other projects? A toolchain written in portable POSIX shell targeting any raw (POSIX) UNIX or Linux.

Whilst this was probably a driver in its conception, going through the source files we see that it depends on bash here, lynx or wget there, etc. So it is not agnostic and doesn't build on a raw system; it requires GNU-ish capabilities. Hence it targets UNIX/GNU or Linux/GNU platforms.

Note: for the sake of the argument, let us ignore for now that we probably need gcc to avoid proprietary compilers (where such compilers still exist).

Logically one can ask, why then maintain a distinct build toolchain? Why not use GNU Autotools or CMake?

Preliminary thoughts

The breakup of the AST development team has (luckily) brought the AST developments to the open source community. But the community is small (and probably fragile).

If the AST packages and the Korn shell are here to stay, the community needs to be enlarged.

Enlarging the community means making the build process accessible to many.

Migrating to GNU Autotools or CMake is an enormous effort which would require such a time investment that it is almost guaranteed to stall.

Documentation and HOWTOs seem to be the only realistic approach. This also requires time, and reverse engineering.

Request for comments

In the 90s, shell portability was a big concern, and scripting had to focus on POSIX shells only (the Korn shell wasn't a POSIX shell at the time; it is now).

Today, thanks to AT&T opening up the source code, a Korn shell exists on (almost) every platform. Not PDKSH or old versions, but a ksh93 executable (whatever its release).

Consequently, in 2017 onwards, we can assume that we have a Korn shell executable that supports the 93 syntax and features.

Converting the AST build toolchain scripts from universal shell syntax to Korn shell 93 syntax can:

a) greatly reduce the LOC (e.g. iffe could be reduced by 50%)
b) allow for clean environments with the function keyword, limiting globals
c) break down the code into smaller and more maintainable chunks using FPATH (see the sketch below)
d) allow usage information to be added to all functions
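
To illustrate b) and c), here is a minimal sketch (hypothetical code, not part of the existing toolchain): a helper declared with the function keyword, so its typeset variables stay local, placed in its own file so that FPATH autoloads it on first use.

# Hypothetical example, not existing toolchain code.
# File: funcs/chk_cc -- with the funcs directory on FPATH, chk_cc is autoloaded on first call.
function chk_cc
{
    typeset cc            # local to the function thanks to the "function" keyword
    for cc in "$@"
    do
        if whence -q "$cc"
        then
            print -r -- "$cc"
            return 0
        fi
    done
    return 1
}

A caller would only need FPATH=$PWD/funcs in its environment and could then write cc=$(chk_cc cc gcc clang) without sourcing anything explicitly; a usage string could be attached to the function in the same self-documenting style the existing AST utilities use.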

This doesn't require a full reverse engineering effort, nor does it require a full rewrite of the code.

At the same time this allows for a learning curve which can be captured in HOWTOs and central documentation.

By doing this we can (re)gain knowledge of the AST build toolchain, document it properly for the community to get involved, and lead the way for a ksh2023 rather than a ksh93+z2023 :-)

dannyweldon commented 7 years ago

You could try reaching out to the mailing list; there may be more people subscribed than are watching this git repo.

To subscribe, try sending a plain text email to mailman-request@lists.research.att.com with the word "help" in the body and follow the instructions you receive. (Note, the mailman web server is no longer working, but it will respond to email commands.)

What part of the build is failing and what errors are you getting?

I did not think that anything ultimately depended on bash; however, there may be bash-specific workarounds in the generic Bourne-shell-compatible scripts, e.g. in bin/package.

As for the use of curl and wget, they aren't 100% necessary either because hurl.sh (src/cmd/INIT/hurl.sh) can fall back to using /dev/tcp/$HOST/$PORT style connections if it's running under bash or ksh and it can't find curl or wget.
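
For anyone unfamiliar with that trick: bash and ksh93 treat /dev/tcp/<host>/<port> as a virtual path that opens a TCP connection when it is redirected to. A minimal sketch of the idea (not the actual hurl.sh code; plain HTTP on port 80, purely for illustration):

# open a read/write TCP connection to example.com:80 on file descriptor 5
exec 5<> /dev/tcp/example.com/80
printf 'GET / HTTP/1.0\r\nHost: example.com\r\n\r\n' >&5
cat <&5            # dump the raw response (headers followed by the body)
exec 5<&-          # close the descriptor

hurl.sh wraps this fallback with the curl/wget probing and error handling described above.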

I recently looked into the build failures for:

cmd/kshlib/dss cmd/kshlib/cmdtst

only to come to the conclusion that the dss builtin has been broken for a while and needs rewriting because the API version of nvapi has changed. And I'm not yet sure why cmd/kshlib/cmdtst tests are failing, but I doubt that it is needed any more because the grep and xargs builtins are integrated and working fine now in src/lib/libcmd.

I have been looking at getting the ast repo building automatically with travis ci, but currently bin/package test has over 680 errors on a working build on my linux machine! I am thinking that some of those tests may need to be removed or silenced until they pass reliably.

We don't have access anymore to the ast build farm to build on multiple platforms, but at least travis ci can test on linux and macos:

https://docs.travis-ci.com/user/multi-os/

So I think that those platforms should be first-class platforms that have to be able to build and test without errors. We may also be able to get an x86-based Solaris system in a virtual machine somewhere building automatically as well as freebsd, darwin and even cygwin.

We should also start to document how to debug the build system in the github wiki.

dannyweldon commented 7 years ago

Siteshwar from Red Hat now has commit access to the repo and has added a PR for FreeBSD, #19, that might help, as macOS is based on BSD, I believe.

Also, he has added a travis file to the beta branch, but it is currently only targeting fedora and is not yet running any tests.

saper commented 7 years ago

I personally quite like the build system here, although I have yet to fully wrap my head around it (I am currently debugging a problem related to iffe not detecting things fully on FreeBSD 11).

Man pages are no longer online; I think getting documentation online and some introductory material would help.

I have even made my own little "release" including #19 - I was surprised how quick and easy it was.

dannyweldon commented 7 years ago

Man pages are readable here:

https://web.archive.org/web/20151104235435/http://www2.research.att.com/~astopen/download/ This link is the best place to start, as it seems to set up the frames properly.

Then visit: AST, nmake, overview. Also: Manual, Commands, iffe + package + nmake + probe (but there are links to these in the above).

krader1961 commented 6 years ago

FWIW, I used ksh88 then ksh93 for more than two decades when I worked for Sequent Computer Systems. Then switched to zsh when I changed jobs. I then abandoned zsh when I realized the zsh architecture was broken beyond repair.

After two years of contributing to the Fish shell project I've abandoned it for several reasons. Primarily because it seems like the big problems with its implementation (e.g., how I/O through pipelines is handled) will never be fixed. Also, because the current developers are expending effort on pointless changes such as changing FISH_VERSION to fish_version without any justification other than that one, very inexperienced, contributor thinks that any variable which isn't exported cannot be all uppercase.

So I was intrigued to see that the Korn shell had been open sourced and hosted on Github. But I can't figure out how to get nmake built and usable on macOS (i.e., OS X). The homebrew command doesn't seem to know how to install nmake. And attempts to sh ./bin/package make fail because nmake is not available. I would be interested in using and contributing to ksh93 development. But there needs to be either

a) better documentation for how to build ksh93 on macOS (google searches didn't yield any answers), or

b) a switch to a more modern build system like Cmake or autotools/autoconf.

Admittedly the latter isn't very modern but it is more widely supported and understood than the AT&T nmake mechanism.

siteshwar commented 6 years ago

I don't have access to an OS X machine, but I have put the build script that I use to compile ksh on fedora in notes here. The latest changes are in the beta branch, so I would recommend building from there. Also, did you use clang or gcc to compile on OS X? I would suggest trying to compile with gcc, as the build system behaves very strangely sometimes. I agree with your thoughts about using a more modern build system.
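
For reference, the steps come down to roughly the following; this is a sketch only, not the exact script from the linked notes, and the branch name and output path are assumptions:

git clone https://github.com/att/ast.git
cd ast
git checkout beta
export CC=gcc              # gcc recommended; the build behaves strangely with other compilers
bin/package make 2>&1 | tee build.log
# on success the shell should end up under arch/<hosttype>/src/cmd/ksh93/ksh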

ksh is extremely fragile and is very prone to regressions. It suffers from its implementation too. As an example, see how the build broke last time there were changes in glibc. So I try to keep changes to it minimal.

saper commented 6 years ago

./bin/package make is generally the way to go. One of the first steps is to build nmake; most probably it fails somewhere along the way.

The build usually proceeds even if the previous steps failed, so it is worth examining the build log from the top rather than from the bottom and finding the issue there. nmake being not present is just a consequence of an earlier build failure.
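
When the log is long, something like this helps to locate the first complaint quickly (the make.out path is an assumption; it is the location mentioned later in this thread):

# show the first lines that look like failures in the package build log
grep -n -i -E 'error|fatal|termination code' arch/*/lib/package/gen/make.out | head -n 20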

krader1961 commented 6 years ago

@siteshwar, Thanks for that script. Just running sh ./bin/package make on macOS produces output that begins like this:

package: make start at Thu Oct 12 20:12:18 PDT 2017 in /Users/krader/projects/3rd-party/ast/arch/darwin.i
CC=cc
SHELL=/usr/local/bin/ksh
HOSTTYPE=darwin.i386
NPROC=12
PACKAGEROOT=/Users/krader/projects/3rd-party/ast
INSTALLROOT=/Users/krader/projects/3rd-party/ast/arch/darwin.i386
PATH=/Users/krader/projects/3rd-party/ast/arch/darwin.i386/bin:/Users/krader/projects/3rd-party/ast/bin:/
cmd/INIT:
ksh[56]: eval: line 6: 41460: Abort
make: *** termination code 6 making cmd/INIT
ksh[68]: wait: 41459: Abort
lib/libast:
ksh[75]: eval: line 6: 41467: Abort
make: *** termination code 6 making lib/libast
ksh[87]: wait: 41466: Abort

Your script actually results in a ./arch/darwin.i386-64/src/cmd/nmake directory being created but not nmake being built. So the build then fails to find nmake:

+ nmake --base --compile '--file=/Users/krader/projects/3rd-party/ast/src/cmd/nmake/Makerules.mk'
/usr/local/bin/ksh: line 4: nmake: not found
mamake [cmd/nmake]: *** exit code 127 making Makerules.mo

I should point out I have a hybrid macOS system since I have installed many GNU tools via Homebrew and have arranged for some of the GNU tools to shadow the macOS/BSD variants. But that has not generally been a problem when working with other open source software.

On Ubuntu 16.10 just running sh ./bin/package make results in a working ksh binary (in ./arch/linux.i386-64/src/cmd/ksh93/ksh). So my problem is clearly unique to macOS (aka OS X).

I'll spend a little more time trying to get ksh to build on macOS but I'm not very motivated to do so since other shells (e.g., Elvish) build and run on macOS without jumping through hoops and are more likely to have a future.

saper commented 6 years ago

Seems like your shell is crashing. What is your /usr/local/bin/ksh, and what happens if you move it away (don't use it)?

dannyweldon commented 6 years ago

Also, are there any errors in arch/darwin.i386-64/lib/package/gen/make.out ?

krader1961 commented 6 years ago

The first problem I found was an unwanted line wrap from cutting/pasting @siteshwar's script. Once I fixed that I found that symbols like nl_catd were not defined. That's because the ./arch/darwin.i386-64/include/ast/ast_nl_types.h file that is generated by iffe doesn't define it. And that file is included by ./src/lib/libast/std/nl_types.h, which shadows the system-provided header of the same name where the symbol is defined. I hacked around that problem by moving src/lib/libast/std/nl_types.h out of the way and modifying the #include statements in the affected files to include the system header of that name and the ast_nl_types.h header.

There are several discussion threads about this header problem when building with the AST tools. Such as this one: https://mail-index.netbsd.org/netbsd-bugs/2014/07/14/msg037462.html.

After working around the nl_types problem the build gets a lot farther and does generate an nmake binary. But the build then fails with a lot of lines like these:

ksh[1424]: eval: line 6: 29178: Abort
make: *** termination code 6 making cmd/ncsl
ksh[1436]: wait: 29177: Abort
cmd/pack:
ksh[1443]: eval: line 6: 29184: Abort
make: *** termination code 6 making cmd/pack
ksh[1455]: wait: 29183: Abort
lib/libvdelta:

Forcing /bin/sh to be used by using ./bin/package make -S SHELL=/bin/sh to do the build still results in the same "abort" messages -- they're just formatted differently. Same with SHELL=/bin/bash. So it isn't the shell. It looks like it's /Users/krader/projects/3rd-party/ast/arch/darwin.i386-64/ok/bin/nmake that is dying from receiving SIGABRT. Presumably an assert() is failing. Note that I can successfully invoke it with the -v switch.

Sure enough. I enabled core dumps, and what we see is that nmake is invoking strcpy() on overlapping buffers, which is undefined behavior. You have to use memmove() in this situation:

* thread #1: tid = 0x0000, 0x000000010ce00d42 libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal SIGSTOP
  * frame #0: 0x000000010ce00d42 libsystem_kernel.dylib`__pthread_kill + 10
    frame #1: 0x00007fffbd00d457 libsystem_pthread.dylib`pthread_kill + 90
    frame #2: 0x00007fffbce854bb libsystem_c.dylib`__abort + 140
    frame #3: 0x00007fffbce8542f libsystem_c.dylib`abort + 144
    frame #4: 0x00007fffbce85592 libsystem_c.dylib`abort_report_np + 181
    frame #5: 0x00007fffbceabf28 libsystem_c.dylib`__chk_fail + 48
    frame #6: 0x00007fffbceabf38 libsystem_c.dylib`__chk_fail_overlap + 16
    frame #7: 0x00007fffbceabf69 libsystem_c.dylib`__chk_overlap + 49
    frame #8: 0x00007fffbceac132 libsystem_c.dylib`__strcpy_chk + 64
    frame #9: 0x000000010cc51eaf nmake`resetvar(p=0x00001e00ac75c590, v="LICENSE=since=2003,author=gsf", append=2048) + 319 at variable.c:580
    frame #10: 0x000000010cc51768 nmake`setvar(s="TA", v="", flags=2048) + 1368 at variable.c:680
    frame #11: 0x000000010cc32798 nmake`assignment + 1032
    frame #12: 0x000000010cc2d58b nmake`parse + 1275
    frame #13: 0x000000010cbefc54 nmake`apply + 660
    frame #14: 0x000000010cc2f7d9 nmake`assertion + 137
    frame #15: 0x000000010cc2d576 nmake`parse + 1254
    frame #16: 0x000000010cc3ab51 nmake`readfp(sp=0x00001e00ac6e9b00, r=0x00001e00ac6f9000, type=18) + 5553 at read.c:407
    frame #17: 0x000000010cc392a4 nmake`readfile(file="/Users/krader/projects/3rd-party/ast/src/cmd/INIT/Makefile", type=18, filter=0x0000000000000000) + 964 at read.c:455
    frame #18: 0x000000010cc0f048 nmake`main(argc=11, argv=0x00007fff53018c90) + 6136 at main.c:662
    frame #19: 0x000000010cdde235 libdyld.dylib`start + 1

krader1961 commented 6 years ago

Replacing strcpy(p->value, v); with memmove(p->value, v, n + 1); in src/cmd/nmake/variable.c fixes the first point of failure. This reveals that in another spot it's calling memccpy() with overlapping buffers, which is also undefined behavior:

    frame #8: 0x00007fffbceac18a libsystem_c.dylib`__memccpy_chk + 69
    frame #9: 0x0000000105637919 nmake`sfputr(f=0x00001e00a8425b00, s="strings.h", rc=0) + 1097 at sfputr.c:109

krader1961 commented 6 years ago

Okay, I fixed the strcpy() and memccpy() bugs from my previous comments. The build now gets much farther and the number of fatal build errors has dropped from 100 to 41 because nmake is no longer triggering assert()'s :smile: Unfortunately, every single binary (other than those like nmake and iffe used to drive the build) fails to link with these errors:

Undefined symbols for architecture x86_64:
  "__ast_catclose", referenced from:
      _match in libast.a(translate.o)
      __ast_translate in libast.a(translate.o)
  "__ast_catgets", referenced from:
      _match in libast.a(translate.o)
      __ast_translate in libast.a(translate.o)
  "__ast_catopen", referenced from:
      _find in libast.a(translate.o)
ld: symbol(s) not found for architecture x86_64

So we're back to the problem that the iffe tests for the nl_* family of symbols on BSD are broken.

saper commented 6 years ago

@krader1961 I am not sure I have a fix for this problem, but in case you keep on troubleshooting the build, I have a small collection of patches to make it build on FreeBSD.

krader1961 commented 6 years ago

@saper, What I don't understand is why your PR is needed or why I am seeing compatibility problems building on macOS. I just booted my FreeBSD 12.0 virtual machine. I did sudo pkg install ksh93 which succeeded. But running ksh93 resulted in this error:

/usr/local/bin/ksh93: Undefined symbol "readdir"

Is it the case that ksh93 on BSD systems has been broken for a long time?

krader1961 commented 6 years ago

My PR #76 to fix the two places that don't handle overlapping buffers on BSD correctly, plus @saper's PR #19, lets me build everything but dlls and pax on macOS Sierra (10.12.6). But it's unbelievably slow to do so. The fastest time I've seen is 23 minutes (on a 12 core Mac Pro with 24 GiB of memory), in no small part because of all the seemingly redundant iffe invocations. For example, iffe: test: is -liconv a library ... yes occurs 93 times in the build log. The Travis build for my PR also took 23 minutes. Surely we can find a way to make building just ksh a wee bit faster :smile: I suspect that 99% of users looking at this project are not interested in any of the other commands bundled with it other than ksh.
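
A rough way to see the repetition in a build log (the make.out path is an assumption based on where it lands for me):

# count how often each identical iffe probe is repeated
grep '^iffe: test:' arch/*/lib/package/gen/make.out | sort | uniq -c | sort -rn | head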

Also, when I do ./bin/package make -S it keeps modifying the files in ./bin/ by prepending lines that make the scripts no longer executable, and making git think the changes to them need to be committed. So I did a "git checkout bin/package" then did sh bin/package clean and it erased the entire project tree including the ast directory! Gee, thanks, I guess, since it doesn't look like it erased anything else I care about. While the AST build system met the needs of this project twenty years ago, today it seems like more trouble than it's worth.

krader1961 commented 6 years ago

I noticed ./bin/package make -S is testing for cosine math functions:

iffe: test: is cos a library function ... yes
iffe: test: is cosl a library function ... yes
iffe: test: is cosf a library function ... yes
iffe: test: is cosh a library function ... yes
iffe: test: is coshl a library function ... yes
iffe: test: is coshf a library function ... yes

None of the commands or library code in the project use those functions. Obviously I picked those at random as they caught my eye while looking at the build log. There are hundreds of such feature tests that are not relevant for any of the code being compiled.

If the ksh command in this repo is only going to get bug fixes and ports to new environments then changing the build tool chain doesn't make sense. But if this is going to be an active, evolving, project then it definitely needs to be refactored and updated to use Cmake.

krader1961 commented 6 years ago

Okay, I see my previous comment about math functions like acosh() not being used despite being probed for is incorrect. Those functions are table driven, so my original search of the code for invocations failed to find them. /me wipes egg off my face.

saper commented 6 years ago

Symbol visibility is an important issue, that is why this simple patch may fix a lot on non-Linux platforms. Regarding 12.0, I would check whether there was an API/ABI change; that may happen, as 12.0 is the unreleased -CURRENT. Maybe some other change broke it.

siteshwar commented 6 years ago

fwiw i would also like to evaluate meson as a possible option for new build system.

krader1961 commented 6 years ago

FWIW, I've been trying to figure out why I can build ksh93 with gcc but not clang on macOS. One reason is the use of -I-. This causes gcc to emit a warning:

make.out.gcc:cc1: note: obsolete option -I- used, please use -iquote instead

Clang treats it as an error:

clang: error: '-I-' not supported, please use -iquote instead

There are other compiler options, such as -mr, that neither gcc nor clang support. And while the -mr flag isn't actually used after the initial probe for flags supported by the build tools, the -I- flag is used (some of the time), which causes part of the build with clang to fail.

This causes the AST build system to use the ppcc wrapper script rather than invoking the cc command directly. That causes problems because that script interprets flags like -fno-strict-aliasing as equivalent to the sequence of short flags -f -n -o .... And ppcc treats -n as meaning not to actually compile the source into an object file.

I've worked around those issues and have managed to build ksh93 with clang on macOS and Ubuntu. What's interesting is that some unit tests that fail when built with gcc pass with clang and vice versa on macOS. On Ubuntu the test results are identical for the two compilers. Furthermore, the Ubuntu failures don't match the failures seen on macOS with either compiler. The fewest errors (154) occurred on Ubuntu. Clearly there are lots of problems with the existing unit tests.

P.S., I've also noticed that the ksh binary this build process produces has commands like cat and chmod implemented as builtins. But that isn't the case for the ksh93u provided by Ubuntu or macOS (or Homebrew on macOS). It's not obvious to me that having those particular commands as builtins is a good thing.

krader1961 commented 6 years ago

fwiw i would also like to evaluate meson as a possible option for new build system.

The fish-shell project discussed using Cmake and Meson here and ultimately chose Cmake. However, a couple of the reasons they rejected Meson don't apply to this project. And I do love that Meson is built on Python. However, as this blog article notes, Meson introduces yet another way to build projects without solving any significant problems with Cmake, and is much less mature.

If we do go to the trouble of replacing the current build system we should not switch to autotools. While it is venerable, widely available, and familiar to lots of people (unlike nmake), it has almost as many quirks as our current build system and would not be much of an improvement.

qbarnes commented 6 years ago

Unless their functionality has changed greatly in the last few years, please avoid using GNU autotools. Whenever you have to step off the beaten path, you plummet to the bottom of a ravine. They have way too many implicit and hidden dependencies. And they are a mess whenever trying to migrate software for new, evolving environments (OSes) or when cross-building software.

siteshwar commented 6 years ago

@jhfrontz mentioned the history of the -I- flag here.

Regarding the choice of build systems, we have more than one person agreeing that we should not be using autotools.

krader1961 commented 6 years ago

For the record I have found the following to be the minimum set of files and directories needed to build ksh93 and run its unit tests. From src/lib:

Makefile   libardir   libcmd     libcoshell libdll     libexpr    libodelta  libsum
Mamfile    libast     libcodex   libcs      libdss     libmam     libpp

From src/cmd:

INIT     Mamfile  cpp      kshlib   msgcc    probe    tests
Makefile builtin  ksh93    mam      nmake    re

Obviously switching to Meson or Cmake would eliminate several of those. Using this bare minimum reduces the ksh93 build time by roughly 25%. It's still obscenely slow because of all the redundant invocations of iffe and the fact that some programs are built twice.

krader1961 commented 6 years ago

Haha! The person who wrote Meson wrote a blog article about the transition to Cmake at Canonical (the company that produces Ubuntu): https://blog.kitware.com/use-of-cmake-at-canonical/. Which makes me inclined to vote for switching to Meson for this project. 😄

krader1961 commented 6 years ago

I just spent a couple of hours reading various Reddit threads, Stackoverflow questions, and blog posts about the merits of Cmake versus Meson. For example, this article from July of this year is a strong thumbs up for Meson over Cmake. However, the article does end on this note:

For now I found only one thing that would have to let me go back to CMake once in a while: meson requires Python 3.4 and newer. This is not the case on a few machines I still have to work on, but time will let these phase out too.

Given that ksh93 is still trying to support ancient K&R compilers I'm wondering how much of a deal breaker the dependency on Python 3 is. Obviously we no longer need to support K&R (pre ANSI C) compilers. But what about old OS's like Solaris which may not have Python 3 available?

krader1961 commented 6 years ago

OMFG! I just noticed this in the bin/package make output:

iffe: test: is universe a command ... no

That is testing for whether ksh is running on an OS such as Sequent Computer System's DYNIX/3 OS that had a BSD 4.2 (known as the "ucb" universe) personality and a UNIX SysVR2 (known as the "att" universe) personality. You switched between them using the universe command. I know this because I worked for Oregon's DEQ agency which bought a Sequent S27 (a bleeding edge SMP architecture at that time) in the 1980's and I then went to work for Sequent. We really do not need to be wasting time and CPU cycles trying to support long dead operating systems.

siteshwar commented 6 years ago

I can also recommend watching Jussi Pakkanen's recent talk from the All Systems Go conference.

dannyweldon commented 6 years ago

If you only want to build ksh93, you can just run:

bin/package make ast-ksh

It still looks like it keeps repeating the same tests over and over though. My guess is that it either doesn't save that state or saves the state in separate directories.

Don't forget that out of all the other packages, you will still need the msg* commands: msggen, msgget, etc. in case someone wants to create localisations (spelling intended :-).

And about builtins for cat, grep, head, cut, etc.: they should be in your Ubuntu and macOS ksh, but not enabled by default. You would have to enable them by running:

builtin cat head cut

Or, this magically enables all the ast builtins:

PATH=/opt/ast/bin:$PATH

I don't remember them being enabled by default before, so we might need to check a build of 2012-08-01 and see whether they were enabled. If they were not, we probably should disable them by default, I think. They will still be available, of course. Unless this is now the intended behaviour.

My understanding of the builtins is that they are way faster (no forking) and 100% portable, as you are guaranteed to have the same version on every platform. This is one of ksh's cool features.
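
In an interactive session the difference looks roughly like this (a sketch based on the behaviour described above, not verified against every release):

whence -v cat            # shows how cat currently resolves (an external command here)
builtin cat head cut     # bind the libcmd versions into the running shell
whence -v cat            # should now report cat as a shell builtin
# alternatively, putting the virtual /opt/ast/bin directory first enables the whole set:
PATH=/opt/ast/bin:$PATH
whence -v cat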

Another thing is we don't know if anyone else out there is depending on the ast toolchain. And of course there is also UWIN. :-)

My interests in ksh lie in expanding the syntax in line with modern programming languages, unlike what zsh has done, but that is for another issue.

jhfrontz commented 6 years ago

Another thing is we don't know if anyone else out there is depending on the ast toolchain.

Right -- I use nmake (Glenn Fowler's, not the microsoft one) daily (as do, I suspect, a lot of folks at AT&T, Nokia/ALU, many AT&T/ALU spinoffs, companies staffed by Bell Labs refugees, etc.).

In reference to the antiquated architectures, beware that there is also a group of folks who try to keep old hardware running (as hobbyists or whatever). Also beware that seemingly "dead code" sometimes acts as history/documentation (e.g., for porting to new architectures/environments).

krader1961 commented 6 years ago

If you only want to build ksh93, you can just run: ....

True. The reason I bothered to identify the minimum set of directories necessary to build ksh93 was because I wanted to know which source dirs could be ignored when trying to set up an alternate build tool like Meson. Also, which source dirs could be ignored if someone wanted to run the code through tools like clang-format and oclint.

You would have to enable them by running: builtin cat head cut

Ah, yes. Thx for pointing that out. Yes, running type cat on the ksh93u that ships with macOS reports that it is a builtin after typing builtin cat but not before, whereas ksh93v does not require enabling the builtin. Having the builtins be automatically enabled seems dangerous to me, so I'm inclined to recommend reverting to the ksh93u behavior that requires explicitly enabling them. It's not obvious from the commit history when this changed. Someone will probably need to do a git bisect to find the commit.
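
Roughly along these lines (a sketch only: the good commit is a placeholder, and the ksh location is assumed from the build layout mentioned earlier):

git bisect start
git bisect bad HEAD
git bisect good <last-known-good-commit>     # placeholder: a revision where cat was not a builtin by default
git bisect run sh -c '
    bin/package make ast-ksh >/dev/null 2>&1 || exit 125      # 125 tells bisect to skip unbuildable revisions
    ksh=$(ls arch/*/src/cmd/ksh93/ksh 2>/dev/null | head -n 1)
    "$ksh" -c "whence -v cat" | grep -q builtin && exit 1     # builtin by default: mark revision bad
    exit 0                                                    # otherwise mark it good
'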

there are also a group of folks who try to keep old hardware running

Understood, but it seems to me those environments can either a) stick with ksh93u and its Nmake based build system, or b) install the dependencies (e.g., python3 and a modern C compiler) if they want to build a bleeding edge ksh93. At some point projects as old as ksh need to drop support for old environments if they're going to remain vital. Case in point is the fish-shell. Earlier this year they switched from the C++98 to the C++11 standard because the benefits of the newer standard were more compelling than continuing to support OS X 10.6 (Snow Leopard).

krader1961 commented 6 years ago

Another thing is we don't know if anyone else out there is depending on the ast toolchain. And of course there is also UWIN. :-)

@dannyweldon, Are you implying we should continue to support building ksh93 as part of the UWIN project? Or was that a tongue in cheek comment? UWIN is certainly interesting from a historical perspective and I applaud the AT&T AST project team for creating it. But I don't see any reason why we should continue to support it any more than we should continue to support the long dead DYNIX/3 OS by Sequent Computer Systems (where I worked for two decades). Anyone using UWIN can simply use ksh93u or an older version. We probably do, however, want to make an effort to support Cygwin even though it has been supplanted by WSL (Windows Subsystem for Linux).

siteshwar commented 6 years ago

I am in favor of removing support for old or non-POSIX operating systems. We should bring the dependencies down to the minimum before moving to a new build system. For example, we can remove support for POSIX compatibility functions related to directories (see https://github.com/siteshwar/ast/commits/disable-libast_dir). This would save us some work.

siteshwar commented 6 years ago

I am playing around with porting some parts of the AST code to Meson; however, it looks like it will be hard to remove iffe as a dependency. It is used for feature testing and generates header files that are included in sources. It will not be easy to get rid of it without significantly changing the code. We can continue to use it with the new build system and remove it later.

krader1961 commented 6 years ago

I've been trying to get ksh to build on FreeBSD 11.1. The fact that AST defines its own headers (e.g., wchar.h) which shadow system headers is a PITA. I'm sorely tempted to bite the bullet and modify the ksh source to just use the system provided headers and libc. Then define fallback implementations as needed to workaround bugs or missing features of a particular OS. That's the approach used by most projects.

siteshwar commented 6 years ago

My initial impression was that the wchar header is there to support non-POSIX systems. But I suspect it's also a hack to allow using the sfio library. The FILE type is modified here. I am not a big fan of such hacks, so I am fine if you drop it.

krader1961 commented 6 years ago

Some people have commented elsewhere that worrying about how long it takes to build ksh93 is silly. After all, who builds it from source? For comparison, building Elvish (written in Go) takes 5 seconds on my server. Building Fish (written in C++ with the PCRE2 and MuParser sources included in the project) takes 2 minutes. Time matters. If I have to wait 12+ minutes for ksh93 to build before I can even run the unit tests (which are also incredibly slow), that has a huge, negative impact on how long it takes to implement a bug fix or enhancement.

jelmd commented 6 years ago

FWIW: takes ~9min on my desktop (i.e. tar xzf INIT.${VERSION}.tgz; bin/package make; mkdir lib/package/tgz; cp ${SRC}/ast-ksh.${VERSION}.tgz lib/package/tgz/; bin/package read; applyPatches; bin/package make SHELL=/bin/bash) - building nodejs takes ~3min but it uses 8 instead of 1 core ... So utilizing more than a single core would probably improve things/get it on par with similar stuff.

Anyway, once built, a touch ./src/lib/libast/comp/getwd.c; bin/package make SHELL=/bin/bash just takes 6s. I would say good enough, not as bad as it sounds in your statements (but maybe I didn't hit the right point, yepp).

BTW: I'm not an autotool fan at all and try to avoid python if possible for several OT reasons (furthermore IIRC, meson does not support Solaris & friends). So IMHO not that much left as a "new build system" ;-)

krader1961 commented 6 years ago

@jelmd, I don't understand why I would need to touch ./src/lib/libast/comp/getwd.c. I also don't understand the bin/package read or applyPatches steps in your comment. Why can't I just clone this project and build it?

The main problem is that bin/package make ast-ksh in a clean directory with no previous build artifacts is incredibly inefficient compared to building any other shell (bash, zsh, fish, elvish) that I have built on my server. If you can avoid the need to do a clean build (see the next paragraph) the situation isn't so bad. But it is still a big problem when building and testing ksh93 in continuous integration environments like TravisCI. The current build process is unacceptably inefficient when compared to the alternatives.

P.S., I did figure out how to avoid the bin/package make hang where it tries to cat /proc/registry/HKEY_LOCAL_MACHINE/Hardware/Description/System/CentralProcessor (issue #89). Simply do something like export NPROC=4 first. Which should be unnecessary, but reflects the fact that the Nmake build tools are really old and no longer maintained and still try to support a native MS Windows (win32) build of the shell.

krader1961 commented 6 years ago

@jelmd, The fact it takes ~9min on your system to build ksh93 (25% faster compared to my five-year-old server) isn't the point. The point is that building ksh93 from a source tree without any prior build artifacts is significantly slower than comparable shells. This has real ramifications for things like continuous integration testing and anyone trying to improve the project on their local server.

saper commented 6 years ago

If the modern shells use multiple cores at the same time and we don't, it is an unfair comparison.

I came here out of interest in iffe as an alternative to autoconf and better meta-make systems. I personally like the BSD .mk system very much, but sometimes it is too simple.

jelmd commented 6 years ago

@krader1961: my desktop is from Dec 2012 (HP z420). Anyway, it is just an example to show that this stuff obviously depends on the environment used. I'm using a CPU @ 3.3 GHz and I guess, wrt. your timings, you use a CPU @ 2.4 +- 0.2 GHz. So getting/using better HW is one option to reduce the pain (I mean, if I wanna make it from Berlin to NY, I usually do not take a ship ;-) ...).

Furthermore the example shows another way for optimization: doing things in parallel. E.g. the machine I used has 6 cores (thus 12 strands, or in Intel terms 12 threads). When ksh93 gets built (I use NPROC=8), the average load is 24.6%. When building nodejs with -j 8, the average load is 853.1%. And because I guess this is the next question: set to use only 1 CPU the average load is 91.6% and it takes ~18 min; in this context, 9 min for ksh93 sounds much better ;-)

The statements enclosed in () incl. the abstract function applyPatch describe the full build process I use, shown so that you are able to compare with whatever you are doing. IMHO if one makes statements about timings, one should also give at least some hints about what/how one does it.

I agree that it would be nice if just cloning and running make would trigger an efficient, out-of-the-box build, yepp. Unfortunately such SW can rarely be found. Even for GNU autotool stuff one needs to make careful preparations so that it does what you want and not what its broken "pseudo-AI" thinks needs to be done (e.g. linking in libs which should never be linked, ignoring -L.../-I... options, ...). So I've learned to accept that SW uses different ways to build and that in general all need their special kind of build preparation (and that's actually another reason why I've chosen nodejs as an example: it uses a metabuild system not yet mentioned here, but it is python based, so don't mind ;-) ). However, blindly changing the build system doesn't guarantee at all that things will change. And just cutting out parts of what the build actually does is IMHO just symptom surgery (it may even cause more trouble) and does not really target the root cause of your pain ...

Last but not least: since you are moaning about the time a re-make takes, I just touched an arbitrary file to trigger a remake and checked the time it takes; thought this was obvious, sorry.

PS: Can't comment on the win stuff - I don't use this platform.

dannyweldon commented 6 years ago

Comparing building ast + ksh to building those other shells is not comparing apples with apples, because building ast is almost like building a whole C library in itself. And Go has modules, totally eliminating any header parsing or C compiler probing. If only att could have devised a way to retrofit modules into C using macros and library indexes somehow. But I see your point, that it is still very slow.

Therefore as an alternative, rather than changing build systems and getting rid of nmake, how about:

Move ksh93 to its own repo (att/ksh), but make it a git submodule inside the ast repo. So if someone wants to build the whole thing including ksh, they can simply run:

git clone --recursive https://github.com/att/ast
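
On the ast side, the wiring might look something like this (a sketch; the att/ksh repo is hypothetical at this point and the submodule path is just an example):

# inside a clone of att/ast, assuming a separate att/ksh repository existed
git submodule add https://github.com/att/ksh src/cmd/ksh93
git commit -m "track ksh93 as a submodule"
# existing clones would then pick the submodule up with:
git submodule update --init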

Then we create some ast OS packages for Linux, FreeBSD, OSX, etc., such as:

ast lib (libraries)
ast devel (headers)
ast nmake (nmake, probe, iffe and their definition files)
ast commands (any other commands, but not ksh)

Then make travis for the ast repo just build the ast library and packages and copy them somewhere to be uploaded as github downloads.

Then, the travis build for ast/ksh repo only has to install the ast library, headers, nmake etc. from the packages and just focus on building ksh. I'm sure that would speed things up the most, as it would skip building all of ast, which I'm sure you will still have to do if you move everything to another build system.

Then if anyone cares, they can work on improving nmake inside att/ast.

Meanwhile, the rest of us can simply install the ast packages on our machines and just focus on building att/ksh, which should build in under a minute from scratch, or in 6 seconds after updating a file.

Unless it is possible to add CMake/Meson alongside the current build system (which I doubt, because I think the conditional includes inside the source may have to change), perhaps you could just switch the ksh repo over to CMake/Meson. It would still rely on the ast library, but would only need to build ksh.

Or, split the repos like I suggest above, but convert both to CMake/meson leaving the nmake and supporting code intact, but not being used by the project itself any more.

WRT moving ksh to its own repo, any ksh related issues in the ast repo can be closed and reopened in the ksh repo with links back to each other.

ATTENTION AST GITHUB FOLLOWERS, IF YOU LIKE THIS OR ANY OTHER COMMENT, PLEASE ADD A REACTION SO THAT YOUR OPINION CAN WEIGH IN ON THE DISCUSSIONS.

If no-one is interested in doing it any of these ways, then I will concede that your splitting off the code is the way forward.

dannyweldon commented 6 years ago

@jelmd I didn't know that setting $NPROC would work on a single machine; I thought it had to work over multiple machines on a shared filesystem because it used coshell. Are you saying that it works out of the box and speeds up the build on a single machine?

jelmd commented 6 years ago

@dannyweldon: Nope, setting NPROC doesn't seem to have an effect on a single machine. As said, the load average is ~25%, which suggests that the build is a plain sequential thing which is waiting ~75% of the time to get something to do (probably blocking on I/O etc.) ... The numbers might be a little bit coarse grained, but are IMHO sufficient to get the picture ...

Wrt. nmake: don't really care, because it is not used for the ksh build at all (but mamake instead) ...

Wrt. sub-repo: -1

I'm in favor of a clean, standalone ksh repo, so that unneeded stuff can be thrown away/doesn't need to be taken over w/o having to worry about other, more or less rarely used stuff, which nobody actually wants to maintain (wrong impression?).

jhfrontz commented 6 years ago

Wrt. nmake: don't really care, because it is not used for the ksh build at all (but mamake instead) ...

That's a distinction that I was going to point out -- there seems to be some confusion (maybe mine?) about how ast software is bootstrapped/built. My recollection is that mamake is used to implement a make-like infrastructure on a system that might not have make (let alone nmake). And once some subset (ksh and nmake?) is built, then the rest of the ast software is built using nmake. See details here (search for "mamake" near the bottom): http://ast-users.research.att.narkive.com/scGZ2DZp/comments-and-information-about-nmake-and-related-ast-tools

I haven't spent a lot of time with clang so I don't know how ridiculous this question is, but -- how hard would it be to figure out how to get mamake to deal with clang's idiosyncrasies (with the idea that once that was done, then AST would be able to bootstrap itself in a clang environment)?

jelmd commented 6 years ago

@jhfrontz: at least not on Solaris (where nmake is n/a), and not on Linux IIRC. If you want to dig deeper: while watching NFL, I recorded the exec calls being made when building the INIT stuff using bin/package make, as well as when ksh actually gets built using bin/package make, and some other stuff as well. I put the data at http://iks.cs.ovgu.de/~elkner/tmp/ksh93/ (NOTE: the cmd lines shown are cut @80 chars; to get out more, it would need more work ...)

krader1961 commented 6 years ago

@jelmd, you wrote

at least not on solaris (where nmake is n/a), and not on Linux IIRC.

Nmake is not available on FreeBSD or macOS either. The whole point of the Nmake-based bin/package command in the AST package is to build nmake before building everything else. Also, building just ksh93 seems to result in two versions being built, insofar as the arch/.../bin directory contains both a ksh and a ksh.old binary, which are not identical. I don't think you actually understand how the Nmake-based build system works. Neither do I, but I don't claim to know how it works. I am simply pointing out that compared to more modern alternatives it is incredibly inefficient.

jelmd commented 6 years ago

Hmmm, I thought I had pointed it out already: there is no need to build nmake just to be able to build ksh, because nmake is not used when building only ksh.

I never bothered to look into or analyze the build process, because I could live with it; however, doing some basic simple stuff while watching TV is OK for me, hence the data at http://iks.cs.ovgu.de/~elkner/tmp/ksh93/ .

IMHO, before one starts modifying the way software gets built, one should analyze/understand how it is done right now and why, and then try to accomplish the same with the new system. Just playing kamikaze is IMHO not a good approach.