att / ast

AST - AT&T Software Technology
Eclipse Public License 1.0

Many math functions should be removed #88

Closed krader1961 closed 6 years ago

krader1961 commented 6 years ago

When I found this project and was pondering whether to contribute I wrote this blog article about the presence of math functions that have no legitimate purpose in a CLI shell.

I admit this issue is mostly driven by a desire to reduce the build time to something more reasonable and consistent with the time required to build other shells. More fundamentally, why should ksh support more than one method for rounding a floating point value to an integer, let alone obscure libc math functions like fma()? According to man fma on my system, it does this:

The fma() functions compute (x*y)+z, rounded as one ternary operation: they compute the value (as if) to infinite precision and round once to the result format, according to the current rounding mode.
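
A minimal sketch of what "round once" means in the man page text above. It does not call libm's fma(); it assumes exact rational arithmetic (Python's fractions.Fraction) as a stand-in for "infinite precision", and the helper name fused_multiply_add is hypothetical.

```python
from fractions import Fraction

def fused_multiply_add(x: float, y: float, z: float) -> float:
    """Emulate fma(): compute x*y+z exactly, then round once to a double.

    Hypothetical helper for illustration; only valid for finite inputs.
    """
    exact = Fraction(x) * Fraction(y) + Fraction(z)
    return float(exact)  # single rounding to the nearest double

x = 1.0 + 2.0**-30
y = 1.0 - 2.0**-30
z = -1.0

naive = x * y + z                    # product is rounded to 1.0 before the add
fused = fused_multiply_add(x, y, z)  # keeps the exact product 1 - 2**-60

print(naive)  # 0.0
print(fused)  # -2**-60, roughly -8.67e-19
```

The two results differ only because the intermediate product is rounded in the naive version; that difference is the entire purpose of fma().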

Supporting those types of functions is especially confusing since they often depend on concepts like NaN which make no sense in the context of a command line shell such as ksh.

I propose we eliminate any math function not relevant to a shell script.

dannyweldon commented 6 years ago

That's because ksh was never meant to be just a dumb CLI shell, but a general purpose programming language, like perl and python (you could also add php and ruby). The focus has been on programmability, hence the ability to define new types (or classes as I would prefer to call them). Programmable completion basically compatible with bash, by contrast, was only recently added, and is so new that it is not even in 2012-08-01.

I read somewhere that David Korn regretted that the take-up of ksh was not as popular as languages like perl (this was a while ago) and python. But I believe that is because it needs the syntax extended to allow more modern features, while still preserving the simple POSIX interface. (An issue on this will be forthcoming hopefully soon.)

I vaguely remember some interest from the math community in the mailing list which could be why there are some obscure math functions, or he added them for completeness because they were easy to do. A math person would be best to comment on why there are multiple methods defined to do the same thing, but they won't likely be watching the github repo.

Python has a lot of builtin math functions so I can't see why ksh cannot have them:

https://docs.python.org/3/library/math.html

siteshwar commented 6 years ago

ksh has been around for a long time and there are people still using decades-old scripts. Some of these scripts use undocumented features. While I am fine with removing such features, I am not sure about the correct process for it. Shall we deprecate them first (and define a process for deprecation too)?

qbarnes commented 6 years ago

Just to put more weight behind what dannyweldon said, I'll comment too. Please do not remove features that you happen to think "have no legitimate purpose in a CLI shell". Ksh is certainly far more than just a CLI shell program.

BTW, I've been using ksh as my primary shell and shell scripting language ever since the release of ksh86a (ksh860603a). And, no, I have never been an AT&T or USL employee either. I just nabbed it when I had the chance and used it on AT&T SysVr3 systems running on both 3b2 boxes and ATT6300 Plus desktops with 80286 processors.

krader1961 commented 6 years ago

@qbarnes, I too am a grey beard. I started programming in 1977 (high school) and 80 column punched cards were still in common use when I went to college. I don't recall the exact version included with DYNIX/3 that I first used but it predated ksh88. Before that I was using sh for scripting and tcsh for interactive use. I was thrilled to be able to switch to ksh for both purposes and used it as my primary shell for more than two decades.

Let me start by stating that I have no objection to retaining basic trigonometric functions like cos() as well as other functions such as those for extracting the integer and fractional parts of a floating point value. But in addition to my previous examples there is no justification for bessel functions being available. These are j0, j1, y0, and y1. If you look at src/cmd/ksh93/data/math.tab and compare that to /usr/include/math.h it appears that someone simply included every math function found in a typical UNIX libm library. Again, this is just plain crazy for a command shell. No one who needs such functionality is going to be using ksh. Furthermore, you can't even assume those functions are available since their availability depends on your system libm library providing them. We would get more value by replacing those obscure libm functions with some basic statistical functions; e.g., to calculate the mean of a list of values.

Note that on my server testing whether all the functions in math.tab are available adds 59 seconds to the build time. That's approximately 10% of the total build time. Ugh!

@dannyweldon, Note that due to the quirks inherited from the Bourne shell, and subsequently codified by POSIX, ksh will never be competitive with other programming languages like Python for general purpose programming. At least not without an alternative personality that replaces a lot of the POSIX quirks (e.g., how var expansion works and the odd rules for tokenizing statements) with saner behavior. The problems with the POSIX standard are why shells like fish and elvish are explicitly not POSIX compliant.

dannyweldon commented 6 years ago

I suppose if those functions are Linux specific and not part of C99 math standard, which Python seems to follow, they can be commented out.

Ah, looks like they are XSI extensions (note the C99 functions are listed above these):

https://en.wikibooks.org/wiki/C_Programming/math.h#XSI_Extensions

However, it seems that dgk would sometimes include undocumented features as a way of silently alpha testing them, so I don't have a problem with them staying. I can understand your pain wrt build time, but just removing the bessel functions won't save much time.

I just tested on my local machine, running nmake in the ksh93 source directory from within the ast environment (bin/package use). The first build takes a little while (46 seconds), especially with all the math iffe tests, but subsequent runs take only about 6 seconds as they skip all those tests.

qbarnes commented 6 years ago

Again, this is just plain crazy for a command shell.

Think of ksh as a scripting language first that also happens to have a command shell interface too.

No one who needs such functionality is going to be using ksh.

So you're telling me you've never come up with a creative way to use a piece of software that its author didn't think of?

Just because you can't imagine a way a feature is useful doesn't mean it isn't. There are countless users out there who can and will surprise you.

qbarnes commented 6 years ago

I suppose if those functions are Linux specific and not part of C99 math standard ...

They are not Linux specific. The bessel functions are part of the SVID, BSD, and X/Open standards.

krader1961 commented 6 years ago

So you're telling me you've never come up with a creative way to use a piece of software that its author didn't think of?

No, I'm not saying that. :smile: I've seen an assembler (i.e., an assembly language "compiler") written as a bunch of Bourne shell scripts. I've seen an MS-DOS emulator written in JavaScript. That does not mean those are the appropriate tools for those jobs. And ksh is not the right tool to do things like compute bessel and gamma functions. Similarly, consider the definition of the erfc() function (from GNU):

erfc returns 1.0 - erf(x), but computed in a fashion that avoids round-off error when x is large.

If you're doing calculations where rounding errors are a concern such that you need to use erfc() you're not going to be relying on the opaque behavior of ksh where you don't have control over such matters. You'll also want access to things like the fesetenv() function.
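
A small sketch of the round-off problem that the erfc() definition quoted above refers to. It uses Python's math module rather than ksh, and the choice of x = 10 is only an illustrative assumption for "x is large".

```python
import math

x = 10.0
naive = 1.0 - math.erf(x)  # erf(10) rounds to exactly 1.0, so this collapses to 0.0
direct = math.erfc(x)      # tiny but nonzero, on the order of 1e-45

print(naive)
print(direct)
```

The subtraction loses every significant digit, while the dedicated function keeps the result representable.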

If someone at work tried to commit a ksh script that used these functions I would be concerned about their engineering judgement.

Having said all that I wouldn't particularly care if their inclusion didn't slow down the build by 10%. One way to speed this up is to note that the build process can safely assume that sin() is available if cos() is available. Similarly, if j0() is available then all the other bessel functions are available. It doesn't need to test each one individually. Etcetera.
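
The idea of probing one representative per family could be sketched as follows. This is not how the ksh build (iffe) works; the family grouping is hypothetical, and hasattr() on Python's math module is only an assumed stand-in for an expensive compile-and-link test.

```python
import math

# Hypothetical grouping: the key is the one function actually probed.
FAMILIES = {
    "cos": ["cos", "sin", "tan"],
    "j0": ["j0", "j1", "y0", "y1"],
    "erf": ["erf", "erfc"],
}

def probe(name: str) -> bool:
    """Stand-in for an expensive availability test of a single function."""
    probe.calls += 1
    return hasattr(math, name)

probe.calls = 0  # counts how many probes were actually run

def available_functions(families: dict) -> list:
    """Probe each representative once and accept or reject its whole family."""
    found = []
    for representative, members in families.items():
        if probe(representative):
            found.extend(members)
    return found

funcs = available_functions(FAMILIES)
total = sum(len(members) for members in FAMILIES.values())
print(f"{probe.calls} probes covered {total} functions")
```

The trade-off is that a platform providing only part of a family would be misjudged, which is the assumption the comment above is willing to make.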

krader1961 commented 6 years ago

Also, note that the percentage of build time devoted to figuring out which functions in src/cmd/ksh93/data/math.tab are usable is closer to 15% when building with the minimum set of source dirs from this comment about changing the build tool chain.

People who only use, rather than build, ksh won't care about this issue. But for the people interested in trying to keep ksh alive this matters a lot. I care that any time I make a change it takes tens of minutes to validate my change doesn't break anything.

dannyweldon commented 6 years ago

Actually they are documented in the 2012-08-01 man page:

   Any of the following math library functions that are in the C math library can be used within an arithmetic expression:

   abs acos acosh asin asinh atan atan2 atanh cbrt ceil copysign cos cosh erf erfc exp exp2 expm1 fabs fpclassify fdim finite floor fma fmax fmin fmod hypot
   ilogb int isfinite sinf isnan isnormal issubnormal issubordered iszero j0 j1 jn lgamma log log10 log2 logb nearbyint nextafter nexttoward  pow  remainder
   rint  round  scanb signbit sin sinh sqrt tan tanh tgamma trunc y0 y1 yn

   In addition, arithmetic functions can be define as shell functions with a variant
   of the function name syntax,

   function .sh.math.name ident ... { list ;}
          where name is the function name used in the arithmetic expression and each identifier, ident is a name reference  to  the  long  double  precision
          floating point argument.  The value of .sh.value when the function returns is the value of this function.  User defined functions can take up to 3
          arguments and override C math library functions.

jelmd commented 6 years ago

Math functions were one reason I decided to use ksh93 as "my" scripting shell. They are useful and make scripts more efficient by dropping the need to fork external binaries (which might not even be available on certain platforms) or to do the usual bloat of finding out which options the external binaries support to get the job done.

I actually do not care about the additional compile time that much, because I do not compile ksh93 each time I start a script - actually I rely on what the platform vendors provide ;-). However, optimizing the build process is still an option. Optimizing by killing useful features is IMHO a NO GO.

krader1961 commented 6 years ago

@jelmd, What math functions are you using? I definitely agree that a lot of the existing functions should be retained. It shouldn't be necessary to exec an external command like bc or calc just to calculate a logarithm. So functions like abs(), trunc(), and log() should definitely be available. What I don't understand is why functions like erfc(), the bessel functions (j0(), etc.) and gamma (tgamma()) are included.

qbarnes commented 6 years ago

@krader1961 Why do you keep trying to second guess what people may or may not want by polling less than 0.0001% of ksh users here on git? Obviously, people here have already spoken out loudly that the math functions already in ksh are wanted and useful. You don't go around removing already documented and supported features. Leave it alone and move on to other work.

krader1961 commented 6 years ago

people here have already spoken out loudly that the math functions already in ksh are wanted and useful.

No, they haven't, @qbarnes. Which is to say absolutely no one has said anything other than "someone, somewhere, might want to use those functions." But you're right. I'm obviously not going to change the minds of people who have made that assertion so I'm going to stop tilting against this windmill. I'd still love for someone to provide an example where any of the obscure functions I've asked about are actually being used in a Korn shell script written before I opened this issue. I'm betting that no matter how long I wait and how many places I ask that question the response will be crickets.

It appears that everyone who has provided an opinion on the matter simply assumes those functions will be available. And every OS we likely care about today provides them. So the simplest solution is to turn the generated src/cmd/ksh93/FEATURE/math file into a C module in the main source tree and bypass all the silly, unbelievably inefficient probing that goes into generating that module, thereby making a noticeable, not merely measurable, improvement to the build time.

krader1961 commented 6 years ago

Here I am tilting at this windmill, again, because I am truly curious why my proposal has generated such rancor.

Optimizing by killing useful features is IMHO a NO GO.

@jelmd, What useful feature do you think I am proposing be removed? Are you actually aware of anyone using, for example, the j0() math function in a Korn shell script?

There were some developers in the Fish shell project who argued that the shell should only support basic arithmetic operations when I decided to make its math command a builtin (rather than a wrapper around the bc external command). I disagreed with them. I agree that it should be possible to do things like cheaply compute log2() or log10() in a ksh script.

What I don't understand is why every function in a typical UNIX libm library is exposed given that roughly one-third to one-half of them make zero sense in a language of this nature. It appears those libm functions were exposed just because it was trivial to do so. Not because those functions are needed or would ever be used.

@dannyweldon provided a link to the Python math module in an earlier comment. Python is my favorite language. However, note that even it does not expose all the libm functions (e.g., isgreaterequal()). It also exposes a few useful functions, like math.modf(x), that ksh does not. And it provides several useful functions that are not based on libm, such as math.gcd(a, b). So when I hear the argument in this issue that ksh should be a general purpose language like Python I am perplexed.
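
For reference, a short sketch of the Python functions named above. The input values are arbitrary examples chosen for illustration.

```python
import math

frac, whole = math.modf(3.75)  # split into fractional and integer parts
print(frac, whole)                      # 0.75 3.0
print(math.gcd(12, 18))                 # 6
print(hasattr(math, "isgreaterequal"))  # False: this libm macro is not exposed
```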

kdudka commented 6 years ago

It does not really matter whether it was a good idea to introduce those functions originally. But they have been available for a long time and there may exist scripts that rely on them. The fact that we have no statistics about their usage is not a sufficient reason to immediately remove them.

If we really agreed that those functions should go away (and so far there seems to be no agreement on that), we should go through a deprecation process. We should announce that those functions will be removed in X.Y version of ksh, mention the fact in the documentation, then start issuing deprecation warnings on invocation of those functions, remove them from the documentation and eventually remove their implementation as the very last step.
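
The "deprecation warnings on invocation" step could look roughly like the sketch below. It is written in Python, not as ksh internals; the DEPRECATED table, the call_math name, and the choice of erfc are all hypothetical, and "X.Y" is the placeholder version from the comment above.

```python
import math
import warnings

# Hypothetical table; the thread never agreed on which functions, if any, would go.
DEPRECATED = {"erfc": "X.Y"}

def call_math(name: str, *args: float) -> float:
    """Dispatch to a math function, warning first if it is marked deprecated."""
    if name in DEPRECATED:
        warnings.warn(
            f"{name}() is deprecated and will be removed in version {DEPRECATED[name]}",
            DeprecationWarning,
            stacklevel=2,
        )
    return getattr(math, name)(*args)

print(call_math("log2", 8.0))  # 3.0, no warning
```

Calling call_math("erfc", 1.0) still returns a result but emits a DeprecationWarning, so existing scripts keep working during the announcement period.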

jelmd commented 6 years ago

@krader1961: I think it has been cited already:

Any of the following math library functions that are in the C math library can be used within an arithmetic expression ...

I'm pretty sure I've used it somewhere in my ~20 year ksh93 scripting history. But that doesn't matter at all. What matters is that you can't break interfaces just because it saves you some seconds of compile time.

And as someone already said: ksh93 is a scripting language like e.g. javascript, whitespaces, etc. - there is no reason for cutting down its capabilities, just because other limited stuff doesn't have it. However, I've no problem with extending it, e.g. adding modf etc., when usually available in libm and requested by ksh users.

I guess that your POV is limited to seeing ksh93 as just an interpreter of ancient Bourne shell scripts. If that were really the target of ksh93, IMHO it would not make sense to waste any time on it - there are dozens of other shells that can do this.

kshji commented 6 years ago

My opinion: Don't remove or change any syntax. You can add commands or options or ..., but not remove or change syntax. Ksh93 is not only a CLI for me; it's more of a RAD tool.

I have done almost everything using ksh93 over the last 20 years: tcp/ip servers+clients, db processing, http-server process, msg-servers, event server, ... If there is something which I can't do with ksh93, then I use some other tool. Ksh93 is my RAD. Why? I don't need to change my scripts even when I install an updated ksh93. For me ksh93 includes all (99%) of what I need in my programs: socket, named pipe, events, ... You never know where ksh93 is already in use. Why kill those scripts? ksh93 includes full support for older shells: ksh88 and Bourne shell. Even where ksh93 offers a better method than some older builtin method, ksh93 has not removed support for the historical one. Thanks.

Only some of my problems need other tools like C, php, perl, .... Ksh93+awk+html5 is my env. I have used and seen too many dev systems which change the syntax. Why? Speed of building can't be the main reason/problem.

krader1961 commented 6 years ago

@kshji, I know this issue has gotten rather lengthy but I have no idea how you got the idea that any change to syntax was being proposed. No syntax is being changed. So don't lose any sleep :smile:

The proposal was to stop testing if the OS provides a few really obscure math functions (like erfc()) that I'm willing to bet money no one has ever used (other than possibly to see what happens if you call it). But rather than do that I've decided to take a different approach that avoids the hideous expense of testing the availability of each function one at a time. In fact, I'm just about to create a PR with the improvement to the build process that shaves a full minute off the build time.

krader1961 commented 6 years ago

Closing since I've merged a change that should remove the 10-15% of the time spent probing for each math function individually.