elves / elvish

Powerful scripting language & versatile interactive shell
https://elv.sh/
BSD 2-Clause "Simplified" License

Use int64 rather than int for public integers #1731

Closed: krader1961 closed this 2 months ago

krader1961 commented 8 months ago

There was a recent thread opened by someone who was surprised that randint failed with a "cannot parse as integer" exception. It turned out they were using a 32-bit version of Elvish. A 64-bit version of Elvish handled the situation. It seems to me that Elvish should uniformly use int64 in its public APIs (e.g., the randint command) where that is possible. That an int64 may be slower than an int on 32-bit platforms is less important than consistent behavior by Elvish on platforms with different native integer width.

The specific example was this: randint 100000000000 1000000000000000. But there are plenty of other builtins that use int where int64 would be preferable.

krader1961 commented 6 months ago

I've been chipping away at a change to resolve this issue. It is proving to be quite intrusive. Consider something like the implementation of the count command:

func count(fm *Frame, args ...any) (int, error) {
        var n int
        switch nargs := len(args); nargs {
        case 0:
                // Count inputs.
                fm.IterateInputs(func(any) {
                        n++
                })
        case 1:
                // Get length of argument.
                v := args[0]
                if len := vals.Len(v); len >= 0 {
                        n = len
                } else {
                        err := vals.Iterate(v, func(any) bool {
                                n++
                                return true
                        })
                        if err != nil {
                                return 0, fmt.Errorf("cannot get length of a %s", vals.Kind(v))
                        }
                }
        default:
                // The error matches what would be returned if the `Inputs` API was
                // used. See GoFn.Call().
                return 0, errs.ArityMismatch{What: "arguments", ValidLow: 0, ValidHigh: 1, Actual: nargs}
        }
        return n, nil
}

All functions of this sort need to be changed to use an explicit int64 type for the return value.
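The intrusiveness shows up even in miniature: Go never implicitly widens int to int64, so every assignment from an int-typed source (len, vals.Len, loop counters) needs an explicit conversion. A toy sketch of the shape of the change (not actual Elvish code):

```go
package main

import "fmt"

// count64 mirrors the shape of the count builtin above, but returns int64.
// Every site that previously assigned an int result (such as len or
// vals.Len) now needs an explicit int64 conversion.
func count64(items []any) int64 {
	var n int64
	// n = len(items)       // would not compile: int is not assignable to int64
	n = int64(len(items))
	return n
}

func main() {
	fmt.Println(count64([]any{1, 2, 3})) // prints 3
}
```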

Note that unit tests which do something like That("num 1").Puts(1) are easy enough to handle by having the unit test framework perform the conversion, so that .Puts(1) doesn't have to be rewritten as .Puts(int64(1)) now that num 1 produces an int64 rather than an int. But that enhancement to the unit test framework only handles some of the int/int64 type differences exhibited by the count implementation above.
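A hedged sketch of the kind of normalization the test framework could do. The helper name normalizeExpected is hypothetical, not Elvish's actual test API:

```go
package main

import "fmt"

// normalizeExpected is a hypothetical helper: if the num type is backed by
// int64, expected values written as plain int literals in tests (e.g.
// .Puts(1)) are widened to int64 before comparison, so existing tests need
// not be rewritten as .Puts(int64(1)).
func normalizeExpected(v any) any {
	if i, ok := v.(int); ok {
		return int64(i)
	}
	return v
}

func main() {
	got := any(int64(1))         // what `num 1` would now produce
	want := normalizeExpected(1) // the test author still writes 1
	fmt.Println(got == want)     // prints true
}
```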

krader1961 commented 6 months ago

My main concern is not whether this change should be done. The external behavior of Elvish on 32 and 64 bit platforms should be identical. The question is how to minimize the size of the necessary changes, possibly by breaking them up over several commits.

krader1961 commented 6 months ago

The user @0323pin on the DM channels asked:

Even though ... I don't know for how long 32bit systems will remain relevant but, that's for another discussion.

To which I replied:

They're likely to be relevant for a long time, for the same reasons even older 8/16-bit word length systems are still somewhat relevant. Obviously that diminished relevance doesn't require Elvish to support those systems, but Elvish should support 32-bit systems for the foreseeable future. In particular, I think Elvish should exhibit the same behavior on 32- and 64-bit systems to the extent possible. This isn't entirely possible given the Go language specification: the maximum length of a slice/array is constrained by the range of the native int type, so Elvish will have a lower limit on the length of a list on 32-bit platforms, which affects the behavior of the Elvish count command. Nonetheless, it should be possible for Elvish to otherwise treat numbers in the range of 64-bit ints the same on platforms with a 32-bit word length, which affects commands like randint and %.

Note that this works on 64-bit platforms:

elvish> % (math:pow 2 31) 3
▶ (num 2)

But not on 32-bit platforms:

elvish> % (math:pow 2 31) 3
Exception: wrong type for arg #0: must be integer
[tty 14]:1:1: % (math:pow 2 31) 3

I can't see a good reason why the modulo command should behave differently based on whether the platform uses 32 or 64 bit integers.

xiaq commented 2 months ago

Ideally, the use of int shouldn't matter for Elvish. Go functions that take int as an argument can be divided into two classes:

  1. Where INT_MAX is a genuine upper bound on the argument. For this class it makes sense to retain the machine-dependent INT_MAX as an upper bound in the Elvish binding.

    An example of that is str:replace's &max option: because the size of a string can't exceed INT_MAX, specifying a &max option that exceeds INT_MAX makes no sense either.

  2. Where it's an accidental limitation of the Go function's implementation. For this class the ideal solution is to extend the domain to accept big.Int as well.

    The examples of randint and % both fall into this class. Both these functions can be extended to accept big ints fairly straightforwardly since big.Int provides .Rand and .Rem that can be used to implement them respectively.

Now, it's possible that it's much easier to extend some functions in class 2 to accept int64 than it is to extend them to accept big.Int - maybe Go's stdlib provides implementations for int64 but not big.Int. But unless there are a lot of such examples (and there are currently none), I am reluctant to change the internal representation of integers. Defaulting to int is a pretty strong convention in Go, and going against this default can create some headache.

krader1961 commented 2 months ago

I am reluctant to change the internal representation of integers. Defaulting to int is a pretty strong convention in Go and going against this default can create some headache.

Indeed, that is the source of much of the difficulty I've encountered in trying to switch from int to int64 for any num that can be represented by int64 on 32-bit platforms. It appears that special-casing commands that operate on integers, such as randint, to also handle big.Int is going to be simpler than replacing int with int64 as the base integer for the num type. That approach is also more flexible, providing a larger range than int64 for commands where restricting the range to int64 isn't necessary.

I tabled working on my original approach a couple of months ago, in the hope that revisiting the code after putting it on the shelf for a few weeks would yield insights on how to simplify the change. But that seems unlikely after taking another look at my work-in-progress.

Now, it's possible that it's much easier to extend some functions in class 2 to accept int64 than it is to extend them to accept big.Int

My current work-in-progress surfaced only a handful of cases where that is true. It looks like it is easier to modify those cases to explicitly convert big.Int to int64 when the value is in the int64 range (raising an error otherwise) than it is to have those cases take an int64 argument and have the Elvish runtime do the conversion.