Proposal: Go 2: add chained interval comparisons

jfcg commented 5 years ago

Python comparisons can be chained like:

if x < y <= z:
    ...

This means:

x is evaluated
y is evaluated
if x < y then z is evaluated, else the body of the if statement is skipped
if y <= z then flow proceeds into the body of the if statement

This is intuitive and easier to read. This proposal covers monotone relations only (<,<=,>,>= in one direction) like:

x < y <= z
a > b > c

It does not cover any non-monotone relations like:

p >= q < r
q < w != e
a != s == d

In order to determine places where such chained interval comparisons could be used in Go, we can use an (improved) bash script like:

# Finds 3 component monotone comparisons
name='( [a-zA-Z_][a-zA-Z_0-9.*/+%()-]* )'
nmrf='(?:\1|\2)'
less='<=?'
more='>=?'
keyw='(?:if|for|case).*'
logi='[^|&]*(?:&&|\|\|)[^|&]*'

patl=(
"$keyw(?:$less$name|$name$more)$logi(?:$nmrf$less|$more$nmrf)"

"$keyw(?:$more$name|$name$less)$logi(?:$nmrf$more|$less$nmrf)"
)
s=0
for pat in "${patl[@]}"; do
    r=$(grep -Pr --include '*.go' "$pat" . | grep -c .)
    let s+=r
    grep -Prm 1 --include '*.go' "$pat" . | head -n 3
done
echo "$s+ cases"

On some popular projects developed in Go, we get the following examples & totals:

examples from dgraph:

./chunker/json_parser.go:       if buf.batchSize > 0 && len(buf.nquads) >= buf.batchSize {
./chunker/rdf_state.go: case r >= 'a' && r <= 'z':
./gql/parser.go:        if depth > 4 || depth < 0 {
./compose/compose.go:   if opts.NumZeros < 1 || opts.NumZeros > 99 {
./dgraph/cmd/zero/tablet.go:            if tab.Space <= sizeDiff/2 && tab.Space > size {
./lex/lexer.go: if p.idx < 0 || p.idx >= len(p.l.items) {

448+ cases

examples from etcd:

./client/client_test.go:    if ratio := float64(pinNum) / float64(round); ratio > max || ratio < min {
./clientv3/client.go:       if cfg.MaxCallRecvMsgSize > 0 && cfg.MaxCallSendMsgSize > cfg.MaxCallRecvMsgSize {
./functional/runner/global.go:          for rc.progress < rounds || rounds <= 0 {
./auth/store.go:    if bcryptCost < bcrypt.MinCost || bcryptCost > bcrypt.MaxCost {
./integration/v3_lease_test.go: if ttlresp.TTL < expectedTTL-1 || ttlresp.TTL > expectedTTL {
./mvcc/kvstore_txn.go:  if limit <= 0 || limit > len(revpairs) {

276+ cases

examples from frp:

./vendor/github.com/fatedier/beego/logs/logger.go:  case code >= 200 && code < 300:
./vendor/github.com/fatedier/kcp-go/kcp.go: if n >= int(kcp.mtu-IKCP_OVERHEAD) || n < 0 {
./vendor/github.com/gorilla/websocket/conn.go:      if c.readLimit > 0 && c.readLength > c.readLimit {
./vendor/github.com/golang/snappy/decode_other.go:      if offset <= 0 || d < offset || length > len(dst)-d {
./vendor/github.com/gorilla/websocket/x_net_proxy.go:   if port < 1 || port > 0xffff {
./vendor/github.com/hashicorp/yamux/session.go:     if mt < typeData || mt > typeGoAway {

219+ cases

examples from gitea:

./models/git_diff.go:           if begin <= line && end >= line {
./modules/auth/auth.go: if token.ExpiresAt < time.Now().Unix() || token.IssuedAt > time.Now().Unix() {
./modules/git/signature.go:     if firstChar >= 48 && firstChar <= 57 {
./models/action.go:     if slashIndex < 0 || slashIndex >= poundIndex {
./models/repo_collaboration.go: if mode <= AccessModeNone || mode > AccessModeOwner {
./modules/indexer/repo.go:          if startIndex < 0 || locationStart < startIndex {

590+ cases

examples from go:

./misc/cgo/testshared/shared_test.go:           if prog.Off <= offset && offset < prog.Off+prog.Filesz {
./src/archive/tar/format.go:        if 148 <= i && i < 156 {
./src/bufio/bufio.go:       if n > 0 && n < b.n {
./misc/cgo/gmp/gmp.go:  if base < 2 || base > 36 {
./src/archive/tar/strconv.go:   if perr != nil || n < 5 || int64(len(s)) < n {
./src/archive/zip/reader.go:    if o := int64(d.directoryOffset); o < 0 || o >= size {

1369+ cases

examples from influxdb:

./bolt/bucket.go:       if limit > 0 && len(bs) >= limit {
./bolt/dashboard.go:        if limit > 0 && len(ds) >= limit {
./chronograf/oauth2/cookies.go: if lifespan > 0 && inactivity > lifespan {
./chronograf/influx/queries/select.go:      if got := len(v.Args); got < 1 || got > 2 {
./chronograf/oauth2/mux_test.go:    if resp.StatusCode < 300 || resp.StatusCode >= 400 {
./cmd/influx/task.go:   if taskFindFlags.limit < 1 || taskFindFlags.limit > platform.TaskMaxPageSize {

53+ cases

examples from kubernetes:

./cmd/kubeadm/app/preflight/checks.go:          if r != nil && r.StatusCode >= 500 && r.StatusCode <= 599 {
./cmd/kubeadm/app/util/endpoint.go: if err == nil && (1 <= portInt && portInt <= 65535) {
./cmd/kubeadm/app/util/system/package_validator.go:         case c >= '0' && c <= '9':
./cmd/kube-apiserver/app/options/validation.go: if options.KubernetesServiceNodePort < 0 || options.KubernetesServiceNodePort > 65535 {
./cmd/kube-scheduler/app/options/insecure_serving.go:   if o.BindPort < 0 || o.BindPort > 65535 {
./pkg/apis/core/validation/validation.go:   if iscsi.Lun < 0 || iscsi.Lun > 255 {

1524+ cases

examples from moby:

./client/request.go:    if serverResp.statusCode >= 200 && serverResp.statusCode < 400 {
./daemon/daemon_unix.go:    if resources.Memory > 0 && resources.MemorySwap > 0 && resources.MemorySwap < resources.Memory {
./daemon/events/events.go:      if untilNanoUnix > 0 && ev.TimeNano > untilNanoUnix {
./builder/dockerfile/evaluator.go:  if i < 0 || i > len(r.flat) {
./builder/remotecontext/remote.go:  if plen <= 0 || plen > maxPreambleLength {
./daemon/cluster/listen_addr.go:    if portNum < 1024 || portNum > 49151 {

461+ cases

examples from nomad:

./client/fs_endpoint.go:    if limit > 0 && limit < streamFrameSize {
./command/agent/alloc_endpoint.go:  if len(tokens) > 2 || len(tokens) < 1 {
./command/agent/retry_join.go:      if serverJoin.RetryMaxAttempts > 0 && attempt > serverJoin.RetryMaxAttempts {
./api/internal/testutil/freeport/freeport.go:       if port < firstPort+1 || port >= firstPort+blockSize {
./client/stats/cpu_test.go: if percent < expectedPercent && percent > (expectedPercent+1.00) {
./command/agent/config.go:  if 0 > port || port > 65535 {

769+ cases

examples from prometheus:

./documentation/examples/remote_storage/remote_storage_adapter/opentsdb/tagvalue.go:            case b >= '0' && b <= '9':
./pkg/textparse/openmetricslex.l.go:    case c == ':' || c >= 'A' && c <= 'Z' || c == '_' || c >= 'a' && c <= 'z':
./pkg/textparse/promlex.l.go:   case c == ':' || c >= 'A' && c <= 'Z' || c == '_' || c >= 'a' && c <= 'z':
./cmd/prometheus/main.go:                   if cfg.tsdb.WALSegmentSize < 10*1024*1024 || cfg.tsdb.WALSegmentSize > 256*1024*1024 {
./pkg/textparse/openmetricslex.l.go:    case c >= '0' && c <= ':' || c >= 'A' && c <= 'Z' || c == '_' || c >= 'a' && c <= 'z':
./pkg/textparse/promlex.l.go:   case c >= '0' && c <= ':' || c >= 'A' && c <= 'Z' || c == '_' || c >= 'a' && c <= 'z':

550+ cases

examples from terraform:

./config/interpolate_funcs.go:              if i >= from && i < to {
./configs/variabletypehint_string.go:   case 76 <= i && i <= 77:
./helper/resource/state.go:         if conf.PollInterval > 0 && conf.PollInterval < 180*time.Second {
./config/resource_mode_string.go:   if i < 0 || i >= ResourceMode(len(_ResourceMode_index)-1) {
./configs/configschema/internal_validate.go:            case blockS.MinItems < 0 || blockS.MinItems > 1:
./configs/configschema/nestingmode_string.go:   if i < 0 || i >= NestingMode(len(_NestingMode_index)-1) {

653+ cases

As seen from 6912+ cases above, many thousands of if / case / for clauses doing interval checks can be made simpler and easier to read. Also, it has advantages for IEEE floats, see below. What do you think?

robpike commented 5 years ago

If people would just learn to write them like this:

if 'a' <= c && c <= 'z' ...

or, for exclusion:

if c < 'a' || 'z' < c

the benefits of this suggestion would be very small.

ianlancetaylor commented 5 years ago

I believe the rule here would be that a cmpop1 b cmpop2 c is syntactic sugar for tmp := b; a cmpop1 tmp && tmp cmpop2 c . Here the cmpop operators may be any of <, <=, >=, >.
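A minimal sketch of that desugaring in today's Go. The names chainLtLe, mid and calls are hypothetical and exist only to show that the middle operand is evaluated once:

```go
package main

import "fmt"

// calls counts how often mid runs; it exists only for this illustration.
var calls int

// mid is a hypothetical middle operand with a visible side effect.
func mid() int {
	calls++
	return 5
}

// chainLtLe spells out what a < b() <= c would mean under the rule above:
// b is evaluated once into a temporary, then both comparisons use it.
// Here c is a plain value, so this sketch does not show when c is evaluated.
func chainLtLe(a int, b func() int, c int) bool {
	tmp := b()
	return a < tmp && tmp <= c
}

func main() {
	ok := chainLtLe(1, mid, 5)
	fmt.Println(ok, calls) // true 1

	ok = chainLtLe(7, mid, 9)
	fmt.Println(ok, calls) // false 2
}
```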

deanveloper commented 5 years ago

@robpike While that looks enticing, keeping the variable on the left side is often the most common thing to do, and switching between having it on the left and right, especially in a single expression, makes me second/third guess myself on my "greater than"s and my "less than"s.

I am still unsure if I like this proposal. While I've definitely thought "man, a language where you could chain comparisons would be cool" (I did not know Python did this), I don't think it quite fits into Go.

I can say that when chained even further (not sure this is really a use case), it definitely is easier to understand:

x := foo() < bar() > baz() < buzz()

that may be hard to understand, but this is harder in my eyes:

var x bool
barVal := bar()
if foo() < barVal {
    bazVal := baz()
    if barVal > bazVal {
        if bazVal < buzz() {
            x = true
        }
    }
}

Granted, complexity should grow vertically, not horizontally. I also could have possibly made it a bit clearer if I had done something like if foo() >= barVal { break block } which would get rid of the "indentation hell" problem.
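One way to flatten the nesting in current Go is a small function with early returns. Note that a labeled break in Go can only target an enclosing for, switch or select statement, so a bare labeled block cannot be exited that way. The sketch below assumes int results, and the names zigzag and konst are made up for the illustration:

```go
package main

import "fmt"

// konst returns a function that always yields v. It stands in for the
// foo, bar, baz and buzz calls above and is purely illustrative.
func konst(v int) func() int {
	return func() int { return v }
}

// zigzag computes what foo() < bar() > baz() < buzz() would mean as a
// chain: operands are evaluated left to right, each at most once, and
// evaluation stops at the first comparison that fails.
func zigzag(foo, bar, baz, buzz func() int) bool {
	fooVal, barVal := foo(), bar()
	if !(fooVal < barVal) {
		return false
	}
	bazVal := baz()
	if !(barVal > bazVal) {
		return false
	}
	return bazVal < buzz()
}

func main() {
	fmt.Println(zigzag(konst(1), konst(5), konst(2), konst(3))) // true
	fmt.Println(zigzag(konst(1), konst(5), konst(6), konst(7))) // false
}
```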

Either way, it seems like a nifty feature, but it's not a need-to-have.

jfcg commented 5 years ago

I think it is best to allow chaining for monotone relations. x < y >= z is not what it looks like. It could be misread as an interval check. Interval checks would become really comfortable to read with chaining. Anything beyond that does not fit Go well, I think.

jfcg commented 5 years ago

There is another advantage related to IEEE floats. Think of a typical function checking its parameters:

func Myfun1(x float64) {
    if x < L1 || x >= L2 { // check for [L1, L2)
        // report bad param
        return
    }
    ...
}

This kind of early return is good practice & very common. Now check this out:

package main

import (
    "fmt"
    "math"
)

var L1, L2 = 1.2, 4.5

var cand = []float64{
    math.Inf(-1), L1 - .7, L1, (L1 + L2) / 2,
    L2, L2 + 1, math.Inf(1), math.NaN()}

func main() {
    // testing [L1, L2)
    fmt.Println("Testing for [", L1, ",", L2, ")\n")

    for _, v := range cand {
        if v < L1 || v >= L2 {
            fmt.Println(v, "\tout")
        } else {
            fmt.Println(v, "\tin")
        }
    }
}

This is the output:

Testing for [ 1.2 , 4.5 )

-Inf    out
0.5     out
1.2     in
2.85    in
4.5     out
5.5     out
+Inf    out
NaN     in

NaN, the notorious design flaw in IEEE 754, passes the test O_o What we should have written is:

        if !(L1 <= v && v < L2) {

which fixes the test:

Testing for [ 1.2 , 4.5 )

-Inf    out
0.5     out
1.2     in
2.85    in
4.5     out
5.5     out
+Inf    out
NaN     out

Logically those are the same tests, but NaN breaks mathematical logic. This is why (among other reasons) it is a notorious design flaw! With chained interval checks, we comfortably write:

        if ! L1 <= v < L2 {

and everything works ;)
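In Go as it exists today, the same NaN-safe check can be kept in one place with a small helper. This is only a sketch, and inHalfOpen is a hypothetical name, not something from the standard library:

```go
package main

import (
	"fmt"
	"math"
)

// inHalfOpen reports whether lo <= v < hi. Every ordered comparison with
// NaN is false, so a NaN in any position makes the result false.
func inHalfOpen(lo, v, hi float64) bool {
	return lo <= v && v < hi
}

func main() {
	for _, v := range []float64{0.5, 1.2, 4.5, math.NaN()} {
		// Negating the positive test rejects NaN, unlike v < lo || v >= hi.
		if !inHalfOpen(1.2, v, 4.5) {
			fmt.Println(v, "\tout")
		} else {
			fmt.Println(v, "\tin")
		}
	}
}
```

With this shape the early-return test becomes if !inHalfOpen(L1, x, L2) { ... }, which rejects NaN along with every other out-of-range value.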

ianlancetaylor commented 5 years ago

I'm somewhat concerned about the short-circuiting behavior. In an expression like f1() < x < f2() we will see that f1 is always called but f2 is only sometimes called. I think that is potentially confusing.

jfcg commented 5 years ago

It is similar to g1() && g2(). Any programmer learns that g1() will and g2() may be called. If the programmer definitely needs to call f2, she can always write f2() > x > f1(). So I don't think that's any different.

ianlancetaylor commented 5 years ago

It's different because && and || always short-circuit. < and friends only short-circuit in a specific case.

jfcg commented 5 years ago

Sorry, mobile phone. Ian, can you explain what you mean with an example?

ianlancetaylor commented 5 years ago

When I write the expression f1() && f2() I know that f2 will only be called if f1() returns true. This is true no matter where the expression occurs.

When I write the expression f1() || f2() I know that f2 will only be called if f1() returns false. This is true no matter where the expression occurs.

When I write the expression f1() < f2() I know that both f1 and f2 will always be called. Unless I happen to write v < f1() < f2(), in which case that is no longer true. In v < f1() < f2() f1 is always called but f2 is only called if v < f1() is true.

My point is simply that && and || have consistent behavior with regard to short-circuiting, regardless of what is around them. In this proposal as I understand it, that is not true for < and friends. They short-circuit depending on the context in which the expression appears.
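The difference can be made visible with a call trace. This is a sketch of the proposed semantics written out in today's Go; trace, plain and chained are hypothetical names used only here:

```go
package main

import "fmt"

// trace records the order in which the operand functions run.
var trace []string

func f1() int { trace = append(trace, "f1"); return 1 }
func f2() int { trace = append(trace, "f2"); return 2 }

// plain is today's binary comparison: both operands are always evaluated.
func plain() bool { return f1() < f2() }

// chained spells out v < f1() < f2() under the proposed rule:
// f1 always runs, f2 runs only when v < f1() holds.
func chained(v int) bool {
	t := f1()
	return v < t && t < f2()
}

func main() {
	for _, v := range []int{0, 5} {
		trace = nil
		ok := chained(v)
		fmt.Println(v, ok, trace)
	}
	// Prints:
	// 0 true [f1 f2]
	// 5 false [f1]
}
```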

jfcg commented 5 years ago

With f1() && f2() f1 will and f2 may be called. With g1() || f1() && f2() g1 will and f1, f2 may be called. If you put something in front of an expression, it changes execution possibilities. I don't see this as an inconsistency but a feature of the language. So I disagree that people would be confused with "f2 may be called" reality.

Also x < f1() < f2() is just a convenient rewrite of x < f1() && f1() < f2() except you call f1 once. If you want to call f1 possibly twice, you use the latter. It is still the && operator that does short-circuit.

jfcg commented 5 years ago

There is also the issue of number of components allowed for chaining. These are the possibilities:

A) Just 3 components like x < y <= z

B) Up to 4 components like x < y <= z < q. We can also use a similar bash script to identify these:

# Finds 4 component monotone comparisons
name='( [a-zA-Z_][a-zA-Z_0-9.*/+%()-]* )'
nmrf='(?:\1|\2)'
n2rf='(?:\3|\4)'
less='<=?'
more='>=?'
keyw='(?:if|for|case).*'
logi='[^|&]*(?:&&|\|\|)[^|&]*'

patl=(
"$keyw(?:$less$name|$name$more)$logi(?:$nmrf$less$name|$name$more$nmrf)$logi(?:$n2rf$less|$more$n2rf)"

"$keyw(?:$more$name|$name$less)$logi(?:$nmrf$more$name|$name$less$nmrf)$logi(?:$n2rf$more|$less$n2rf)"
)
s=0
for pat in "${patl[@]}"; do
    r=$(grep -Pr --include '*.go' "$pat" . | grep -c .)
    let s+=r
    grep -Prm 1 --include '*.go' "$pat" . | head -n 3
done
echo "$s+ cases"

These are much rarer as expected:

examples from go:

./src/unicode/utf16/utf16.go:   if surr1 <= r1 && r1 < surr2 && surr2 <= r2 && r2 < surr3 {
./src/reflect/value.go:     if i < 0 || j < i || j > s.Len {
./test/slice3.go:                   if iv > jv || jv > kv || kv > Cap || iv < 0 || jv < 0 || kv < 0 {

5+ cases

examples from kubernetes:

./vendor/gonum.org/v1/gonum/mat/vector.go:  if i < 0 || k <= i || v.Cap() < k {

1+ cases

Only 6+ cases in the eleven Go code bases that I've checked. Possibly a couple dozen in all Go code publicly available.

C) 5+ components. I saw one in Go, first example above.

If this proposal were accepted, my vote is for B because:

x < y <= z < q <= r can be written as x < y <= z && z < q <= r
x < y <= z < q <= r < t can be written as x < y <= z && z < q <= r < t etc.

5+ component comparisons can still benefit a lot from chaining. I think this is the right balance. So what do you think?

ianlancetaylor commented 5 years ago

It's also worth noting that a < b < c is very different from a < b != c. The latter is valid Go today, comparing two boolean values.
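A short sketch of how current Go reads the second form. The function name today is made up for the example:

```go
package main

import "fmt"

// today evaluates a < b != c exactly as current Go parses it: the
// comparison operators share one precedence level and associate left to
// right, so this is (a < b) != c, a comparison of two boolean values.
func today(a, b int, c bool) bool {
	return a < b != c
}

func main() {
	fmt.Println(today(1, 2, false)) // true:  (1 < 2) != false
	fmt.Println(today(1, 2, true))  // false: (1 < 2) != true
}
```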

jfcg commented 5 years ago

In the first case c must be of an ordered type, in the second c must be bool. So say if you mistyped a <= as !=, it won't compile. If c is a bool, the second still stands valid with monotone chaining, right?

Do you mean the compiler gets some extra difficulty distinguishing these cases?

griesemer commented 5 years ago

a < b < c means a < b && b < c but a < b != c doesn't mean a < b && b != c which is confusing at least - the meaning of the latter we cannot change for backward-compatibility.

Also, currently, the comparison operators simply follow the rules for other binary operators, so we'd have to introduce an irregularity there.

It's easy to apply De Morgan's laws with the current definition. If we allow a < b < c, the negation would be a >= b || b >= c which is not the same at all as a >= b >= c - another source of confusion. Applying De Morgan is a common operation when restructuring conditional code. Doing it in one's head is also a common operation when thinking about invariants of loops, etc.
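To make the De Morgan point concrete, here is a sketch for int operands with the chained forms written out in today's Go. The helper names chain, negated and reversed are only for illustration:

```go
package main

import "fmt"

// chain is a < b < c spelled out in today's Go.
func chain(a, b, c int) bool { return a < b && b < c }

// negated is the De Morgan negation of chain (valid for ints).
func negated(a, b, c int) bool { return a >= b || b >= c }

// reversed is what a >= b >= c would mean as a chained comparison.
func reversed(a, b, c int) bool { return a >= b && b >= c }

func main() {
	// 1, 3, 2 is not increasing, so the negation holds, but it is not
	// non-increasing either, so the reversed chain does not hold.
	fmt.Println(chain(1, 3, 2), negated(1, 3, 2), reversed(1, 3, 2)) // false true false
}
```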

I am not convinced this form of syntactic sugar - as appealing as it looks - is worth the cost.

jfcg commented 5 years ago

Hi Robert, With monotone comparisons, I mean the ones involving <, <=, >, >= in one direction only. For example:

x < y <= z
a > b > c > d

They are called monotone (non-)increasing / (non-)decreasing sequences of numbers. This is standard math terminology. So the following are not, for example, monotone relations:

a < b != c
x != y == z

I am updating the main proposal above to be more clear about monotonicity, and then I will have a follow up about negations.

jfcg commented 5 years ago

Hi again Robert,

First, this proposal does not in any way intend to manipulate the fundamental laws of math, like DeMorgan's (they are set in stone but maybe we could propose something about QM ;P that is for another day). In fact, just like you said, people very often utilize DeMorgan laws to rightly manipulate their expressions. Negation is very common:

func Myfun1(x float64) {
    if x < L1 || x >= L2 { // check for [L1, L2)
        // report bad param
        return
    }
    ...
}

This is a very typical example of early return. It is also good practice. Here the programmer wants to accept only finite x from [L1, L2). As I have written in detail above on IEEE floats, there is a very serious and infectious bug in this small piece of parameter validation. The notorious design flaw in IEEE 754, NaN, passes this test.

It is not the programmer's fault (it is an archaic hw design flaw), but her responsibility, to take great care about this, unfortunately. What she should have written is:

        if !(L1 <= x && x < L2) {

which fixes the test. What we could all enjoy writing instead is:

        if ! L1 <= x < L2 {

There are two things here. The first one is monotone chaining which we already talked about. For the second see this:

var x, y float64 // or int
...
if ! x < y {     // not valid
    ...
}

This does not compile because ! binds stronger than <. This looks unnecessary at first. Why don't you just write x >= y, right? But for IEEE floats, you have to write !(x < y) if what you really mean is x >= y. So in order to fully utilize monotone chaining, we need to adjust relative priority of ! and < <= > >= as well, honestly.
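A small sketch of why the two spellings differ for floats in current Go. The names notLess and greaterOrEqual are hypothetical:

```go
package main

import (
	"fmt"
	"math"
)

// notLess is the negation that has to be written with parentheses today.
func notLess(x, y float64) bool { return !(x < y) }

// greaterOrEqual looks equivalent but differs when an operand is NaN,
// because every ordered comparison involving NaN is false.
func greaterOrEqual(x, y float64) bool { return x >= y }

func main() {
	nan := math.NaN()
	fmt.Println(notLess(2, 1), greaterOrEqual(2, 1))     // true true
	fmt.Println(notLess(nan, 1), greaterOrEqual(nan, 1)) // true false
}
```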

Here are two examples from Go itself:

./src/unicode/utf16/utf16.go:   if surr1 <= r1 && r1 < surr2 && surr2 <= r2 && r2 < surr3 {
./src/reflect/value.go:     if i < 0 || j < i || j > s.Len {

Here is how we could write them:

./src/unicode/utf16/utf16.go:   if surr1 <= r1 < surr2 <= r2 < surr3 {
./src/reflect/value.go:     if ! 0 <= i <= j <= s.Len {

Personally I find the latter much more readable and clear. It expresses your intent much better. I rest my case ;)

griesemer commented 5 years ago

@jfcg I didn't mean to imply that your suggestion manipulates fundamental laws of math. What I said is that predicate negations using DeMorgan's rules become more complex and somewhat unintuitive.

Regarding your example with NaNs: I don't think that is a convincing example. NaNs are a design flaw (I'd agree with that wholeheartedly), and I suspect almost no numeric code is correct in the presence of NaNs. It's best to avoid them.

jfcg commented 5 years ago

Quick question: if we adjust relative priority of ! and < <= > >=, do we break any existing compiling code?

ianlancetaylor commented 5 years ago

Quick question: if we adjust relative priority of ! and < <= > >=, do we break any existing compiling code?

Yes. That would be a silent change in the behavior of existing code. We can't do that.

jfcg commented 5 years ago

Yes, but I meant is there a specific case? Could you give an example?

ianlancetaylor commented 5 years ago

Oh, sorry, now I see what you mean. I think you're right: I can't think of any way to use ! with a comparison operator directly today. I think we could in principle permit ! a < b to mean ! (a < b). Although it would mess up the grammar quite a bit.

jfcg commented 5 years ago

Just brainstorming :P Operator precedence in Go is:

unary operators
*  /  %  <<  >>  &  &^
+  -  |  ^
==  !=  <  <=  >  >=
&&
||

The ! operator can act on bool values only. It cannot interact with arithmetic and bitwise operators directly. Only the last 3 lines of operators above produce bool values. Also, comparison operators actually fall into two classes: != == are applicable to all types, while < <= > >= are applicable only to ordered types.

What do you think of the following operator precedence?

unary operators except !
*  /  %  <<  >>  &  &^
+  -  |  ^
<  <=  >  >=
!
==  !=
&&
||

What I am curious about is if this could be non-breaking for existing compiling Go code. If not what is an example that this precedence breaks? I could not come up with one. Could we even push ! below == != ?

ianlancetaylor commented 5 years ago

We've discussed this quite a bit, and it seems to us that this idea, while sometimes convenient, doesn't seem to meet the "importance" criteria of a language change ("address an important issue for many people"). We're also concerned that the novel short-circuiting behavior isn't a good fit with Go. We don't want to change the operator precedence levels, which are (we hope) simple enough to remember with only five levels (and changing them would likely not be backward compatible).

For these reasons, this is a likely decline. Leaving open for a month for final comments.

latitov commented 5 years ago

I agree with @robpike that

If people would just learn to write them like this: if 'a' <= c && c <= 'z' ...

But there's another situation, somewhat related, but different. Don't know if anybody had problem with it, but I did. Here it is:

if Somewhere.SomeVeryLongVariable == 1 || Somewhere.SomeVeryLongVariable == 4 || Somewhere.SomeVeryLongVariable == 9 || Somewhere.SomeVeryLongVariable == 25 {
   ...
}

If I were designing a language on my own, I probably wouldn't bother to add "chained intervals", but I definitely would add this:

if Somewhere.SomeVeryLongVariable == 1 || == 4 || == 9 || == 25 {
   ...
}

And having "or" instead of, or as a synonym for, "||" would also help.

griesemer commented 5 years ago

@latitov Your example - as you say - is unrelated to this issue. That said, the specific example you're giving would probably be written as

switch Somewhere.SomeVeryLongVariable {
case 1, 4, 9, 25:
...

which is possible now and which is concise with no repetition. If the comparison is more complex, you can always introduce a temporary variable, which is one reason why we have initialization expressions in if and switch statements:

if t := Somewhere.SomeVeryLongVariable; t == 1 || t == 4 || t == 9 || t == 25 { ...

I don't see any reason why expressions should be complicated to support your suggestion given that it reduces each sub-expression from t == value to == value, i.e., it saves you from writing a t and a blank.

It's easy to come up with arbitrary new syntax that simplifies a specific use case - but it's a slippery slope: typically it's not worth the extra complexity introduced into the language.

latitov commented 5 years ago

@griesemer you are right.

Specifically, you are right that this is slippery slope, and that it's unrelated issue. You are right there.

However, you are not right about "some specific use-case". We all come from different backgrounds. Some work in biotech, others in finance, others at Google... Every field sometimes makes what is a "rare specific use-case" a common thing. For example, I develop industrial automation software, programs that loop indefinitely 24/7/365 and control temperature, pressure, etc. In this particular field, the following is a common thing:

if SomeState == true && (...that long sequence here...) {

}

and switch/case won't work here. Well, you can nest switch/case inside if, but that makes it less readable, especially considering that the eye is already trained to see a state machine whenever there's a switch/case. Of course one can create a temporary variable, and that will be a solution... in Go, because Go allows that. Some other language frameworks (specifically the industrial IEC 61131-3) don't. That's why I wanted it in the first place, and why it's on my hot list. That's why I commented when I saw this discussion. But you are right that it's unrelated, and that Go doesn't need it. It doesn't need it not because there's switch/case, but because there's no need to define variables 10 screens up the code.

ianlancetaylor commented 5 years ago

There were no further comments relevant to this issue, so closing.

golang / go

Proposal: Go 2: add chained interval comparisons #33694