crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

A [no overflow check] compiler directive #14828

Open jzakiya opened 1 month ago

jzakiya commented 1 month ago

Based on thoughts derived from this forum thread:

https://forum.crystal-lang.org/t/there-is-a-way-to-optimize-this-program/6947/50

I'm proposing a compiler directive, something like [no overflow check], that tells the compiler not to perform math overflow checks in the method that follows it. Or even a finer-tuned directive to skip overflow checks around blocks of code.

So given code like this, where you have to use individual &+|&* operators:

def collatz(seed : UInt64)
  steps = 0_u64
  while seed > 1
    tz_count = seed.trailing_zeros_count
    steps &+= tz_count
    seed  >>= tz_count
    if seed > 1
      steps &+= 1
      seed = seed &* 3 &+ 1
    end
  end
  steps
end

You could instead do something like this:

[no overflow checks]
def collatz(seed : UInt64)
  steps = 0_u64
  while seed > 1
    tz_count = seed.trailing_zeros_count
    steps += tz_count
    seed  >>= tz_count
    if seed > 1
      steps += 1
      seed = seed * 3 + 1
    end
  end
  steps
end

This makes the code much, much easier to write|read, and explicitly expresses to anyone reading it what's happening.

Based on the forum discussion of the potential performance improvements from eliminating overflow checks, I modified some old code and saw significant performance improvement using the &+|&* operators. However, the code became, IMO, unnecessarily long and verbose.

Here is an example of actual code modified to use the &+|&* operators:

def nextp_init(rhi, kmin, modpg, primes, resinvrs)
  # Initialize 'nextp' array for twinpair upper residue rhi in 'restwins'.
  # Compute 1st prime multiple resgroups for each prime r0..sqrt(N) and
  # store consecutively as lo_tp|hi_tp pairs for their restracks.
  nextp = Slice(UInt64).new(primes.size*2) # 1st mults array for twinpair
  r_hi, r_lo = rhi, rhi - 2              # upper|lower twinpair residue values
  primes.each_with_index do |prime, j|   # for each prime r0..sqrt(N)
    k = (prime &- 2) // modpg            # find the resgroup it's in
    r = (prime &- 2) %  modpg &+ 2       # and its residue value
    r_inv = resinvrs[r].to_u64           # and residue inverse
    rl = (r_inv &* r_lo &- 2) % modpg &+ 2  # compute r's ri for r_lo
    rh = (r_inv &* r_hi &- 2) % modpg &+ 2  # compute r's ri for r_hi
    kl = k &* (prime &+ rl) &+ (r &* rl &- 2) // modpg # kl 1st mult resgroup
    kh = k &* (prime &+ rh) &+ (r &* rh &- 2) // modpg # kh 1st mult resgroup
    kl < kmin ? (kl = (kmin &- kl) % prime; kl = prime &- kl if kl > 0) : (kl -= kmin)
    kh < kmin ? (kh = (kmin &- kh) % prime; kh = prime &- kh if kh > 0) : (kh -= kmin)
    nextp[j * 2] = kl.to_u64             # prime's 1st mult lo_tp resgroup val in range 
    nextp[j * 2 | 1] = kh.to_u64         # prime's 1st mult hi_tp resgroup val in range
  end
  nextp
end

None of the math here requires overflow checks, and having to use the &x operators all over the place just seems an unnecessary burden (besides being ugly) to place on programmers.

This would be so much easier|shorter|better to write.

[no overflow checks]
def nextp_init(rhi, kmin, modpg, primes, resinvrs)
  # Initialize 'nextp' array for twinpair upper residue rhi in 'restwins'.
  # Compute 1st prime multiple resgroups for each prime r0..sqrt(N) and
  # store consecutively as lo_tp|hi_tp pairs for their restracks.
  nextp = Slice(UInt64).new(primes.size*2) # 1st mults array for twinpair
  r_hi, r_lo = rhi, rhi - 2              # upper|lower twinpair residue values
  primes.each_with_index do |prime, j|   # for each prime r0..sqrt(N)
    k = (prime - 2) // modpg             # find the resgroup it's in
    r = (prime - 2) %  modpg + 2         # and its residue value
    r_inv = resinvrs[r].to_u64           # and residue inverse
    rl = (r_inv * r_lo - 2) % modpg + 2  # compute r's ri for r_lo
    rh = (r_inv * r_hi - 2) % modpg + 2  # compute r's ri for r_hi
    kl = k * (prime + rl) + (r * rl - 2) // modpg # kl 1st mult resgroup
    kh = k * (prime + rh) + (r * rh - 2) // modpg # kh 1st mult resgroup
    kl < kmin ? (kl = (kmin - kl) % prime; kl = prime - kl if kl > 0) : (kl -= kmin)
    kh < kmin ? (kh = (kmin - kh) % prime; kh = prime - kh if kh > 0) : (kh -= kmin)
    nextp[j * 2] = kl.to_u64             # prime's 1st mult lo_tp resgroup val in range 
    nextp[j * 2 | 1] = kh.to_u64         # prime's 1st mult hi_tp resgroup val in range
  end
  nextp
end

So now, you just write one set of code to establish correctness, and use a compiler directive to increase performance. If you then need to do overflow checking on some operation(s) for correctness, you can do it on just them by using the &x operators inside the code.

Since I primarily do numerical algorithmic programming, this would make my life so much easier. I could easily test methods to see which would benefit, using a single compiler directive, and it eliminates having to write an altered version of each method to test.

I hope you see the utility of adding this feature to Crystal. It will make it so much easier for people to get the highest performance they can from Crystal when they really need|want it.

ysbaddaden commented 1 month ago

Having implemented a bunch of PRNGs, I understand the appeal.

That being said, this is just a few & characters to make explicit that we want wrapping math. It could be a lot more awful (look at Rust's wrapping_x methods). This is on the same level as calling #to_unsafe to skip bounds checks, but we won't introduce a "no bounds check" either.
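
For context, a minimal sketch of the two existing escape hatches being compared here, using only standard Crystal (the values are illustrative):

x = UInt8::MAX
wrapped = x &+ 1         # wrapping add: result is 0_u8, no overflow check
# checked = x + 1        # the checked form would raise OverflowError here

arr = [10, 20, 30]
fast = arr.to_unsafe[1]  # Pointer#[] skips the bounds check that Array#[] performs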

jzakiya commented 1 month ago

In Rust you can do this:

fn nextp_init(.....) {
  unsafe {
    ........
    ........

  }
}

to achieve turning off bounds checking. Again, I'm not concerned with the semantics, I want the same effect. I'm not aware that Rust even has the equivalent of individual &x operators, and if it does I don't use them in my Rust code; I just use unsafe.

So if that would be semantically more appealing I would be all for it.

ysbaddaden commented 1 month ago

I didn't know Rust disabled checks inside unsafe (that sounds just as bad as disabling checks in release mode). Anyway, we already rejected the unchecked block notation when designing the feature.

straight-shoota commented 1 month ago

This makes the code much, much easier to write|read, and explicitly expresses to anyone reading it what's happening.

This is doubtful. If a single operator (e.g. +) can have different meanings depending on context, you always need to be aware of the context in order to understand what's actually happening. You may not be able to easily spot a no overflow check directive when looking at a specific piece of code. This introduces ambiguity. Inverting the meaning of &+ to signify a non-wrapping operation makes this even weirder.

I relate to the sentiment that operators with & prefix are harder to read. But I'm pretty sure it's better that way than following this proposal which hinders semantic readability because you have to resolve the ambiguity of operator mechanics.
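
To make the ambiguity concrete, here is a minimal sketch using the proposed (hypothetical) directive: the two method bodies are textually identical, yet the overflow behaviour of + differs, and only the annotation far above the operator tells you which one you get.

def checked_sum(a : UInt64, b : UInt64)
  a + b              # raises OverflowError on overflow
end

[no overflow checks]
def unchecked_sum(a : UInt64, b : UInt64)
  a + b              # would silently wrap; nothing on this line hints at it
end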

jzakiya commented 1 month ago

This is doubtful. If a single operator (e.g. +) can have different meanings depending on context, you always need to be aware of the context in order to understand what's actually happening. You may not be able to easily spot a no overflow check directive when looking at a specific piece of code. This introduces ambiguity.

No, no, that is not what I meant. I misspoke if you took it that way. I'm not proposing dual meanings of the &x operators.

I am not proposing any change to the handling of overflow|boundary checking. Crystal already allows users the ability to turn it off on individual operations using the &x operators. That would not change. Crystal also already allows turning off address boundary checking: memory.to_unsafe[x] = y

So philosophically and operationally nothing that already exists changes. All I'm requesting is a stylistic way to make turning checks off easier and more appealing, similarly as provided in Rust, et al.

This will improve the writing|reading of code, and aid in code testing too. Again, nothing proposed here changes the thinking and responsibilities of users to write good code. In fact it enhances it, because it eliminates missed hidden &x operators stuffed somewhere when reading code, or places where they should have been inserted when writing it.

I think it's undeniable, to a code writer and reader, that something like this is more desirable:

def collatz(seed : UInt64)
  unsafe do
    steps = 0_u64
    while seed > 1
      tz_count = seed.trailing_zeros_count
      steps += tz_count
      seed >>= tz_count
      if seed > 1
        steps += 1
        seed = seed * 3 + 1
      end
    end  
    steps
  end
end

than this:

def collatz(seed : UInt64)
  steps = 0_u64
  while seed > 1
    tz_count = seed.trailing_zeros_count
    steps &+= tz_count
    seed  >>= tz_count
    if seed > 1
      steps &+= 1
      seed = seed &* 3 &+ 1
    end
  end
  steps
end

Two simple lines of code to add, versus changes to 4 individual operators (and this is a very simple real example).

So please focus the discussion|understanding of this proposal on how it will make users happy, while also making code easier to read, modify, and maintain, as in Rust, et al.

straight-shoota commented 1 month ago

Could you clarify this phrase from the OP then?

If you then need to do overflow checking on some operation(s) for correctness, you can do it on just them by using the &x operators inside the code.

I read that as inside a block of code annotated as no overflow check where + means wrapping addition, &+ would mean non-wrapping addition. Apparently I misunderstood.

(this aspect is only a minor detail of course, so no need to elaborate extensively until there's a consensus on the general proposal)

jzakiya commented 1 month ago

I read that as inside a block of code annotated as no overflow check where + means wrapping addition, &+ would mean non-wrapping addition. Apparently I misunderstood.

I was thinking faster than I could write.

If an overflow occurs within an unsafe code block, you go back to the original code and test operation-by-operation using the &x operators until you find the one(s) that cause it. Once you know that, you can put the surrounding code in unsafe blocks if you can't make the overflowing math not overflow (change numeric types, change the algorithm to not allow it, etc.).
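
A minimal sketch of that operation-by-operation narrowing with today's operators, reusing the collatz example from above: convert operations to their wrapping forms one at a time and rerun; once the OverflowError stops being raised, the last operator you converted is the one that was overflowing.

def collatz(seed : UInt64)
  steps = 0_u64
  while seed > 1
    tz_count = seed.trailing_zeros_count
    steps &+= tz_count       # converted to wrapping: the error persisted, so not this one
    seed >>= tz_count
    if seed > 1
      steps &+= 1            # converted to wrapping: the error persisted, so not this one
      seed = seed * 3 + 1    # still checked: if the error persists, it is one of these
    end
  end
  steps
end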

From personal experience, most algorithms work over well-defined number spaces (like modulo arithmetic) and have predefined min|max data ranges|types. You don't start out using unsafe code anyway. You only use it when you need|want max speed from your existing code|hardware.
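
For instance, here is a sketch of the kind of bound argument that makes the checks provably unnecessary (the specific modulus and the 32-bit limit are assumptions for illustration, not taken from the code above):

# If every residue and inverse is < modpg, and modpg fits in 32 bits,
# then products like r_inv * r_lo are < modpg**2 < 2**64 and can never
# wrap a UInt64, so the overflow check buys nothing.
modpg   = 30_030_u64        # hypothetical modulus, well under 2**32
r_inv   = modpg - 1         # worst-case residue inverse
r_lo    = modpg - 1         # worst-case residue
product = r_inv * r_lo      # bounded by (2**32 - 1)**2 < 2**64
puts product                # => 901740841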

This is how I write my Rust code when I want it to be as fast as possible within a code base. Most stuff occurs once, or infrequently, and won't affect (measurable) performance anyway, so you don't worry about it. But stuff occurring inside loops (like the collatz method example) becomes a prime candidate for being made as fast as possible, for the greatest overall speed.

As the math in most real-world algorithms performs dozens of &x-type operations, it just becomes tedious to change multiple operations, over multiple lines of code, to do this type of testing.

I wasn't around to follow the discussions in Rust around this, but the Rust devs are some of the most exacting people around when it comes to keeping their standard code base safe. But ultimately they accepted that there are times people need to do stuff their coding paradigm considers unsafe, and they have to be responsible for it. So they created a standard|fast way to do it, one that shows (and tells the compiler) it's obviously being done when you read|write Rust code. In fact their compiler is so good, it tells you when you have unnecessary unsafe blocks within routines, if you already have an unsafe block at a higher level that uses that code.

Since Rust and Crystal both use LLVM, I imagine it shouldn't be that hard to implement, as a technical matter.