BigInt Fixes for Issues in Violet Tests

This update fixes all known issues published by @LiarPrincess (see various pull requests for details) Improves the LosslessStringConvertible init() run time by 35X (radix 10) and 310X (radix 16). Improves the CustomStringConvertible toString() run time by 20X (radix 10) and 90X (radix 16). Test suite has been updated to include tests for all known issues. Latest update removes Foundation import and fixes additional problems noted by @LiarPrincess.

Just for reference, original Violet code:

This is XsProMax version (the one without the inlined Int32), according to #256 [BigInt tests][No merge] rabbit How NOT to write performance tests it is slightly faster than standard Violet.

Violet versions have more comments than this PR, so it may be easier to read.

I don't want to start a formal review, because I have not compiled your code, but:

in public func toString(radix: Int, uppercase: Bool) -> String

What if UInt.bitWidth = 64 and radix = 8? Then bitsPerChar = 3, so a single word contains 64/3=21 full groups. What happens to the remaining bit? It should be 'remembered' and used with high bits of the next word. I think that the current implementation just forgets it, but I may be wrong.

See the radixIsPowerOfTwo_charCanUseMultipleWords in original Violet implementation.
in public func toString(radix: Int, uppercase: Bool) -> String

The if #available(macOS 11.0, *) checks could be a little bit more robust like in Violet. Is there anything wrong with Violet version? I think that at least iOS should be also covered with fast path.
in public init?(_ description: String, radix: Int = 10)

There is also string.withUTF8 which would allow you to use non-generic code. In Violet there is a comment why generic version is chosen and what is the performance trade-off. Is this just a copy-paste from Violet or conscious decision?
You report that string parsing when radix is a power of 2 is 310x faster than the old version. According to my performance tests Violet goes up to 345x, so it is basically the same.

On the other hand, decimal parsing can be 190x faster and you report 35x. Is that because addition and multiplication in Violet are faster (4x and 1.4x respectively) or is there some other reason? You can always drop into UnsafeBuffer for performance. Or even write specialized AddMulBuffer, similar how Violet has a DivBuffer in toString.

Btw. Violet performance tests are for both positive and negative numbers. You only test positive and with 2 complement negation means another pass. Though it is not that bad, because in the same operation we do division which is even more expensive.

It would be nice to run all of the Violet tests from all of the PRs on this code to check if there is something else hiding. I am still under covid, and with my mental-potato-state I would do more harm than good. Even writing this comment took me >1h. I may be able to do this during the weekend.

in public func toString(radix: Int, uppercase: Bool) -> String What if UInt.bitWidth = 64 and radix = 8? Then bitsPerChar = 3, so a single word contains 64/3=21 full groups. What happens to the remaining bit? It should be 'remembered' and used with high bits of the next word. I think that the current implementation just forgets it, but I may be wrong. See the radixIsPowerOfTwo_charCanUseMultipleWords in original Violet implementation.

Yes, mea culpa, I thought I reviewed the different binary cases and all divided evenly into 64. As you point out, radix 8 (and 32) are oddballs with 3 (and 5) bits.
I don't have covid as an excuse -- just getting old. 😉 I'll fix the code.

in public func toString(radix: Int, uppercase: Bool) -> String The if #available(macOS 11.0, *) checks could be a little bit more robust like in Violet. Is there anything wrong with Violet version? I think that at least iOS should be also covered with fast path.

No, I was focused on the performance. I'm sure Apple has a tool to do the availability checks for us.

in public init?(_ description: String, radix: Int = 10) There is also string.withUTF8 which would allow you to use non-generic code. In Violet there is a comment why generic version is chosen and what is the performance trade-off. Is this just a copy-paste from Violet or conscious decision?

The most common radix in use is 10, so, while there are performance improvements in binary radices, it didn't seem worth all the code.

You report that string parsing when radix is a power of 2 is 310x faster than the old version. According to my performance tests Violet goes up to 345x, so it is basically the same. On the other hand, decimal parsing can be 190x faster and you report 35x. Is that because addition and multiplication in Violet are faster (4x and 1.4x respectively) or is there some other reason? You can always drop into UnsafeBuffer for performance. Or even write specialized AddMulBuffer, similar how Violet has a DivBuffer in toString. Btw. Violet performance tests are for both positive and negative numbers. You only test positive and with 2 complement negation means another pass. Though it is not that bad, because in the same operation we do division which is even more expensive.

I agree, I would love to spent all my days making this faster, but this seems quite good compared to what it was.

Hmm... I can't compile the code from this PR:

/mnt/Storage/Programming/TEST/swift-numerics/Sources/BigIntModule/BigInt.swift:387:20: error: value of type 'UnsafeMutablePointer<UInt8>' has no member 'update'
            dstPtr.update(from: srcPtr, count: count)
            ~~~~~~ ^~~~~~

More or less here:

if index != -1 {
  count = stringMaxSize - index - 1
  let dstPtr = buffer.baseAddress!
  let srcPtr = dstPtr.advanced(by: index+1)
  dstPtr.update(from: srcPtr, count: count) // <-----
  dstPtr[count] = 0
}

In Violet this code looks like this:

// Check for invalid capacity estimation
if index != -1 {
  count = capacity - index - 1
  let dstPtr = ptr.baseAddress!
  let srcPtr = dstPtr.advanced(by: index + 1)
  dstPtr.assign(from: srcPtr, count: count) // <-----
  // For some reason we have to do this on macOS,
  // otherwise the tests will fail (Swift 5.5.2).
  dstPtr[count] = 0
}

So, the assign(from:count:) was changed to update(from:count:). From what I see assign(from:count:) is now depreciated, and renamed to update(from:count:). Internally it is the same thing:

@inlinable
@_alwaysEmitIntoClient
@available(*, deprecated, renamed: "update(from:count:)")
@_silgen_name("_swift_se0370_UnsafeMutablePointer_assign_from_count")
public func assign(from source: UnsafePointer<Pointee>, count: Int) {
  update(from: source, count: count)
}

Platforms:

Ubuntu 22.04.1 LTS 64-bit + Swift 5.7.3 (swift-5.7.3-RELEASE) from Swift.org

macOS BigSur (11.7.4) + Xcode 13.2.1

swift --version
swift-driver version: 1.26.21 Apple Swift version 5.5.2 (swiftlang-1300.0.47.5 clang-1300.0.29.30)
Target: x86_64-apple-macosx11.0

In both cases I am using swift test --enable-test-discovery to run.

Btw. If you want to have swift-numerics with all of the Violet PRs merged then you can use this ultra sophisticated script. Note that those PRs only add tests and do not modify the sources of the actual BigInt, so you can just copy the test files to your build without any conflicts. Or we can just wait, I may find some time over the weekend to look more closely.

git clone https://github.com/LiarPrincess/swift-numerics.git
cd swift-numerics

git fetch origin biginteger:biginteger
git fetch origin 0-Node:0-Node
git fetch origin 1-Init-from-int:1-Init-from-int
git fetch origin 2-String:2-String
git fetch origin 3-Comparable-equatable-hashable:3-Comparable-equatable-hashable
git fetch origin 4-Unary-PLUS-MINUS-INVERT:4-Unary-PLUS-MINUS-INVERT
git fetch origin 5-Binary-ADD-SUB:5-Binary-ADD-SUB
git fetch origin 6-Binary-MUL-DIV-MOD:6-Binary-MUL-DIV-MOD
git fetch origin 7-Binary-AND-OR-XOR:7-Binary-AND-OR-XOR
git fetch origin 8-Binary-SHIFTS:8-Binary-SHIFTS
git fetch origin 9-Int-properties:9-Int-properties
git fetch origin 10-Copy-on-write:10-Copy-on-write
git fetch origin 11-Property-based-tests:11-Property-based-tests
git fetch origin 12-Apple:12-Apple
git fetch origin 13-Init-from-float:13-Init-from-float
git fetch origin 13-Performance:13-Performance
git fetch origin 15-Codable:15-Codable

git checkout biginteger

git merge --commit --no-edit 0-Node
git merge --commit --no-edit 1-Init-from-int
git merge --commit --no-edit 2-String
git merge --commit --no-edit 3-Comparable-equatable-hashable
git merge --commit --no-edit 4-Unary-PLUS-MINUS-INVERT
git merge --commit --no-edit 5-Binary-ADD-SUB
git merge --commit --no-edit 6-Binary-MUL-DIV-MOD
git merge --commit --no-edit 7-Binary-AND-OR-XOR
git merge --commit --no-edit 8-Binary-SHIFTS
git merge --commit --no-edit 9-Int-properties
git merge --commit --no-edit 10-Copy-on-write
git merge --commit --no-edit 11-Property-based-tests
git merge --commit --no-edit 12-Apple
git merge --commit --no-edit 13-Init-from-float
git merge --commit --no-edit 13-Performance
git merge --commit --no-edit 15-Codable

Oki, when I change update(from:count:) to assign(from:count:) I can run the tests. Not sure what is going on...

Then if we run the full Violet test suite we will get a bunch of failures in BitWidthTests and TrailingZeroBitCountTests. Those are expected, I marked them as a 'rabbit Discussion' in #252 [BigInt tests] rabbit Int properties.

The only other failure is in InitFromBinaryFloatingPoint754Tests:

  // swift test --enable-test-discovery --filter=BigIntTests.InitFromBinaryFloatingPoint754Tests/test_float80_spiderman
  // Skipping the: #if (arch(i386) || arch(x86_64)) && !os(Windows) && !os(Android)
  func test_float80_spiderman() {
    let d: Float80 = 9223372036854775807.5
    let expected: BigInt = BigInt("9223372036854775807")!

    let result = BigInt(d)
    if result != expected {
      print("d:        \(d)")
      print("expected: \(expected)")
      print("result:   \(result)")
    }
  }

This will print:

d:        9223372036854775807.5
expected: 9223372036854775807
result:   9223372036854775807

spidey-meme

Btw. 9223372036854775807.5 is a special number for Float80, because UInt64(1) << Float80.significandBitCount = 9223372036854775808 which is the 1st number where spacing between numbers (away from 0) is 1. In other words 9223372036854775807.5 is the highest number where the next number is 0.5 away.

print(Float80.significandBitCount) // 63 - well *teknikly* there is a hidden bit, but whatever...
let int = UInt64(1) << Float80.significandBitCount
let d = Float80(int)
print("int:             ", int)        // 9223372036854775808
print("Float80:         ", d)          // 9223372036854775808.0
print("Float80.nextDown:", d.nextDown) // 9223372036854775807.5 - goes down by 0.5 - our test case
print("Float80.nextUp:  ", d.nextUp)   // 9223372036854775809.0 - goes up by 1

I do not think this matters... it is just a fancy floating point trick. This does not happen for Float and Double.

Though, I don't think it has to be fixed in this PR. @mgriebling already did a lot of work. It can wait until we merge Violet tests (*if ever*), then we will deal with this failure.

// well teknikly there is a hidden bit, but whatever...

A digression: Float80 doesn't have a hidden significand bit; the leading bit is stored explicitly, unlike all other common IEEE 754 formats (https://en.wikipedia.org/wiki/Extended_precision).

significandBitCount is defined to be the number of fractional bits in the significand (i.e., it does not include the leading bit, which is implicit in every other format)--its the number of bits that are actually stored in the representation of Float16, Float, and Double (https://developer.apple.com/documentation/swift/binaryfloatingpoint/significandbitcount).

For fixed-width floating-point types, this is the actual number of fractional significand bits.

And we follow this pattern for Float80 even though the extra bit is stored in that format:

Note that Float80.significandBitCount is 63, even though 64 bits are used to store the significand in the memory representation of a Float80 (unlike other floating-point types, Float80 explicitly stores the leading integral significand bit, but the BinaryFloatingPoint APIs provide an abstraction so that users don’t need to be aware of this detail).

Yep, I know.

This is the reason why we can just be generic over 'floating point' and not 'floating point with a special case for Float80'.

In the comment I wanted to point that 'there is a hidden bit' unlike in other types, though I may have worded that incorrectly/imprecisely. Maybe // well *teknikly* there IS a hidden bit, but whatever... would be better (uppercase IS).

Also, there are usual checks for the existence of Float80. This code snippet does not contain them, because I wanted to simplify. Obviously, the Violet PR (#258 [BigInt tests] skull Init from float 754) contains the #if thingie.

Anyway, the reason why this code snippet does not do what I would expect is:

result.words:   [9223372036854775807, 0]
expected.words: [9223372036854775807]

Then == checks if the words are equal -> they are not -> we enter if body. I may be wrong somewhere, but if the String representations are equal then I would expect the == to hold (for BigInt).

Btw. if we change == definition to:

  @inlinable
  public static func == (lhs: BigInt, rhs: BigInt) -> Bool {
    print("lhs: \(lhs.words)")
    print("rhs: \(rhs.words)")
    return lhs.words == rhs.words
  }

We will get (just look at the last line):

/mnt/Storage/Programming/TEST/swift-numerics (biginteger*) swift test --enable-test-discovery --filter=BigIntTests.InitFromBinaryFloatingPoint754Tests/test_initFromBinaryFloatingPoint754_Float80 > out.txt
warning: 'swift-numerics': found 4 file(s) which are unhandled; explicitly declare them as resources or exclude from the target
    /mnt/Storage/Programming/TEST/swift-numerics/Tests/BigIntTests/Helpers/StringTestCases.generated.py
    /mnt/Storage/Programming/TEST/swift-numerics/Tests/BigIntTests/Performance/PerformanceTests.generated.py
    /mnt/Storage/Programming/TEST/swift-numerics/Tests/BigIntTests/Nodejs/NodeTests.generated.js
    /mnt/Storage/Programming/TEST/swift-numerics/Tests/BigIntTests/Nodejs/README.md

Building for debugging...
[12/12] Linking swift-numericsPackageTests.xctest
Build complete! (6.22s)
error: malformed

I have not tested this extensively, but any print statement (for example print("ala")) would do this. I did clean and rm -rf .build. I don't think this matters that much for BigInt, it is not like we *actually* want to print those values on every ==.

Ubuntu platform that I mentioned above.

Ugh...

I forgot to mention that the compilation failure (rename from UnsafeMutablePointer.assign(from:count:) to UnsafeMutablePointer.update(from:count:)) was caused by SE-0370: Pointer Family Initialization Improvements and Better Buffer Slices. This change was specifically mentioned in 'Source compatibility' and 'Alternatives considered'.

I can't find in the proposal the exact version of Swift/stdlib where change was implemented (so that we can #if on it). It is also not mentioned in the code:

@inlinable
@_alwaysEmitIntoClient
@available(*, deprecated, renamed: "update(from:count:)")
@_silgen_name("_swift_se0370_UnsafeMutablePointer_assign_from_count")
public func assign(from source: UnsafePointer<Pointee>, count: Int) {
  update(from: source, count: count)
}

I can confirm that:

✅ compilation succeeds:
- Ubuntu 22.04.1 LTS 64-bit + Swift 5.8 (swift-5.8-RELEASE-ubuntu22.04) from Swift.org
❌ compilation fails:
- Ubuntu 22.04.1 LTS 64-bit + Swift 5.7.3 (swift-5.7.3-RELEASE) from Swift.org
- macOS BigSur (11.7.4) + Xcode 13.2.1

Swift 5.8 Release notes mention SE-0370 in 'Swift Evolution Appendix'. Though, I can't 100% confirm that #if swift(>=5.8) is all we need.

The whole Vapor/Fibonacci/BigInt thingie on the forum reminded me of this PR. Just so that we are clear: some parts of this PR are uncomfortably similar to Violet code, and I’m not sure if I’m fine with this.

apple / swift-numerics

BigInt Fixes for Issues in Violet Tests #262