attaswift / BigInt

Arbitrary-precision arithmetic in pure Swift
MIT License

Using `ManagedBufferPointer` instead of `Array` as a storage #97

Open LiarPrincess opened 2 years ago

LiarPrincess commented 2 years ago

Hi,

Recently I had to write my own BigInt implementation for Violet - Python VM written in Swift.

Internally I decided to use ManagedBufferPointer instead of Swift Array. The whole design in one sentence would be: union (via tagged pointer) of Int32 (called Smi, after V8) and a heap allocation (magnitude + sign representation) with ARC for garbage collection. The detailed explanation is available in our documentation.
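To make the "small integer or heap allocation" idea concrete, here is a minimal sketch. It is not Violet's actual code: the names `HeapInt`, `SmallOrHeap` and `add` are made up for illustration, and a plain Swift enum stands in for the tagged pointer (a real tagged pointer would pack both cases into a single machine word).

```swift
// Hypothetical names for illustration only; not Violet's real types.
// Heap representation: sign + magnitude, memory managed by ARC.
final class HeapInt {
  let isNegative: Bool
  let magnitude: [UInt64] // least significant word first

  init(isNegative: Bool, magnitude: [UInt64]) {
    self.isNegative = isNegative
    self.magnitude = magnitude
  }
}

// An enum standing in for the tagged pointer: either a small inline
// integer ("Smi") or a reference to a heap allocation.
enum SmallOrHeap {
  case smi(Int32)
  case heap(HeapInt)
}

// Fast path: stay in Int32 when the sum fits.
// Slow path: promote to the heap representation on overflow.
func add(_ lhs: Int32, _ rhs: Int32) -> SmallOrHeap {
  let (sum, overflow) = lhs.addingReportingOverflow(rhs)
  if !overflow {
    return .smi(sum)
  }

  // The sum of two Int32 values always fits in Int64.
  let wide = Int64(lhs) + Int64(rhs)
  return .heap(HeapInt(isNegative: wide < 0, magnitude: [wide.magnitude]))
}
```

Most additions take the first branch, so only the rare overflow pays for a heap allocation.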

Naturally I'm quite curious why most of the BigInt libraries (including this one) use Array. The current implementation gives you (2014 rMBP with Intel x64):

print("BigUInt.size:", MemoryLayout<BigUInt>.size) // 32
print("BigUInt.stride:", MemoryLayout<BigUInt>.stride) // 32
print("BigInt.size:", MemoryLayout<BigInt>.size) // 33
print("BigInt.stride:", MemoryLayout<BigInt>.stride) // 40

Going with ManagedBufferPointer would give us much smaller numbers:

// Basically our own version of `Swift.Array` specialized for storing `Words`.
// Mainly deals with COW.
struct BigIntStorage {
  struct Header {
    var count: Int
  }

  typealias Word = UInt
  typealias Buffer = ManagedBufferPointer<Header, Word>
}

struct BigUInt2 {
  typealias Word = BigIntStorage.Word

  enum Kind {
    case inline(Word, Word)
    case slice(from: Int, to: Int)
    case array
  }

  var kind: Kind
  var storage: BigIntStorage // <- This line changed!
}

struct BigInt2 {
  enum Sign {
    case plus
    case minus
  }

  typealias Magnitude = BigUInt2
  typealias Word = BigUInt2.Word

  public var magnitude: BigUInt2
  public var sign: Sign
}

print("BigUInt2.size:", MemoryLayout<BigUInt2>.size) // 17
print("BigUInt2.stride:", MemoryLayout<BigUInt2>.stride) // 24
print("BigInt2.size:", MemoryLayout<BigInt2>.size) // 18
print("BigInt2.stride:", MemoryLayout<BigInt2>.stride) // 24

I believe that this approach would have the following advantages:

The downside is that you would have to implement your own heap storage based on ManagedBufferPointer, but this is not that difficult.
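As a rough idea of what such heap storage involves, here is a minimal copy-on-write sketch built on `ManagedBufferPointer`. The names `WordBufferClass` and `WordStorage` are hypothetical, and this is neither Violet's implementation nor a proposal for the exact API. It assumes the element type is trivial (`UInt`), so the buffer class needs no `deinit` to destroy elements.

```swift
// Hypothetical names; a sketch under the assumptions stated above.
// The buffer class has no stored properties; the header and the
// elements live in the tail allocation managed by ManagedBufferPointer.
final class WordBufferClass {}

struct WordStorage {
  struct Header {
    var count: Int
  }

  typealias Word = UInt
  typealias Buffer = ManagedBufferPointer<Header, Word>

  private var buffer: Buffer

  init(minimumCapacity: Int = 4) {
    buffer = Self.allocate(capacity: minimumCapacity, count: 0)
  }

  private static func allocate(capacity: Int, count: Int) -> Buffer {
    Buffer(bufferClass: WordBufferClass.self, minimumCapacity: capacity) { _, _ in
      Header(count: count)
    }
  }

  var count: Int { buffer.header.count }

  subscript(index: Int) -> Word {
    precondition(index >= 0 && index < count, "Index out of range")
    return buffer.withUnsafeMutablePointerToElements { $0[index] }
  }

  mutating func append(_ word: Word) {
    let oldCount = count
    let needsGrow = oldCount == buffer.capacity

    // Copy-on-write: allocate a fresh buffer if this one is full
    // or shared with another `WordStorage` value.
    if needsGrow || !buffer.isUniqueReference() {
      let newCapacity = needsGrow ? max(4, oldCount * 2) : buffer.capacity
      let new = Self.allocate(capacity: newCapacity, count: oldCount)
      buffer.withUnsafeMutablePointerToElements { src in
        new.withUnsafeMutablePointerToElements { dst in
          dst.initialize(from: src, count: oldCount)
        }
      }
      buffer = new
    }

    buffer.withUnsafeMutablePointerToElements { elements in
      (elements + oldCount).initialize(to: word)
    }
    buffer.header.count = oldCount + 1
  }
}
```

Copying a `WordStorage` value only copies the reference; the words themselves are duplicated on the first mutation of a shared buffer, which is the same behaviour `Array` provides.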

LiarPrincess commented 2 years ago

As for any regressions: I also propose #98 Using tests from “Violet - Python VM written in Swift”. So, first I would add test cases and then we could (maybe) talk about ManagedBufferPointer.

tgymnich commented 2 years ago

This sounds great. Did you already benchmark both approaches?

LiarPrincess commented 2 years ago

This is a little bit more complicated. There is no silver bullet and there are multiple ways in which you can implement a BigInt depending on what use-cases you target.

Before I implement this change I want to close #98 Using tests from “Violet - Python VM written in Swift”.

The improvements (if any) would be only in some specific scenarios, definitely not in the most common case, when the test looks like this:

let a: BigInt = …
let b: BigInt = …
// do something with them, maybe even in a loop…

Stride only matters in contiguous storage, like arrays and structs. In Violet, having a stride of 8 (a single pointer) means that we can fit more BigInts in a single cache line, which matters in some scenarios.
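A small sketch of why stride is the number that counts for contiguous storage. The `Padded` struct and the `valuesPerCacheLine` helper are hypothetical, and the 64-byte cache line is an assumption (common on x64, but not universal).

```swift
// Hypothetical example type: the stored properties need 9 bytes on
// typical 64-bit platforms, but each array element is padded up to
// the alignment of `UInt64`.
struct Padded {
  var word: UInt64
  var flag: UInt8
}

// Number of whole values that fit in one cache line.
// The 64-byte default is an assumption about the target CPU.
func valuesPerCacheLine(stride: Int, cacheLineSize: Int = 64) -> Int {
  precondition(stride > 0, "Stride must be positive")
  return cacheLineSize / stride
}

// `size` is what the value itself needs; `stride` is the distance
// between consecutive elements in an array, so it is never smaller.
print("Padded.size:", MemoryLayout<Padded>.size)
print("Padded.stride:", MemoryLayout<Padded>.stride)

// Strides taken from the numbers quoted in this thread.
print(valuesPerCacheLine(stride: 8))  // 8 (single pointer)
print(valuesPerCacheLine(stride: 24)) // 2 (BigInt2 above)
print(valuesPerCacheLine(stride: 40)) // 1 (current BigInt)
```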

In addition, things work well in Violet because we only have 2 representations:

In 99% of the cases the value is a Smi, which is nice for the branch predictor in some very tight loops. This may not be the case for 'attaswift/BigInt', which has 3 representations.

Anyway, let's finish #98 first and then (maybe) go back to this issue.