Open LiarPrincess opened 1 year ago
I wrote a test script that runs multiple BigInt
implementations:
swift_numerics
- branch from this PR
Violet
- this branch which is a main
branch with all swift_numerics
tests from all PRs
Int32
range. In this tests only 0
, -1
and 1
fall into this range, so it should not matter.Violet XsProMax
- this branch. Violet implementation with following changes:
isNegative
is stored in-line (and not on the heap like in Violet); count
and capacity
are on the heap because I don't want to stray too much from Violet
.struct BigInt {
struct Header {
var count: UInt32
var capacity: UInt32
}
var flags: UInt8
var buffer: ManagedBufferPointer<Header, UInt>
}
attaswift
- this branch which is a main
branch with added performance tests
Platform:
XCTestCase.measure
does not exist on Linux I wrote my own thing.Important:
100 * UInt64.bitWidth
in magnitude)inout
tests start with copying a given BigInt
, so they may be slower than normal operation.values in parens mean relative performance improvement. For example:
🐧 swift_numerics | 🐧 Violet | 🐧 attaswift | |
---|---|---|---|
test_string_fromRadix10 | 0.2386572415 | 0.011443098625 (20.9x) | 0.025662900375 (9.3x) |
Means that Violet
is 20.9 times faster than swift_numerics
and attaswift
is 9.3 times faster.
This test involved String
operations on 1203 BigInts
.
Parsing from string:
attaswift
is ~10x fasterViolet
is 190x faster with radix 10 and 345x faster when radix is a power of 2
Violet
has an additional specialization for powers of 2To string:
attaswift
is 12.5x faster for radix 10 and 152x faster for radix 16Violet
is 21x faster for radix 10 and >2500x faster when radix is power of 2 (up to 3210x (!) faster for radix 16)
Violet
has an additional specialization for powers of 2.swift_numerics
radix 16 takes more time than radix 10! Normally bigger radix -> faster calculation (because we divide by bigger number). This is not normal.This test involved ==
/<
operations on 1_447_209 BigInt
pairs.
swift_numerics
is the fastest - it just uses Swift.Array
compareattaswift/Violet XsProMax
store sign inline, so very often you can get the result without even looking at the heap.Violet
is the slowest, though even a small difference would show here (small difference repeated 1_500_000 times becomes a big difference):
Smi
(inlined Int32
) has a certain costSmi
then we have to fetch sign from the heapswift_release
and swift_retain
with >10% each. I think that all of the implementations have sufficient performance.This test involved +-~
operations on 100_203 BigInts
.
Plus:
Violet
)Minus:
attaswift/Violet XsProMax
are ~4x faster than swift_numerics
/Violet
attaswift/Violet XsProMax
unary '-
' flips the sign and retains magnitude
-> no memcpy
struct BigInt {
struct Header {
var count: UInt32
var capacity: UInt32
}
// Bool and Array in case of 'attaswift'
var flags: UInt8
var buffer: ManagedBufferPointer<Header, UInt>
}
Violet
stores both sign and magnitude on the heap -> unary '-
' has to memcpy
and flip bit.attaswift
, but I already use the tagged pointer to differentiate between small inlined Int32
and heap allocated storage. I didn't want to complicate it too much.swift_numerics
uses 2-complement -> memcpy
and then a few bit operations.Invert:
Violet
is 0.653x in int
test, but we are talking about 0.00803803s
which is basically nothingViolet XsProMax
is 1.31x in big
, but again fighting over 0.00922301s
does not make a lot of senseViolet
it is implemented as ~a = negate(a+1)
. Doing the a+1
by specific code is faster than general purpose add
operation.This test involved +-
operations on 162_409 BigInt
pairs.
Add:
Violet
is 3x faster than swift_numerics
Violet XsProMax
is 3.5x fasterattaswift
is 10% slowerSub:
Violet
is 3.6x faster than swift_numerics
Violet XsProMax
is 4.4x faster than swift_numerics
Violet
is written by handswift_numerics
uses a-b = a+(-b)
which adds an additional operation (negation). It also means that -
is slower than +
which is weird.This test involved */%
operations on 91_809 BigInt
pairs.
Mul:
Violet
is 1.45x faster than swift_numerics
.
school-book method (used by all of the implementations) is a pure ALU + write dependent, there is nothing 'fancy' to make it better.
Violet
is 'ok'. You could write something better by hand, but that is not the name of the game.attaswift
implements Karatsuba, but only applies it above directMultiplicationLimit: Int = 1024
, so it was never used. Their base implementation 6x slower than Violet
.
I am not sure how much time/effort they spend to get the directMultiplicationLimit
value. 1024 seems kind of 'round' (because it is a power of 2). In most of the implementations I have seen ~40 words (Word = UInt
) which would show in our performance tests.
If I was tasked with finding the correct value then I would look for a 'breaking point' where Karatsuba becomes faster than 'normal'. Similar to what the go lang did. Though attaswift
'normal' implementation is quite slow, so that may have affected their measurements (somehow).
we will probably need more detailed tests once we start implementing more advanced algorithms.
Div/mod:
Violet
is 2.4x faster than swift_numerics
.
Word
quotient approximationattaswift
4x slower than Violet
.This test involved &|^
operations on 162_409 BigInt
pairs.
Violet
is 2x faster than swift_numerics
; on paper swift_numerics
should have an advantage:
Violet
stores sign + magnitude - the implementation was taken from GMP - it adds xor
, add
and overflow check to every word.swift_numerics
stores 2 complement which is the best representation for this operationint
range) are weirdly slow in swift_numerics
. Violet
is 3.2x faster. Is there some costly preparation/initialization phase?attaswift
7x slower than Violet
.This test involved shifting 20_203 BigInts
by 5 values, in total 101_015 shifts per test.
Left:
Violet
is 17x faster than swift_numerics
Violet XsProMax
is 20x fasterattaswift
is 10x faster
test_shiftLeft_big
, the inout
is fast, but the normal version is much slower. Ultra repeatable, occurs every time.Right:
Violet
is 20x faster than swift_numerics
Violet XsProMax
is 23x fasterattaswift
is 12x faster
attaswift
rounding mode is different than Swift stdlibThis test was suggested by Xiaodi Wu (@xwu) in BigInt #120, so we may ask them if we have any questions (maybe?). In theory it looks interesting, because it contains +-*/
operations.
In practice count 5000 has following input distribution (more detailed results here):
Operation | Count | Notes |
---|---|---|
add | 39_280 | Inputs of similar size up to 3500 words (3500*UInt64 in magnitude) |
sub | 5_000 | Mostly inputs of similar size up to 3500 words Some inputs where rhs is a single word (UInt64 ) |
mul | 104_086 | lhs goes up to 3522 wordsrhs is always a single word (!!!) |
div | 22_678 | Inputs of similar size up to 3500 words |
compare (> ) |
16_602 | Inputs of similar size up to 3500 words So fast that it does not matter |
Anyway, here are the results:
Results (looking at the 5000 test):
swift_numerics
is slower than python (1.33 vs 0.56)Violet
is faster than Python (0.51 vs 0.56)Violet XsProMax
Violet
(0.88 vs 0.51) <- plot twist!Violet XsProMax
being slower than Violet
is a surprise because in most of the tests above XsProMax
was noticeably faster (5-10%). Though, Violet
has an ace up its sleeve: it is really fast for small integers and this test performs 104_086 multiplications where rhs
falls into Int32
. This was not visible in the tests above because they concentrated on big numbers.
As for the swift_numerics
vs attaswift
: result is dominated by */
. Other operations (namely: +-
) are so few/fast that they are not even noticeable.
Operation | Possible improvement | Note |
---|---|---|
init(String) |
190x for radix 10 345x when radix is a power of 2 |
Discussed in initial post |
toString |
21.5x for radix 10 >2500x when radix is a power of 2 (up to 3210x) |
Radix 16 is slower than radix 10? |
Equality | - | |
Compare | - | |
Unary + |
- | Do not write by hand, it is 1000x slower |
Unary - |
4x | Storing sign inline and magnitude on the heap is much faster (obviously, but now we have numbers) |
Unary ~ |
- | |
Binary + |
3x | |
Binary - |
4.4x | Implemented as a-b = a+(-b) , writing it by hand may be faster (Violet does this) |
Binary * |
1.45x | |
Binary /% |
2.4x | |
Binary &\|^ |
2x | |
Left shift | 17x | |
Right shift | 20x | |
pi | 1.6x 2.6x if we store small values inline |
This test uses a lot of small integers.*/ dominate +- . |
Update 21.2.2023:
equatable
, comparable
and π
testsAnyway, from those tests we can see that Violet
is the fastest.
Under the hood Violet
uses ManagedBufferPointer
. If swift_numerics
decides to go with this route:
be careful when calling [ManagedBufferPointer.isUniqueReference()
](https://developer.apple.com/documentation/swift/managedbufferpointer/isuniquereference()) for COW.
Doing this check on every write is expensive and it will NOT be optimized by the compiler. This matters in tight loops (multiplication etc.).
In Violet
I went with a token based system where calling BigIntStorage.guaranteeUniqueBufferReference()
gives you UniqueBufferToken
. All of the mutating
operations require this token. This way we do the COW check only once. Bonus points for being checked at the compile time (can't call mutating
operation without a token).
go into UnsafeBufferPointer
for final operations.
This ensures max performance. This is mostly visible in mul
and div
. Sometimes allocating a separate buffer for result and then copying it into BigInt
may be faster.
For example the inner logic of division has following signature:
private static func inner(
lhsCount: Int,
rhsCount: Int,
lhsWhichBecomesRemainder lhs: UnsafeMutableBufferPointer<Word>,
rhs: UnsafeBufferPointer<Word>,
result: UnsafeMutableBufferPointer<Word>
) {
dot dot dot
}
It also means that you will be using withXXX
constructs very often:
lhs.withMutableWordsBuffer(lhsToken) { lhs in
remainder.withMutableWordsBuffer(remainderToken) { remainder in
Self.inner(
lhsCount: lhsCount,
rhsCount: rhsCount,
lhsWhichBecomesRemainder: remainder,
rhs: shiftedRhs,
result: lhs
)
}
}
Obviously you can quite easily write something that will be faster than Violet. Violet is heavily geared towards small integers (Int32
) which closes certain optimization opportunities.
Update 20.3.2023:
Please read the #242 Using tests from “Violet - Python VM written in Swift” before.
=== DO NOT MERGE! Discussion only. ===
🐰 Discussion
Those are the all of the basic operations. In the future we may add some more targeted tests for interesting operations (
mul
/div
), but for now this should be enough.It may also be a good idea to move all of the performance tests to a separate test target. Currently I am using
PERFORMANCE_TEST
compilation flag.You can use Violet as a baseline measure, thought Violet was heavily optimized for
Int32
range (see documentation). I could easily write something faster, but this is not my use-case.String parsing
That said, I have to say that I find the
String
parsing performance quite bad. From my tests Violet is ~30 times faster.Violet secret:
Implemented here. Should I create an issue for this?
Mac
I have 2014 rMBP -> mac 11.7 (Big Sur), Xcode 13.2.1, Intel. I assume that 9 years old machines are not exactly a priority for 🍎. Executing all of those tests take ~30 min (serial execution,
DEBUG
), and I can't be bothered to re-run them properly since it will throttle after a few minutes anyway…Linux
Normally it would fail to compile:
But I wrote my own thing. Results below.