Open pricerc opened 3 years ago
First idea:
Although I liked VB.NET; what is the point in writing a VB.NET version if it is a proper subset of C# nowadays; especially performance stuff like Span is missing, see #322.
Btw., I didn't find your changes at https://github.com/PlummersSoftwareLLC/Primes/tree/drag-race/PrimeBASIC/solution_2
Btw., I didn't find your changes at https://github.com/PlummersSoftwareLLC/Primes/tree/drag-race/PrimeBASIC/solution_2
I haven't loaded them (yet, not sure if I will have time to get all their requirements together soon enough).
The boolean array is one of the things I did. Although in the context of the 'competition' there's a question mark over that - if you were to try and calculate really large numbers, then memory may be more of a problem with an array of boolean.
Also switching the logic for the 'prime' flag, since ReDim clears them already.
I'm optimize it with unfaithful and got result almost twice from current code, any suggestion before I pull merge ? https://github.com/RevensofT/Primes/tree/drag-race/PrimeBASIC/solution_2
Edit: I add faithful version and testing, however both version result difference only 1% - 5% so I remove unfaithful version, only faithful left which also got result almost twice from rbergen's code.
any suggestion before I pull merge
From a structure point-of-view, I'd make all the method names match the original 'C++' version, and in the same order. I think it would be easier for people trying to compare the implementations.
I think this
Dim Start_time = DateTime.UtcNow
...
If (DateTime.UtcNow - Start_time).TotalSeconds > 5.0 Then
Call (DateTime.UtcNow - Start_time).TotalSeconds.
report(Sieve.count, Pass_count)
Return Sieve
End If
is part of the 'faithful'. But you can save a few clock cycles by pre-calculating the 'expiry' time.
something like
Dim Start_time = Date.UtcNow
Dim Expiry_time = startTime.AddSeconds(5)
...
If Date.UtcNow > Expiry_time Then
Call (DateTime.UtcNow - Start_time).TotalSeconds.
report(Sieve.count, Pass_count)
Return Sieve
End If
It's the weekend, so I might find time to take a look at what you've done.
I haven't really tried to do much optimizing yet. I just converted the original 'C' code without referring to the existing PrimeVB code, and on my computer the existing VB code gets ~1865 runs in 5 seconds, and I get ~2485. The biggest differences being 1) I 'flipped' the non-prime array, since ReDim inititializes to false, and 2) using a combination of a while loop and for loop (like the original C++ code does) instead of nested for loops.
One thing I was experimenting with, but breaks the 'faithful' algorithm, is instead of calling SQRT to find the square root at the beginning of a loop, was to calculate the square of the factors within the loop. The idea being that the square root calculation is an expensive operation at the silicon level, where calculating an integer square is trivial.
This certainly worked well for calculating primes up to 10,000,000.
@pricerc Your suggestion aren't work, loop won't stop, I hope to see your revision and if you have some time, could you test my code on your machine ? I would know how difference it could improvement on difference PC. :)
Just been doing some review of my experiments, using 'release' builds instead of 'debug' builds (a bit of a facepalm when I clicked that I was still 'debugging'). I have 5 minor variations that I'm experimenting with - two very close to Dave's original C++ version, with BitArrays, one with the true/false logic flipped. Then two using Boolean(), one using Math.Sqrt, the other calculating squares of the factors. And the 5th using BitArray and squaring the factors.
On my machine, a Ryzen 7 4800H running at ~4GHz, with 64GB of ram.
At 1,000,000, an array of Boolean outperforms a BitArray. And my calculating the square outperforms calculating the square root.
At 10,000,000, the BitArray outperforms a Boolean(), although Math.Sqrt is still outperformed by calculating the squares.
At 100,000,000, there is no longer a 'reference' result, but the Math.Sqrt seems to outperform calculating the squares, but only by a small margin.
Did you try on remove integer overflow check
option ? it's in compile option, in most case it boot code performance.
One thing I was experimenting with, but breaks the 'faithful' algorithm, is instead of calling SQRT to find the square root at the beginning of a loop, was to calculate the square of the factors within the loop. The idea being that the square root calculation is an expensive operation at the silicon level, where calculating an integer square is trivial.
I am not so sure whether the one call to Math.Sqrt
per iteration is a performance issue. However, Math.Sqrt
only works for Doubles, so there are a lot of (implicit) conversions from Double
to Integer
in the loop. I would suggest:
'see https://stackoverflow.com/questions/23672069/fast-integer-square-root
Function Sqrt(value As Integer) As Integer
Return CInt(Math.Floor(Math.Sqrt(value)))
End Function
Btw., I personally prefer Option Strict On
, which warns about such implicit conversions.
You maybe not believe it but change result sqrt
to Integer
make performance down by 1% instead on my code and performance of Dim Sieve_sqrt As Integer = CInt(Math.Floor(Math.Sqrt(value)))
and Dim Sieve_sqrt As Integer = Math.Sqrt(Size)
is identical, at lease I'm not notice it.
I think change type of value from result of function which can't do inline has some negate impact on inline method optimize instead.
You maybe not believe it but change result sqrt to Integer make performance down by 1% instead
I didn't expect this^^ Maybe because CInt
involves rounding, which may result in Sieve_sqrt
one too big..
I tried for myself and came to the same conclusion: Comparing int with int or int with double just makes no real difference at all
I could get around 18% improvement on my machine by doing the following:
check_prim
, find_prim
The complete method:
Private Sub process_prim(Size As Integer)
Dim Sieve_sqrt = Math.Sqrt(Size)
Dim Factor = 3
Do While Factor <= Sieve_sqrt
If primes(Factor >> 1) Then
Dim Factor2 = Factor << 1
Dim Multiple = Factor2 + Factor
Do While Multiple <= Size
primes(Multiple >> 1) = False
Multiple += Factor2
Loop
End If
Factor += 2
Loop
End Sub
Great upgrade algorithm, you merge loop from check_prim and find_prim together which improve performance a lot.
On my test, your algorithm got 10% faster then base algorithm and my rewrite version of your algorithm got 15% faster then base algorithm.
Test result 3 times in row.
C:\>dotnet run -c "release free" --project "repos\Prime"
rber_gen_vb;2001;5.0012098;1;algorithm=base,faithful=yes,bits=1
hartmair_gen_vb;2225;5.0004361;1;algorithm=hartmair,faithful=yes,bits=1
hartmair_rewrite_gen_vb;2406;5.0005775;1;algorithm=hartmair,faithful=yes,bits=1
rbergen_vb;2099;5.0014436;1;algorithm=base,faithful=no,bits=1
C:\>dotnet run -c "release free" --project "repos\Prime"
rber_gen_vb;2052;5.0017633;1;algorithm=base,faithful=yes,bits=1
hartmair_gen_vb;2243;5.0003168;1;algorithm=hartmair,faithful=yes,bits=1
hartmair_rewrite_gen_vb;2411;5.0019972;1;algorithm=hartmair,faithful=yes,bits=1
rbergen_vb;2102;5.0022839;1;algorithm=base,faithful=no,bits=1
C:\>dotnet run -c "release free" --project "repos\Prime"
rber_gen_vb;2042;5.0016609;1;algorithm=base,faithful=yes,bits=1
hartmair_gen_vb;2249;5.0015053;1;algorithm=hartmair,faithful=yes,bits=1
hartmair_rewrite_gen_vb;2403;5.0011379;1;algorithm=hartmair,faithful=yes,bits=1
rbergen_vb;2067;5.001148;1;algorithm=base,faithful=no,bits=1
Rewrite version code.
Private Sub find_prim(Size As Integer,
Factor As Integer,
Factor2 As Integer)
Dim Multiple = Factor2 + Factor
Do
If Multiple > Size Then Return
primes(Multiple >> 1) = False
Multiple = Multiple + Factor2
Loop
End Sub
Private Sub process_prim(Size As Integer, Factor As Integer)
Dim Sieve_sqrt = Math.Sqrt(Size)
Do
If Factor > Sieve_sqrt Then Return
If primes(Factor >> 1) Then find_prim(Size, Factor, Factor << 1)
Factor = Factor + 2
Loop
End Sub
Your numbers depend on CPU microarchitecture, FPU units run in parallel with integer units and in integer only programs they are idle. They also depend on how efficient the runtime is at pipelining and are they optimized for the CPU you are running, conditional jumps are BAD on. You want speedup run the code on all the cores or call into hand tuned parallel math libraries. Also latter versions of DotNet have more tuning so just changing the .Net version could have a large impact,
I could get another +25% improvement, though I am not convinced it is worth it:
True
and False
for primes
BitArray
from dotnet runtimeImports System.Numerics
Imports System.Runtime.InteropServices
' see https://github.com/dotnet/runtime/blob/main/src/libraries/System.Collections/src/System/Collections/BitArray.cs
Public Class BitArray
Private ReadOnly m_array As Integer()
Private ReadOnly m_length As Integer
Public Sub New(length As Integer)
m_length = length
m_array = New Integer(length >> 5) {}
End Sub
Default Public Property Item(index As Integer) As Boolean
Get
Return (m_array(index >> 5) And (1 << index)) <> 0
End Get
Set(value As Boolean)
Dim bitMask = 1 << index
If value Then
SetMask(m_array(index >> 5), bitMask)
Else
ClearMask(m_array(index >> 5), bitMask)
End If
End Set
End Property
Private Shared Sub SetMask(ByRef values As Integer, mask As Integer)
values = values Or mask
End Sub
Private Shared Sub ClearMask(ByRef values As Integer, mask As Integer)
values = values And Not mask
End Sub
Public ReadOnly Property Length As Integer
Get
Return m_length
End Get
End Property
Public ReadOnly Property Count As Integer
Get
Return m_array.Sum(Function(x) BitOperations.PopCount(ToUnsigned(x)))
End Get
End Property
<StructLayout(LayoutKind.Explicit)>
Private Structure ReinterpretCast32
<FieldOffset(0)> Public SignedInteger As Integer
<FieldOffset(0)> Public UnsignedInteger As UInteger
End Structure
Private Shared Function ToUnsigned(value As Integer) As UInteger
Return New ReinterpretCast32() With {.SignedInteger = value}.UnsignedInteger
End Function
End Class
Swap
True
andFalse
forprimes
Makes a big difference if using an array of Boolean.Inlining
BitArray
from dotnet runtime I think this is pretty much one of the techniques authors have used in some other languages.
To be fair, if you're comparing languages, it's quite reasonable, because then you're relying on the VB compiler for the whole algorithm, instead of relying on some other languages compilation.
I also adapted an integer square root algorithm I found, which was a bit faster than Math.Sqrt.
Function SquareRoot(value As ULong) As ULong
Dim remainder As ULong = 0, root As ULong = 0
For i As Integer = (64 \ 2) To 1 Step -1
root = root << 1
remainder = (remainder << 2) Or (value >> (64 - 2))
value = value << 2
If (root < remainder) Then
remainder -= (root Or 1UL)
root += 2UL
End If
Next i
Return root >> 1
End Function
But I think it needs more testing to be sure.
@hartmair I test your new BitArray but always got incorrect result.
Test result:
C:\>dotnet run -c "release free" --project "repos\Prime"
rber_gen_vb;2038;5.0007981;1;algorithm=base,faithful=yes,bits=1
hartmair_gen_vb;2236;5.0008227;1;algorithm=base,faithful=yes,bits=1
hartmair_rewrite_gen_vb;2390;5.0002147;1;algorithm=base,faithful=yes,bits=1
WARNING: result is incorrect!
hartmair_rewrite_BA2_gen_vb;2598;5.0020134;1;algorithm=base,faithful=yes,bits=1
rbergen_vb;2058;5.0003475;1;algorithm=base,faithful=no,bits=1
I add default value option back and it should not cause incorrect result.
Public Class BitArray2
Private ReadOnly container As Integer()
Private Const bit32 = 5
Public Sub New(Capital As Integer, Optional Default_value As Boolean = False)
container = New Integer((Capital >> bit32) - 1) {}
If Default_value Then
Array.Fill(container, -1)
Dim Extra = Capital And (32 - 1)
If Extra > 0 Then
container(container.Length - 1) = (1 << Extra) - 1
End If
End If
End Sub
I've converted some code from one of the C# implementations. It uses ArrayPool
to reduce garbage collection time.
https://github.com/Nukepayload2/Primes/tree/drag-race/PrimeBASIC/solution_3
I've converted some code from one of the C# implementations. It uses
ArrayPool
to reduce garbage collection time.
I am not sure whether an ArrayPool is considered faithful regarding the rules:
See https://github.com/PlummersSoftwareLLC/Primes/blob/drag-race/CONTRIBUTING.md#faithfulness
It uses a class to encapsulate the sieve, or an equivalent feature in your language. This class must contain the full state of the sieve. Each iteration should re-create a new instance of this class.
I am not sure whether an ArrayPool is considered faithful regarding the rules:
@hartmair
This is a good question.
According to the rules of "faithfulness", ArrayPool(Of T).Shared
is unfaithful. Because it brings recycled arrays of previous iterations to the current iteration. I'll mark the converted project as "unfaithful" and add a new "faithful" project based on it.
Former Microsoft engineer turned YouTuber Dave Plummer (https://www.youtube.com/channel/UCNzszbnvQeFzObW0ghk0Ckw) has instigated a test of language performance in one of his videos (https://youtu.be/tQtFdsEcK_s for the latest installment)
I'll leave it to the interested to investigate his channel further, he's got some interesting stuff for long-time users (victims ?) of Microsoft.
The 'competition' involves calculating primes as quickly as possible, and there are now 45 languages 'competing'.
It's a community effort over at: https://github.com/PlummersSoftwareLLC/Primes
There is a VB implementation which is 'ok', but I was able to get about a 15% performance improvement with minimal effort.
And I'm sure there must be ways of improving it further.