dotnet / vblang

The home for design of the Visual Basic .NET programming language and runtime library.
290 stars 66 forks source link

(off-topic?) inter-language 'drag race' competition #600

Open pricerc opened 3 years ago

pricerc commented 3 years ago

Former Microsoft engineer turned YouTuber Dave Plummer (https://www.youtube.com/channel/UCNzszbnvQeFzObW0ghk0Ckw) has instigated a test of language performance in one of his videos (https://youtu.be/tQtFdsEcK_s for the latest installment)

I'll leave it to the interested to investigate his channel further, he's got some interesting stuff for long-time users (victims ?) of Microsoft.

The 'competition' involves calculating primes as quickly as possible, and there are now 45 languages 'competing'.

It's a community effort over at: https://github.com/PlummersSoftwareLLC/Primes

There is a VB implementation which is 'ok', but I was able to get about a 15% performance improvement with minimal effort.

And I'm sure there must be ways of improving it further.

hartmair commented 3 years ago

First idea:

Although I liked VB.NET; what is the point in writing a VB.NET version if it is a proper subset of C# nowadays; especially performance stuff like Span is missing, see #322.

Btw., I didn't find your changes at https://github.com/PlummersSoftwareLLC/Primes/tree/drag-race/PrimeBASIC/solution_2

pricerc commented 3 years ago

Btw., I didn't find your changes at https://github.com/PlummersSoftwareLLC/Primes/tree/drag-race/PrimeBASIC/solution_2

I haven't loaded them (yet, not sure if I will have time to get all their requirements together soon enough).

The boolean array is one of the things I did. Although in the context of the 'competition' there's a question mark over that - if you were to try and calculate really large numbers, then memory may be more of a problem with an array of boolean.

Also switching the logic for the 'prime' flag, since ReDim clears them already.

RevensofT commented 3 years ago

I'm optimize it with unfaithful and got result almost twice from current code, any suggestion before I pull merge ? https://github.com/RevensofT/Primes/tree/drag-race/PrimeBASIC/solution_2

Edit: I add faithful version and testing, however both version result difference only 1% - 5% so I remove unfaithful version, only faithful left which also got result almost twice from rbergen's code.

pricerc commented 3 years ago

any suggestion before I pull merge

From a structure point-of-view, I'd make all the method names match the original 'C++' version, and in the same order. I think it would be easier for people trying to compare the implementations.

I think this

     Dim Start_time = DateTime.UtcNow
...
    If (DateTime.UtcNow - Start_time).TotalSeconds > 5.0 Then
        Call (DateTime.UtcNow - Start_time).TotalSeconds.
                    report(Sieve.count, Pass_count)
        Return Sieve
    End If

is part of the 'faithful'. But you can save a few clock cycles by pre-calculating the 'expiry' time.

something like

        Dim Start_time = Date.UtcNow
        Dim Expiry_time = startTime.AddSeconds(5)
...
            If Date.UtcNow > Expiry_time Then
        Call (DateTime.UtcNow - Start_time).TotalSeconds.
                    report(Sieve.count, Pass_count)
        Return Sieve
            End If

It's the weekend, so I might find time to take a look at what you've done.

I haven't really tried to do much optimizing yet. I just converted the original 'C' code without referring to the existing PrimeVB code, and on my computer the existing VB code gets ~1865 runs in 5 seconds, and I get ~2485. The biggest differences being 1) I 'flipped' the non-prime array, since ReDim inititializes to false, and 2) using a combination of a while loop and for loop (like the original C++ code does) instead of nested for loops.

pricerc commented 3 years ago

One thing I was experimenting with, but breaks the 'faithful' algorithm, is instead of calling SQRT to find the square root at the beginning of a loop, was to calculate the square of the factors within the loop. The idea being that the square root calculation is an expensive operation at the silicon level, where calculating an integer square is trivial.

This certainly worked well for calculating primes up to 10,000,000.

RevensofT commented 3 years ago

@pricerc Your suggestion aren't work, loop won't stop, I hope to see your revision and if you have some time, could you test my code on your machine ? I would know how difference it could improvement on difference PC. :)

pricerc commented 3 years ago

Just been doing some review of my experiments, using 'release' builds instead of 'debug' builds (a bit of a facepalm when I clicked that I was still 'debugging'). I have 5 minor variations that I'm experimenting with - two very close to Dave's original C++ version, with BitArrays, one with the true/false logic flipped. Then two using Boolean(), one using Math.Sqrt, the other calculating squares of the factors. And the 5th using BitArray and squaring the factors.

On my machine, a Ryzen 7 4800H running at ~4GHz, with 64GB of ram.

At 1,000,000, an array of Boolean outperforms a BitArray. And my calculating the square outperforms calculating the square root.

At 10,000,000, the BitArray outperforms a Boolean(), although Math.Sqrt is still outperformed by calculating the squares.

At 100,000,000, there is no longer a 'reference' result, but the Math.Sqrt seems to outperform calculating the squares, but only by a small margin.

RevensofT commented 3 years ago

Did you try on remove integer overflow check option ? it's in compile option, in most case it boot code performance.

hartmair commented 3 years ago

One thing I was experimenting with, but breaks the 'faithful' algorithm, is instead of calling SQRT to find the square root at the beginning of a loop, was to calculate the square of the factors within the loop. The idea being that the square root calculation is an expensive operation at the silicon level, where calculating an integer square is trivial.

I am not so sure whether the one call to Math.Sqrt per iteration is a performance issue. However, Math.Sqrt only works for Doubles, so there are a lot of (implicit) conversions from Double to Integer in the loop. I would suggest:

'see https://stackoverflow.com/questions/23672069/fast-integer-square-root
Function Sqrt(value As Integer) As Integer
  Return CInt(Math.Floor(Math.Sqrt(value)))
End Function

Btw., I personally prefer Option Strict On, which warns about such implicit conversions.

RevensofT commented 3 years ago

You maybe not believe it but change result sqrt to Integer make performance down by 1% instead on my code and performance of Dim Sieve_sqrt As Integer = CInt(Math.Floor(Math.Sqrt(value))) and Dim Sieve_sqrt As Integer = Math.Sqrt(Size) is identical, at lease I'm not notice it.

I think change type of value from result of function which can't do inline has some negate impact on inline method optimize instead.

hartmair commented 3 years ago

You maybe not believe it but change result sqrt to Integer make performance down by 1% instead

I didn't expect this^^ Maybe because CInt involves rounding, which may result in Sieve_sqrt one too big..

I tried for myself and came to the same conclusion: Comparing int with int or int with double just makes no real difference at all

hartmair commented 3 years ago

I could get around 18% improvement on my machine by doing the following:

The complete method:

Private Sub process_prim(Size As Integer)
  Dim Sieve_sqrt = Math.Sqrt(Size)
  Dim Factor = 3
  Do While Factor <= Sieve_sqrt
    If primes(Factor >> 1) Then
      Dim Factor2 = Factor << 1
      Dim Multiple = Factor2 + Factor
      Do While Multiple <= Size
        primes(Multiple >> 1) = False
        Multiple += Factor2
      Loop
    End If
    Factor += 2
  Loop
End Sub
RevensofT commented 3 years ago

Great upgrade algorithm, you merge loop from check_prim and find_prim together which improve performance a lot.

On my test, your algorithm got 10% faster then base algorithm and my rewrite version of your algorithm got 15% faster then base algorithm.

Test result 3 times in row.

C:\>dotnet run -c "release free" --project "repos\Prime"
rber_gen_vb;2001;5.0012098;1;algorithm=base,faithful=yes,bits=1
hartmair_gen_vb;2225;5.0004361;1;algorithm=hartmair,faithful=yes,bits=1
hartmair_rewrite_gen_vb;2406;5.0005775;1;algorithm=hartmair,faithful=yes,bits=1
rbergen_vb;2099;5.0014436;1;algorithm=base,faithful=no,bits=1
C:\>dotnet run -c "release free" --project "repos\Prime"
rber_gen_vb;2052;5.0017633;1;algorithm=base,faithful=yes,bits=1
hartmair_gen_vb;2243;5.0003168;1;algorithm=hartmair,faithful=yes,bits=1
hartmair_rewrite_gen_vb;2411;5.0019972;1;algorithm=hartmair,faithful=yes,bits=1
rbergen_vb;2102;5.0022839;1;algorithm=base,faithful=no,bits=1
C:\>dotnet run -c "release free" --project "repos\Prime"
rber_gen_vb;2042;5.0016609;1;algorithm=base,faithful=yes,bits=1
hartmair_gen_vb;2249;5.0015053;1;algorithm=hartmair,faithful=yes,bits=1
hartmair_rewrite_gen_vb;2403;5.0011379;1;algorithm=hartmair,faithful=yes,bits=1
rbergen_vb;2067;5.001148;1;algorithm=base,faithful=no,bits=1

Rewrite version code.

    Private Sub find_prim(Size As Integer,
                          Factor As Integer,
                          Factor2 As Integer)
        Dim Multiple = Factor2 + Factor
        Do
            If Multiple > Size Then Return
            primes(Multiple >> 1) = False
            Multiple = Multiple + Factor2
        Loop
    End Sub

    Private Sub process_prim(Size As Integer, Factor As Integer)
        Dim Sieve_sqrt = Math.Sqrt(Size)
        Do
            If Factor > Sieve_sqrt Then Return
            If primes(Factor >> 1) Then find_prim(Size, Factor, Factor << 1)
            Factor = Factor + 2
        Loop
    End Sub
paul1956 commented 3 years ago

Your numbers depend on CPU microarchitecture, FPU units run in parallel with integer units and in integer only programs they are idle. They also depend on how efficient the runtime is at pipelining and are they optimized for the CPU you are running, conditional jumps are BAD on. You want speedup run the code on all the cores or call into hand tuned parallel math libraries. Also latter versions of DotNet have more tuning so just changing the .Net version could have a large impact,

hartmair commented 3 years ago

I could get another +25% improvement, though I am not convinced it is worth it:

Imports System.Numerics
Imports System.Runtime.InteropServices

' see https://github.com/dotnet/runtime/blob/main/src/libraries/System.Collections/src/System/Collections/BitArray.cs
Public Class BitArray

    Private ReadOnly m_array As Integer()
    Private ReadOnly m_length As Integer

    Public Sub New(length As Integer)
        m_length = length
        m_array = New Integer(length >> 5) {}
    End Sub

    Default Public Property Item(index As Integer) As Boolean
        Get
            Return (m_array(index >> 5) And (1 << index)) <> 0
        End Get
        Set(value As Boolean)
            Dim bitMask = 1 << index
            If value Then
                SetMask(m_array(index >> 5), bitMask)
            Else
                ClearMask(m_array(index >> 5), bitMask)
            End If
        End Set
    End Property
    Private Shared Sub SetMask(ByRef values As Integer, mask As Integer)
        values = values Or mask
    End Sub
    Private Shared Sub ClearMask(ByRef values As Integer, mask As Integer)
        values = values And Not mask
    End Sub

    Public ReadOnly Property Length As Integer
        Get
            Return m_length
        End Get
    End Property

    Public ReadOnly Property Count As Integer
        Get
            Return m_array.Sum(Function(x) BitOperations.PopCount(ToUnsigned(x)))
        End Get
    End Property
    <StructLayout(LayoutKind.Explicit)>
    Private Structure ReinterpretCast32
        <FieldOffset(0)> Public SignedInteger As Integer
        <FieldOffset(0)> Public UnsignedInteger As UInteger
    End Structure
    Private Shared Function ToUnsigned(value As Integer) As UInteger
        Return New ReinterpretCast32() With {.SignedInteger = value}.UnsignedInteger
    End Function

End Class
pricerc commented 3 years ago
  • Swap True and False for primes Makes a big difference if using an array of Boolean.

  • Inlining BitArray from dotnet runtime I think this is pretty much one of the techniques authors have used in some other languages.

To be fair, if you're comparing languages, it's quite reasonable, because then you're relying on the VB compiler for the whole algorithm, instead of relying on some other languages compilation.

I also adapted an integer square root algorithm I found, which was a bit faster than Math.Sqrt.

        Function SquareRoot(value As ULong) As ULong
            Dim remainder As ULong = 0, root As ULong = 0

            For i As Integer = (64 \ 2) To 1 Step -1
                root = root << 1
                remainder = (remainder << 2) Or (value >> (64 - 2))

                value = value << 2
                If (root < remainder) Then
                    remainder -= (root Or 1UL)
                    root += 2UL
                End If
            Next i
            Return root >> 1
        End Function

But I think it needs more testing to be sure.

RevensofT commented 3 years ago

@hartmair I test your new BitArray but always got incorrect result.

Test result:

C:\>dotnet run -c "release free" --project "repos\Prime"
rber_gen_vb;2038;5.0007981;1;algorithm=base,faithful=yes,bits=1
hartmair_gen_vb;2236;5.0008227;1;algorithm=base,faithful=yes,bits=1
hartmair_rewrite_gen_vb;2390;5.0002147;1;algorithm=base,faithful=yes,bits=1
WARNING: result is incorrect!
hartmair_rewrite_BA2_gen_vb;2598;5.0020134;1;algorithm=base,faithful=yes,bits=1
rbergen_vb;2058;5.0003475;1;algorithm=base,faithful=no,bits=1

I add default value option back and it should not cause incorrect result.

Public Class BitArray2

    Private ReadOnly container As Integer()
    Private Const bit32 = 5

    Public Sub New(Capital As Integer, Optional Default_value As Boolean = False)
        container = New Integer((Capital >> bit32) - 1) {}
        If Default_value Then
            Array.Fill(container, -1)

            Dim Extra = Capital And (32 - 1)
            If Extra > 0 Then
                container(container.Length - 1) = (1 << Extra) - 1
            End If
        End If
    End Sub
Nukepayload2 commented 3 years ago

I've converted some code from one of the C# implementations. It uses ArrayPool to reduce garbage collection time.

https://github.com/Nukepayload2/Primes/tree/drag-race/PrimeBASIC/solution_3

hartmair commented 3 years ago

I've converted some code from one of the C# implementations. It uses ArrayPool to reduce garbage collection time.

I am not sure whether an ArrayPool is considered faithful regarding the rules:

See https://github.com/PlummersSoftwareLLC/Primes/blob/drag-race/CONTRIBUTING.md#faithfulness

It uses a class to encapsulate the sieve, or an equivalent feature in your language. This class must contain the full state of the sieve. Each iteration should re-create a new instance of this class.

Nukepayload2 commented 3 years ago

I am not sure whether an ArrayPool is considered faithful regarding the rules:

@hartmair This is a good question. According to the rules of "faithfulness", ArrayPool(Of T).Shared is unfaithful. Because it brings recycled arrays of previous iterations to the current iteration. I'll mark the converted project as "unfaithful" and add a new "faithful" project based on it.