devedse / DeveMazeGeneratorCore

This is the new version of my maze generator, now made with .NET Core.
MIT License
7 stars 2 forks source link

Improve BitSet #3

Open badamczewski opened 4 years ago

badamczewski commented 4 years ago

Bit sets currently use 32bit words, this is not optimal since CPUs use 64-bit words and 64bit cache lines.

 public class BitArreintjeFastInnerMapArray
    {
        //Internal since it's used by some other classes (because this is so awesome)
        internal long[] _innerData;

        public BitArreintjeFastInnerMapArray(int height)
        {
            _innerData = new long[height / 64 + 1];
        }
....
....
....
devedse commented 4 years ago

Cool stuff, I'll implement that and see what the performance benefits are for me. Thanks!!!!! :smile:

I'd like to give you some reasoning why I made certain decisions and have your thoughts on it:

  1. I tried a try / catch so avoid having the while (stackje.Count == 0). I thought maybe if we wouldn't have this 'if check' every loop, the code would run faster. But according to your comment it would not :).
  2. I've mostly removed branching everywhere I could. Instead of doing if (direction == Left) { x = x - 2 } I now do some cool calculation stuff to do this. The only point where I still have branching is where I check if a specific point is a valid direction:
    bool validLeft = cur.X - 2 > 0 && !map[cur.X - 2, cur.Y];
    bool validRight = cur.X + 2 < width && !map[cur.X + 2, cur.Y];
    bool validUp = cur.Y - 2 > 0 && !map[cur.X, cur.Y - 2];
    bool validDown = cur.Y + 2 < height && !map[cur.X, cur.Y + 2];

    I was wondering if you have any ideas on how to make this faster as well?

This is still the latest fastest version: https://github.com/devedse/DeveMazeGeneratorCore/blob/master/DeveMazeGeneratorCore/Generators/AlgorithmBacktrack2Deluxe2.cs

One thing I found really cool as well is the way I use generics now to ensure the action.Invoke will be omitted by the compiler if Actions is not implemented. I learned that from the other guy in this issue: https://github.com/devedse/DeveMazeGeneratorCore/issues/2

devedse commented 4 years ago
  1. Another question I had was that I noticed quite some performance difference when re-ordering C# statements. See the comparison between these 2:

1 (A bit slower it seems):

                    var chosenDirection = random.Next(targetCount);
                    int countertje = 0;

                    bool actuallyGoingLeft = validLeft && chosenDirection == countertje++;
                    bool actuallyGoingRight = validRight && chosenDirection == countertje++;
                    bool actuallyGoingUp = validUp && chosenDirection == countertje++;
                    bool actuallyGoingDown = validDown && chosenDirection == countertje;

                    int actuallyGoingLeftByte = Unsafe.As<bool, int>(ref actuallyGoingLeft);
                    int actuallyGoingRightByte = Unsafe.As<bool, int>(ref actuallyGoingRight);
                    int actuallyGoingUpByte = Unsafe.As<bool, int>(ref actuallyGoingUp);
                    int actuallyGoingDownByte = Unsafe.As<bool, int>(ref actuallyGoingDown);

                    var nextX = cur.X + actuallyGoingLeftByte * -2 + actuallyGoingRightByte * 2;
                    var nextY = cur.Y + actuallyGoingUpByte * -2 + actuallyGoingDownByte * 2;

                    var nextXInBetween = cur.X - actuallyGoingLeftByte + actuallyGoingRightByte;
                    var nextYInBetween = cur.Y - actuallyGoingUpByte + actuallyGoingDownByte;

                    stackje.Push(new MazePoint(nextX, nextY));
                    map[nextXInBetween, nextYInBetween] = true;
                    map[nextX, nextY] = true;

                    pixelChangedCallback.Invoke(nextXInBetween, nextYInBetween, currentStep, totSteps);
                    pixelChangedCallback.Invoke(nextX, nextY, currentStep, totSteps);

2 (A bit faster it seems):

                    var chosenDirection = random.Next(targetCount);
                    int countertje = 0;

                    bool actuallyGoingLeft = validLeft & chosenDirection == countertje;
                    int actuallyGoingLeftByte = Unsafe.As<bool, int>(ref actuallyGoingLeft);
                    countertje += validLeftByte;

                    bool actuallyGoingRight = validRight & chosenDirection == countertje;
                    int actuallyGoingRightByte = Unsafe.As<bool, int>(ref actuallyGoingRight);
                    countertje += validRightByte;

                    bool actuallyGoingUp = validUp & chosenDirection == countertje;
                    int actuallyGoingUpByte = Unsafe.As<bool, int>(ref actuallyGoingUp);
                    countertje += validUpByte;

                    bool actuallyGoingDown = validDown & chosenDirection == countertje;
                    int actuallyGoingDownByte = Unsafe.As<bool, int>(ref actuallyGoingDown);

                    var nextX = cur.X + actuallyGoingLeftByte * -2 + actuallyGoingRightByte * 2;
                    var nextY = cur.Y + actuallyGoingUpByte * -2 + actuallyGoingDownByte * 2;

                    var nextXInBetween = cur.X - actuallyGoingLeftByte + actuallyGoingRightByte;
                    var nextYInBetween = cur.Y - actuallyGoingUpByte + actuallyGoingDownByte;

                    stackje.Push(new MazePoint(nextX, nextY));
                    map[nextXInBetween, nextYInBetween] = true;
                    map[nextX, nextY] = true;

                    pixelChangedCallback.Invoke(nextXInBetween, nextYInBetween, currentStep, totSteps);
                    pixelChangedCallback.Invoke(nextX, nextY, currentStep, totSteps);
devedse commented 4 years ago

I did just implement your idea to switch int to long and wow the difference is impressive: image

badamczewski commented 4 years ago

The picture shows the new performance but what was the old result? how many % faster? On my local environment, I got 50% performance boost.

Regarding things going a bit faster when not grouped. Normally it's a good practice to group the same of instructions but it's not the case here since JIT emits multiple different instructons to handle the conditions etc. So in the second case what I assume happens is that registers are beeing reused.

Locals like: actuallyGoingLeft etc. Are beeing either pushed to registers or pushed to the stack but since they are a temp computation the register/stack can be reused. If you group them you will have many locals and you will most likeley run out of registers and the compiler will be forced to push and out of the stack.

badamczewski commented 4 years ago

The next big one should be optimizing memory access for the bit array and the two dimensional map array. Another big one is SIMD instructions but they work the best if you can access your memory and load up the vectors as fast as possible.

devedse commented 4 years ago

I'm running into some trouble though that the mazes generated aren't correct anymore (see top and botom):

SmallTestGeneratedMazeNoPathAlgorithmBacktrack2Deluxe2

Might be something that I've done wrong though

devedse commented 4 years ago

I just found the issue, apparently I had to change 1 to 1L to make them long's. When I did that though the performance went back to what it was. So around ~3.28 seconds instead of the 1.7 that we had initially :).