Open badamczewski opened 4 years ago
Cool stuff, I'll implement that and see what the performance benefits are for me. Thanks!!!!! :smile:
I'd like to give you some reasoning why I made certain decisions and have your thoughts on it:
if (direction == Left) { x = x - 2 }
I now do some cool calculation stuff to do this. The only point where I still have branching is where I check if a specific point is a valid direction:
bool validLeft = cur.X - 2 > 0 && !map[cur.X - 2, cur.Y];
bool validRight = cur.X + 2 < width && !map[cur.X + 2, cur.Y];
bool validUp = cur.Y - 2 > 0 && !map[cur.X, cur.Y - 2];
bool validDown = cur.Y + 2 < height && !map[cur.X, cur.Y + 2];
I was wondering if you have any ideas on how to make this faster as well?
This is still the latest fastest version: https://github.com/devedse/DeveMazeGeneratorCore/blob/master/DeveMazeGeneratorCore/Generators/AlgorithmBacktrack2Deluxe2.cs
One thing I found really cool as well is the way I use generics now to ensure the action.Invoke
will be omitted by the compiler if Actions is not implemented. I learned that from the other guy in this issue: https://github.com/devedse/DeveMazeGeneratorCore/issues/2
1 (A bit slower it seems):
var chosenDirection = random.Next(targetCount);
int countertje = 0;
bool actuallyGoingLeft = validLeft && chosenDirection == countertje++;
bool actuallyGoingRight = validRight && chosenDirection == countertje++;
bool actuallyGoingUp = validUp && chosenDirection == countertje++;
bool actuallyGoingDown = validDown && chosenDirection == countertje;
int actuallyGoingLeftByte = Unsafe.As<bool, int>(ref actuallyGoingLeft);
int actuallyGoingRightByte = Unsafe.As<bool, int>(ref actuallyGoingRight);
int actuallyGoingUpByte = Unsafe.As<bool, int>(ref actuallyGoingUp);
int actuallyGoingDownByte = Unsafe.As<bool, int>(ref actuallyGoingDown);
var nextX = cur.X + actuallyGoingLeftByte * -2 + actuallyGoingRightByte * 2;
var nextY = cur.Y + actuallyGoingUpByte * -2 + actuallyGoingDownByte * 2;
var nextXInBetween = cur.X - actuallyGoingLeftByte + actuallyGoingRightByte;
var nextYInBetween = cur.Y - actuallyGoingUpByte + actuallyGoingDownByte;
stackje.Push(new MazePoint(nextX, nextY));
map[nextXInBetween, nextYInBetween] = true;
map[nextX, nextY] = true;
pixelChangedCallback.Invoke(nextXInBetween, nextYInBetween, currentStep, totSteps);
pixelChangedCallback.Invoke(nextX, nextY, currentStep, totSteps);
2 (A bit faster it seems):
var chosenDirection = random.Next(targetCount);
int countertje = 0;
bool actuallyGoingLeft = validLeft & chosenDirection == countertje;
int actuallyGoingLeftByte = Unsafe.As<bool, int>(ref actuallyGoingLeft);
countertje += validLeftByte;
bool actuallyGoingRight = validRight & chosenDirection == countertje;
int actuallyGoingRightByte = Unsafe.As<bool, int>(ref actuallyGoingRight);
countertje += validRightByte;
bool actuallyGoingUp = validUp & chosenDirection == countertje;
int actuallyGoingUpByte = Unsafe.As<bool, int>(ref actuallyGoingUp);
countertje += validUpByte;
bool actuallyGoingDown = validDown & chosenDirection == countertje;
int actuallyGoingDownByte = Unsafe.As<bool, int>(ref actuallyGoingDown);
var nextX = cur.X + actuallyGoingLeftByte * -2 + actuallyGoingRightByte * 2;
var nextY = cur.Y + actuallyGoingUpByte * -2 + actuallyGoingDownByte * 2;
var nextXInBetween = cur.X - actuallyGoingLeftByte + actuallyGoingRightByte;
var nextYInBetween = cur.Y - actuallyGoingUpByte + actuallyGoingDownByte;
stackje.Push(new MazePoint(nextX, nextY));
map[nextXInBetween, nextYInBetween] = true;
map[nextX, nextY] = true;
pixelChangedCallback.Invoke(nextXInBetween, nextYInBetween, currentStep, totSteps);
pixelChangedCallback.Invoke(nextX, nextY, currentStep, totSteps);
I did just implement your idea to switch int to long and wow the difference is impressive:
The picture shows the new performance but what was the old result? how many % faster? On my local environment, I got 50% performance boost.
Regarding things going a bit faster when not grouped. Normally it's a good practice to group the same of instructions but it's not the case here since JIT emits multiple different instructons to handle the conditions etc. So in the second case what I assume happens is that registers are beeing reused.
Locals like: actuallyGoingLeft etc. Are beeing either pushed to registers or pushed to the stack but since they are a temp computation the register/stack can be reused. If you group them you will have many locals and you will most likeley run out of registers and the compiler will be forced to push and out of the stack.
The next big one should be optimizing memory access for the bit array and the two dimensional map array. Another big one is SIMD instructions but they work the best if you can access your memory and load up the vectors as fast as possible.
I'm running into some trouble though that the mazes generated aren't correct anymore (see top and botom):
Might be something that I've done wrong though
I just found the issue, apparently I had to change 1 to 1L to make them long's. When I did that though the performance went back to what it was. So around ~3.28 seconds instead of the 1.7 that we had initially :).
Bit sets currently use 32bit words, this is not optimal since CPUs use 64-bit words and 64bit cache lines.