Open TLINDEN opened 4 months ago
ooookayyyyyy...
https://github.com/TLINDEN/golsky/commit/e8ed2832332534cc355a50fbf5b137c2f8ea1d72 implements this.
Now, performance with this is indeed much better (we're now at ~40FPS at 1.5k x 1.5k). However, now another problem re-appears (I already observed this during my other performance tests):
Deep array access in golang is SLOWWWWW :(
This is the actual assembler generated by the go compiler for the neighbor counting:
610ms 610ms 829c90: MOVZX 0(R13)(R15*1), R13 grid.go:113
⋮
310ms 310ms 829c9a: ADDL R13, DX grid.go:113
⋮
300ms 300ms 829cda: MOVQ 0x38(SI), R13 grid.go:113
90ms 90ms 829cde: NOPW grid.go:113
960ms 960ms 829ce0: CMPQ R13, AX grid.go:113
. . 829ce3: JBE 0x829dbd grid.go:113
160ms 160ms 829ce9: MOVQ 0x30(SI), R13 grid.go:113
330ms 330ms 829ced: MOVQ 0x8(R13)(R11*8), R15 grid.go:113
310ms 310ms 829cf2: MOVQ 0(R13)(R11*8), R13 grid.go:113
810ms 810ms 829cf7: NOPW 0(AX)(AX*1) grid.go:113
220ms 220ms 829d00: CMPQ CX, R15 grid.go:113
. . 829d03: JAE 0x829db2 grid.go:113
340ms 340ms 829d09: LEAQ 0(CX)(CX*2), R15 grid.go:113
70ms 70ms 829d0d: MOVQ 0x8(R13)(R15*8), DI grid.go:113
2.17s 2.17s 829d12: MOVQ 0(R13)(R15*8), R13 grid.go:113
390ms 390ms 829d17: NOPW 0(AX)(AX*1) grid.go:113
180ms 180ms 829d20: CMPQ BX, DI grid.go:113
10ms 10ms 829d23: JAE 0x829da7 grid.go:113
220ms 220ms 829d29: MOVQ BX, DI grid.go:113
920ms 920ms 829d2c: SHLQ $0x4, BX grid.go:113
140ms 140ms 829d30: MOVQ 0x8(R13)(BX*1), R15 grid.go:113
3.74s 3.74s 829d35: CMPQ R15, R9 grid.go:113
30ms 30ms 829d38: JAE 0x829d9c grid.go:113
1.03s 1.03s 829d3a: LEAQ 0(R15)(R15*2), R15 grid.go:113
380ms 380ms 829d3e: MOVQ 0x8(R10)(R15*8), R12 grid.go:113
4.37s 4.37s 829d43: MOVQ 0(R13)(BX*1), R13 grid.go:113
390ms 390ms 829d48: MOVQ 0(R10)(R15*8), R15 grid.go:113
850ms 850ms 829d4c: CMPQ R13, R12 grid.go:113
10ms 10ms 829d4f: JB 0x829c90 grid.go:113
. . 829d55: JMP 0x829d91 grid.go:113
⋮
. . 829d91: MOVQ R13, AX grid.go:113
. . 829d94: MOVQ R12, CX grid.go:113
. . 829d97: CALL runtime.panicIndex(SB) grid.go:113
. . 829d9c: MOVQ R15, AX grid.go:113
. . 829d9f: MOVQ R9, CX grid.go:113
. . 829da2: CALL runtime.panicIndex(SB) grid.go:113
. . 829da7: MOVQ BX, AX grid.go:113
. . 829daa: MOVQ DI, CX grid.go:113
. . 829dad: CALL runtime.panicIndex(SB) grid.go:113
. . 829db2: MOVQ CX, AX grid.go:113
. . 829db5: MOVQ R15, CX grid.go:113
. . 829db8: CALL runtime.panicIndex(SB) grid.go:113
. . 829dbd: MOVQ R13, CX grid.go:113
. . 829dc0: CALL runtime.panicIndex(SB) grid.go:113
A nightmare. Dammit. The flame graph:
All is fine, besides the count function!
Current master at 1500x1500 still runs only at 30fps.
After reading a lot about golags GC and that it scans the whole mem if there are pointers, I now understand why the Cell version (last pr) is still not faster and there is so much GC activity.
So, I need to revert the Grid to a
[][]bool
again, remove the neighbor pointers and store the neighbor x,y position instead. Also avoid returning pointers to struct.So Grid might look like this:
That way we'd have no pointers anymore AND avoid calculating neighbors on each tick.
Also add benchmarks to better measure this.