TLINDEN / golsky

game of life with ebitengine
GNU General Public License v3.0
0 stars 0 forks source link

Get rid of pointers #9

Open TLINDEN opened 1 month ago

TLINDEN commented 1 month ago

Current master at 1500x1500 still runs only at 30fps.

After reading a lot about golags GC and that it scans the whole mem if there are pointers, I now understand why the Cell version (last pr) is still not faster and there is so much GC activity.

So, I need to revert the Grid to a [][]bool again, remove the neighbor pointers and store the neighbor x,y position instead. Also avoid returning pointers to struct.

So Grid might look like this:

tye Grid struct {
  Data          [][]bool
  NeighborCount [][]uint8
  Neighbors     [][][][]int
             // Y X ny nx ...or
  Neighbors     [][][]int
             // Y X ny*nx
}

That way we'd have no pointers anymore AND avoid calculating neighbors on each tick.

Also add benchmarks to better measure this.

TLINDEN commented 1 month ago

ooookayyyyyy...

https://github.com/TLINDEN/golsky/commit/e8ed2832332534cc355a50fbf5b137c2f8ea1d72 implements this.

Now, performance with this is indeed much better (we're now at ~40FPS at 1.5k x 1.5k). However, now another problem re-appears (I already observed this during my other performance tests):

Deep array access in golang is SLOWWWWW :(

This is the actual assembler generated by the go compiler for the neighbor counting:

               610ms      610ms   829c90:                     MOVZX 0(R13)(R15*1), R13                                     grid.go:113
                                     ⋮
               310ms      310ms   829c9a:                     ADDL R13, DX                                                 grid.go:113
                                     ⋮
               300ms      300ms   829cda:                     MOVQ 0x38(SI), R13                                           grid.go:113
                90ms       90ms   829cde:                     NOPW                                                         grid.go:113
               960ms      960ms   829ce0:                     CMPQ R13, AX                                                 grid.go:113
                   .          .   829ce3:                     JBE 0x829dbd                                                 grid.go:113
               160ms      160ms   829ce9:                     MOVQ 0x30(SI), R13                                           grid.go:113
               330ms      330ms   829ced:                     MOVQ 0x8(R13)(R11*8), R15                                    grid.go:113
               310ms      310ms   829cf2:                     MOVQ 0(R13)(R11*8), R13                                      grid.go:113
               810ms      810ms   829cf7:                     NOPW 0(AX)(AX*1)                                             grid.go:113
               220ms      220ms   829d00:                     CMPQ CX, R15                                                 grid.go:113
                   .          .   829d03:                     JAE 0x829db2                                                 grid.go:113
               340ms      340ms   829d09:                     LEAQ 0(CX)(CX*2), R15                                        grid.go:113
                70ms       70ms   829d0d:                     MOVQ 0x8(R13)(R15*8), DI                                     grid.go:113
               2.17s      2.17s   829d12:                     MOVQ 0(R13)(R15*8), R13                                      grid.go:113
               390ms      390ms   829d17:                     NOPW 0(AX)(AX*1)                                             grid.go:113
               180ms      180ms   829d20:                     CMPQ BX, DI                                                  grid.go:113
                10ms       10ms   829d23:                     JAE 0x829da7                                                 grid.go:113
               220ms      220ms   829d29:                     MOVQ BX, DI                                                  grid.go:113
               920ms      920ms   829d2c:                     SHLQ $0x4, BX                                                grid.go:113
               140ms      140ms   829d30:                     MOVQ 0x8(R13)(BX*1), R15                                     grid.go:113
               3.74s      3.74s   829d35:                     CMPQ R15, R9                                                 grid.go:113
                30ms       30ms   829d38:                     JAE 0x829d9c                                                 grid.go:113
               1.03s      1.03s   829d3a:                     LEAQ 0(R15)(R15*2), R15                                      grid.go:113
               380ms      380ms   829d3e:                     MOVQ 0x8(R10)(R15*8), R12                                    grid.go:113
               4.37s      4.37s   829d43:                     MOVQ 0(R13)(BX*1), R13                                       grid.go:113
               390ms      390ms   829d48:                     MOVQ 0(R10)(R15*8), R15                                      grid.go:113
               850ms      850ms   829d4c:                     CMPQ R13, R12                                                grid.go:113
                10ms       10ms   829d4f:                     JB 0x829c90                                                  grid.go:113
                   .          .   829d55:                     JMP 0x829d91                                                 grid.go:113
                                     ⋮
                   .          .   829d91:                     MOVQ R13, AX                                                 grid.go:113
                   .          .   829d94:                     MOVQ R12, CX                                                 grid.go:113
                   .          .   829d97:                     CALL runtime.panicIndex(SB)                                  grid.go:113
                   .          .   829d9c:                     MOVQ R15, AX                                                 grid.go:113
                   .          .   829d9f:                     MOVQ R9, CX                                                  grid.go:113
                   .          .   829da2:                     CALL runtime.panicIndex(SB)                                  grid.go:113
                   .          .   829da7:                     MOVQ BX, AX                                                  grid.go:113
                   .          .   829daa:                     MOVQ DI, CX                                                  grid.go:113
                   .          .   829dad:                     CALL runtime.panicIndex(SB)                                  grid.go:113
                   .          .   829db2:                     MOVQ CX, AX                                                  grid.go:113
                   .          .   829db5:                     MOVQ R15, CX                                                 grid.go:113
                   .          .   829db8:                     CALL runtime.panicIndex(SB)                                  grid.go:113
                   .          .   829dbd:                     MOVQ R13, CX                                                 grid.go:113
                   .          .   829dc0:                     CALL runtime.panicIndex(SB)                                  grid.go:113

A nightmare. Dammit. The flame graph:

neighborcounting

All is fine, besides the count function!