Closed cosmobobak closed 5 months ago
This PR adds a generic SIMD abstraction for writing NNUE inference code, and uses it to improve the code quality and performance for viri's NNUE.
AVX2 performance is roughly equal to master and passes a non-regression SPRT:
Elo   | -0.16 +- 2.52 (95%)
SPRT  | 8.0+0.08s Threads=1 Hash=16MB
LLR   | 3.06 (-2.94, 2.94) [-5.00, 0.00]
Games | N: 34012 W: 7967 L: 7983 D: 18062
Penta | [130, 3648, 9490, 3584, 154]
https://chess.swehosting.se/test/6428/
AVX512 is ~28.9% faster than master. ARM NEON performance, however, has /cratered/, and is about 30% slower than before. I am working on fixing this, and will create a development branch for it immediately after merging this one.
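For readers unfamiliar with the idea, here is a minimal sketch of what a generic SIMD abstraction for NNUE inference can look like. It is not the code in this PR: the trait `SimdBackend`, the `Scalar4` backend, and the function `dot_crelu` are hypothetical names invented for illustration, and the activation (a clipped ReLU followed by a dot product with the output weights) is an assumption about a typical NNUE output layer. The point is that the inference loop is written once against a trait, and each instruction set (AVX2, AVX512, NEON) supplies its own implementation. Only a portable scalar backend is shown, so the sketch compiles on any target.

```rust
/// Hypothetical trait: one implementation per instruction set.
/// A real backend would wrap the intrinsics in `core::arch`.
pub trait SimdBackend {
    /// Number of i16 lanes processed per vector.
    const LANES: usize;
    type VecI16: Copy;
    type VecI32: Copy;

    fn zero_i32() -> Self::VecI32;
    /// Loads the first `LANES` values of `src`.
    fn load_i16(src: &[i16]) -> Self::VecI16;
    /// Clamps every lane to the range `lo..=hi`.
    fn clamp_i16(v: Self::VecI16, lo: i16, hi: i16) -> Self::VecI16;
    /// Lane-wise `acc + a * b`, widening the products to i32.
    fn mul_add_i32(acc: Self::VecI32, a: Self::VecI16, b: Self::VecI16) -> Self::VecI32;
    /// Horizontal sum of all lanes.
    fn reduce_add_i32(v: Self::VecI32) -> i32;
}

/// Portable fallback that models a 4-lane vector with plain arrays.
pub struct Scalar4;

impl SimdBackend for Scalar4 {
    const LANES: usize = 4;
    type VecI16 = [i16; 4];
    type VecI32 = [i32; 4];

    fn zero_i32() -> [i32; 4] {
        [0; 4]
    }

    fn load_i16(src: &[i16]) -> [i16; 4] {
        let mut v = [0i16; 4];
        v.copy_from_slice(&src[..4]);
        v
    }

    fn clamp_i16(v: [i16; 4], lo: i16, hi: i16) -> [i16; 4] {
        v.map(|x| x.clamp(lo, hi))
    }

    fn mul_add_i32(acc: [i32; 4], a: [i16; 4], b: [i16; 4]) -> [i32; 4] {
        let mut out = acc;
        for i in 0..4 {
            out[i] += i32::from(a[i]) * i32::from(b[i]);
        }
        out
    }

    fn reduce_add_i32(v: [i32; 4]) -> i32 {
        v.iter().sum()
    }
}

/// Clipped-ReLU activation followed by a dot product with the weights,
/// written once and generic over the backend.
pub fn dot_crelu<B: SimdBackend>(acc: &[i16], weights: &[i16], qa: i16) -> i32 {
    assert_eq!(acc.len(), weights.len());
    assert_eq!(acc.len() % B::LANES, 0);

    let mut sum = B::zero_i32();
    for (a, w) in acc.chunks_exact(B::LANES).zip(weights.chunks_exact(B::LANES)) {
        let activated = B::clamp_i16(B::load_i16(a), 0, qa);
        sum = B::mul_add_i32(sum, activated, B::load_i16(w));
    }
    B::reduce_add_i32(sum)
}
```

Calling `dot_crelu::<Scalar4>(&acc, &weights, 255)` runs the loop with the fallback; swapping the type parameter selects a different backend without touching the loop body. Because the backend is a compile-time type parameter, the calls can be monomorphised and inlined rather than dispatched at runtime.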