Tnze / go-mc

Collection of Go libraries for Minecraft
https://go-mc.github.io/tutorial/
MIT License

Optimize non-overlapping CFB8 decryption using SIMD XOR #265

Closed: layou233 closed this pull request 9 months ago

layou233 commented 9 months ago

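Following #256, I have further improved decryption performance when dst and src do not overlap. In CFB8 decryption, the input to every block-cipher call depends only on the IV and the ciphertext, both of which are known up front. The XOR therefore no longer has to be interleaved byte by byte with the cipher calls, and it can be done in bulk with the SIMD-accelerated XOR function from the Go standard library. I also eliminated some bounds checks after inspecting the generated assembly. In the benchmarks below, non-overlapping decryption is roughly twice as fast (42.54 MB/s before, 83.86 MB/s after).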

The //go:linkname directive is used for backward compatibility, since the XORBytes function was not exported (as crypto/subtle.XORBytes) until Go 1.20.
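For reference, a small sketch of the crypto/subtle.XORBytes contract available from Go 1.20 onward, next to a hypothetical pure-Go fallback (xorBytesGeneric) with the same semantics. The PR itself uses //go:linkname rather than a fallback like this, and the exact symbol it links to is not given in this description, so it is not reproduced here.

```go
package main

import (
	"crypto/subtle"
	"fmt"
)

// xorBytesGeneric is a hypothetical portable fallback mirroring
// crypto/subtle.XORBytes: it XORs n = min(len(x), len(y)) bytes into dst,
// returns n, and panics if dst is shorter than n. Not the PR's approach.
func xorBytesGeneric(dst, x, y []byte) int {
	n := len(x)
	if len(y) < n {
		n = len(y)
	}
	if len(dst) < n {
		panic("xorBytesGeneric: dst too short")
	}
	for i := 0; i < n; i++ {
		dst[i] = x[i] ^ y[i]
	}
	return n
}

func main() {
	x := []byte{0x0f, 0xf0, 0xaa, 0x55}
	y := []byte{0xff, 0xff, 0x00} // shorter input limits n to 3
	a := make([]byte, 4)
	b := make([]byte, 4)
	n1 := subtle.XORBytes(a, x, y) // Go 1.20+ standard library
	n2 := xorBytesGeneric(b, x, y)
	fmt.Println(n1, n2, a, b) // prints "3 3 [240 15 170 0] [240 15 170 0]"
}
```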

Sorry for the delay on this PR. 😢

Before (encryption and decryption have the same performance):

goos: linux
goarch: amd64
pkg: github.com/Tnze/go-mc/net/CFB8
cpu: AMD EPYC 7763 64-Core Processor                
BenchmarkCFB8AES1KOverlapped-2             51133             23781 ns/op          43.06 MB/s           0 B/op          0 allocs/op
BenchmarkCFB8AES1KNonOverlapping-2         46102             24071 ns/op          42.54 MB/s           0 B/op          0 allocs/op
PASS

After:

goos: linux
goarch: amd64
pkg: github.com/Tnze/go-mc/net/CFB8
cpu: AMD EPYC 7763 64-Core Processor                
BenchmarkCFB8AES1KEncryptOverlapped-2              49586             24047 ns/op          42.58 MB/s           0 B/op          0 allocs/op
BenchmarkCFB8AES1KEncryptNonOverlapping-2          50244             23817 ns/op          42.99 MB/s           0 B/op          0 allocs/op
BenchmarkCFB8AES1KDecryptOverlapped-2              49435             24310 ns/op          42.12 MB/s           0 B/op          0 allocs/op
BenchmarkCFB8AES1KDecryptNonOverlapping-2          88490             12211 ns/op          83.86 MB/s           0 B/op          0 allocs/op
PASS

Benchmark data was collected on a GitHub Codespaces virtual machine.