goastler / sha_256

optimised sha-256
Apache License 2.0

Do different CPUs prefer different loop unrolling group sizes? #10

Open goastler opened 1 month ago

goastler commented 1 month ago

e.g. a group size of 8 works nicely on my PC, 4 less so, and 16 very much less so

goastler commented 1 month ago

Yes, it will differ per CPU. Cache sizes vary (the instruction cache matters most for unrolled code), so the cache hit rate for a given unroll step size differs between CPUs. Unroll too much and the larger code size outweighs the benefits; unroll too little and you don't gain the cache locality and the reduced branch overhead.
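
A rough way to check this on a given machine is to make the group size a compile-time parameter and time a few values. The sketch below is only an illustration, not code from this repo: `mix_grouped`, `time_group`, and the mixing step are hypothetical stand-ins for the real SHA-256 rounds, and whether the inner loop is actually unrolled is up to the optimiser (build with `--release` / `-O`).

```rust
use std::convert::TryFrom;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a SHA-256 style loop, processed in groups of `N` words.
/// Each group is viewed as a fixed-size array, so the inner loop has a
/// compile-time trip count that the optimiser is free to unroll.
/// `N` must be non-zero (`chunks_exact(0)` panics).
fn mix_grouped<const N: usize>(words: &[u32]) -> u32 {
    let mut acc: u32 = 0x6a09e667;
    let mut chunks = words.chunks_exact(N);
    for chunk in &mut chunks {
        let group = <&[u32; N]>::try_from(chunk).unwrap();
        for &w in group.iter() {
            acc = acc.rotate_right(7) ^ w.wrapping_add(acc);
        }
    }
    // Words left over when the input length is not a multiple of `N`.
    for &w in chunks.remainder() {
        acc = acc.rotate_right(7) ^ w.wrapping_add(acc);
    }
    acc
}

/// Runs `mix_grouped::<N>` `reps` times and returns the combined result and elapsed time.
fn time_group<const N: usize>(words: &[u32], reps: u32) -> (u32, Duration) {
    let start = Instant::now();
    let mut out = 0u32;
    for _ in 0..reps {
        // black_box keeps the compiler from hoisting the call out of the loop.
        out = out.wrapping_add(mix_grouped::<N>(std::hint::black_box(words)));
    }
    (out, start.elapsed())
}

fn main() {
    // Arbitrary test data; the length is deliberately not a multiple of 16.
    let words: Vec<u32> = (0..1_000_003u32)
        .map(|i| i.wrapping_mul(2654435761))
        .collect();

    let (r4, t4) = time_group::<4>(&words, 50);
    let (r8, t8) = time_group::<8>(&words, 50);
    let (r16, t16) = time_group::<16>(&words, 50);

    // The group size must only change speed, never the result.
    assert_eq!(r4, r8);
    assert_eq!(r8, r16);

    println!("group size  4: {:?}", t4);
    println!("group size  8: {:?}", t8);
    println!("group size 16: {:?}", t16);
}
```

Timings from a loop this small are noisy, so treat them as a hint about which group size to try in the real hashing code rather than as a result in themselves.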