Open johejo opened 1 month ago
It would be nice to add a similar optimization to the asm implementations. For now, this PR targets only purego.
go test -bench . -count=10 -tags purego
goos: linux
goarch: amd64
pkg: github.com/cespare/xxhash/v2
cpu: AMD Ryzen 7 7735HS with Radeon Graphics
                      │ before.txt  │              after.txt              │
                      │   sec/op    │   sec/op     vs base                │
Sum64/4B-16             2.547n ± 0%   2.579n ± 1%   +1.26% (p=0.000 n=10)
Sum64/16B-16            4.509n ± 4%   4.571n ± 3%        ~ (p=0.425 n=10)
Sum64/100B-16           13.55n ± 0%   12.51n ± 1%   -7.67% (p=0.000 n=10)
Sum64/4KB-16            247.1n ± 0%   246.4n ± 0%   -0.26% (p=0.027 n=10)
Sum64/10MB-16           617.6µ ± 2%   603.7µ ± 1%   -2.26% (p=0.002 n=10)
Sum64String/4B-16       2.785n ± 0%   2.783n ± 0%        ~ (p=0.341 n=10)
Sum64String/16B-16      4.842n ± 3%   4.866n ± 2%        ~ (p=0.896 n=10)
Sum64String/100B-16     13.38n ± 1%   12.00n ± 0%  -10.32% (p=0.000 n=10)
Sum64String/4KB-16      249.7n ± 1%   246.1n ± 0%   -1.44% (p=0.000 n=10)
Sum64String/10MB-16     619.3µ ± 2%   602.7µ ± 1%   -2.68% (p=0.001 n=10)
DigestBytes/4B-16       8.023n ± 1%   7.909n ± 0%   -1.41% (p=0.000 n=10)
DigestBytes/16B-16      9.097n ± 2%   9.069n ± 2%        ~ (p=0.796 n=10)
DigestBytes/100B-16     16.27n ± 1%   16.49n ± 0%   +1.35% (p=0.000 n=10)
DigestBytes/4KB-16      303.1n ± 1%   260.8n ± 0%  -13.93% (p=0.000 n=10)
DigestBytes/10MB-16     764.3µ ± 2%   636.7µ ± 1%  -16.70% (p=0.000 n=10)
DigestString/4B-16      7.901n ± 1%   7.795n ± 1%   -1.34% (p=0.000 n=10)
DigestString/16B-16     9.064n ± 1%   8.932n ± 1%   -1.46% (p=0.002 n=10)
DigestString/100B-16    16.48n ± 1%   16.52n ± 0%        ~ (p=0.749 n=10)
DigestString/4KB-16     302.7n ± 0%   260.9n ± 0%  -13.78% (p=0.000 n=10)
DigestString/10MB-16    736.9µ ± 0%   637.8µ ± 0%  -13.45% (p=0.000 n=10)
geomean                 152.8n        146.2n         -4.31%

                      │ before.txt   │              after.txt               │
                      │     B/s      │     B/s        vs base               │
Sum64/4B-16             1.463Gi ± 0%   1.445Gi ± 1%   -1.23% (p=0.000 n=10)
Sum64/16B-16            3.304Gi ± 4%   3.260Gi ± 3%        ~ (p=0.481 n=10)
Sum64/100B-16           6.868Gi ± 0%   7.442Gi ± 1%   +8.35% (p=0.000 n=10)
Sum64/4KB-16            15.08Gi ± 0%   15.12Gi ± 0%   +0.27% (p=0.019 n=10)
Sum64/10MB-16           15.08Gi ± 2%   15.43Gi ± 1%   +2.31% (p=0.002 n=10)
Sum64String/4B-16       1.338Gi ± 0%   1.339Gi ± 0%        ~ (p=0.393 n=10)
Sum64String/16B-16      3.078Gi ± 4%   3.062Gi ± 2%        ~ (p=0.912 n=10)
Sum64String/100B-16     6.963Gi ± 1%   7.765Gi ± 0%  +11.52% (p=0.000 n=10)
Sum64String/4KB-16      14.92Gi ± 1%   15.14Gi ± 0%   +1.45% (p=0.000 n=10)
Sum64String/10MB-16     15.04Gi ± 2%   15.45Gi ± 1%   +2.75% (p=0.001 n=10)
DigestBytes/4B-16       475.5Mi ± 1%   482.3Mi ± 0%   +1.43% (p=0.000 n=10)
DigestBytes/16B-16      1.638Gi ± 2%   1.643Gi ± 2%        ~ (p=0.796 n=10)
DigestBytes/100B-16     5.725Gi ± 1%   5.648Gi ± 0%   -1.34% (p=0.000 n=10)
DigestBytes/4KB-16      12.29Gi ± 1%   14.28Gi ± 0%  +16.18% (p=0.000 n=10)
DigestBytes/10MB-16     12.18Gi ± 2%   14.63Gi ± 1%  +20.04% (p=0.000 n=10)
DigestString/4B-16      482.8Mi ± 1%   489.3Mi ± 1%   +1.36% (p=0.000 n=10)
DigestString/16B-16     1.644Gi ± 1%   1.668Gi ± 1%   +1.48% (p=0.002 n=10)
DigestString/100B-16    5.651Gi ± 1%   5.639Gi ± 0%        ~ (p=0.684 n=10)
DigestString/4KB-16     12.31Gi ± 0%   14.27Gi ± 0%  +15.97% (p=0.000 n=10)
DigestString/10MB-16    12.64Gi ± 0%   14.60Gi ± 0%  +15.54% (p=0.000 n=10)
geomean                 4.642Gi        4.851Gi         +4.50%
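For readers unfamiliar with where the B/s figures come from: Go benchmarks report throughput only when the benchmark calls `b.SetBytes` with the number of bytes processed per iteration. The sketch below illustrates that mechanism only. It is not the benchmark code from this repository: `benchSum` is a hypothetical helper, the input sizes are chosen for illustration, and the standard library's `hash/fnv` stands in for xxhash so that the example stays self-contained.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"testing"
)

// benchSum is a hypothetical helper (not part of xxhash). It times a
// stand-in hash (FNV-1a from the standard library) over an n-byte input.
func benchSum(n int) testing.BenchmarkResult {
	input := make([]byte, n)
	return testing.Benchmark(func(b *testing.B) {
		// SetBytes records bytes processed per iteration; this is what
		// makes the result carry a throughput figure next to ns/op.
		b.SetBytes(int64(n))
		for i := 0; i < b.N; i++ {
			h := fnv.New64a()
			h.Write(input)
			_ = h.Sum64()
		}
	})
}

func main() {
	// Illustrative sizes only; the actual benchmark inputs may differ.
	for _, n := range []int{4, 16, 100} {
		r := benchSum(n)
		fmt.Printf("%4dB  %s\n", n, r.String())
	}
}
```

Each printed line contains the iteration count, ns/op, and MB/s for one input size. Absolute numbers depend on the machine and on the stand-in hash, so they are not comparable to the table above.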
There seems to be a slight performance degradation for some small inputs, such as Sum64 on 4B (+1.26% sec/op).