golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

cmd/go: add GOMIPS32, GOMIPS64 ISA levels (iii, r1, r2, r5, r6) #60072

Open HeliC829 opened 1 year ago

HeliC829 commented 1 year ago

Currently GOMIPS64 accepts hardfloat (the default) and softfloat.

Go currently supports MIPS III or higher. I have submitted two CLs, CL 485635 and CL 485595, that bring a small performance improvement on MIPS64, but the instructions they use are only available from R1 onward, not in MIPS III.

So if we want further performance improvements on mips64, we should support more ISA levels.

We would like GOMIPS to also accept r2/r5.

I tried introducing some instructions from MIPS R2. The following data shows the test results and the performance improvement we could get by supporting newer ISA levels on mips64x.

goos: linux
goarch: mips64le
pkg: crypto/tls
                                                 │    oldtls     │               newtls                │
                                                 │    sec/op     │    sec/op     vs base               │
CertCache/0-4                                       5.839m ±  6%   6.417m ±  8%   +9.91% (p=0.001 n=8)
CertCache/1-4                                       6.277m ±  6%   6.246m ±  7%        ~ (p=0.721 n=8)
CertCache/2-4                                       6.119m ± 14%   6.305m ±  7%   +3.04% (p=0.050 n=8)
CertCache/3-4                                       6.115m ± 10%   6.542m ± 11%   +6.98% (p=0.038 n=8)
HandshakeServer/RSA-4                               6.293m ±  1%   6.214m ±  0%   -1.26% (p=0.002 n=8)
HandshakeServer/ECDHE-P256-RSA/TLSv13-4             11.57m ±  0%   11.34m ±  1%   -1.98% (p=0.010 n=8)
HandshakeServer/ECDHE-P256-RSA/TLSv12-4             10.89m ±  0%   10.79m ±  0%   -0.88% (p=0.000 n=8)
HandshakeServer/ECDHE-P256-ECDSA-P256/TLSv13-4      7.247m ±  1%   7.008m ±  1%   -3.29% (p=0.007 n=8)
HandshakeServer/ECDHE-P256-ECDSA-P256/TLSv12-4      6.592m ±  0%   6.496m ±  0%   -1.46% (p=0.000 n=8)
HandshakeServer/ECDHE-X25519-ECDSA-P256/TLSv13-4    5.356m ±  3%   5.172m ±  3%   -3.45% (p=0.015 n=8)
HandshakeServer/ECDHE-X25519-ECDSA-P256/TLSv12-4    4.686m ±  1%   4.566m ±  0%   -2.56% (p=0.000 n=8)
HandshakeServer/ECDHE-P521-ECDSA-P521/TLSv13-4      220.2m ±  0%   217.4m ±  0%   -1.26% (p=0.000 n=8)
HandshakeServer/ECDHE-P521-ECDSA-P521/TLSv12-4      219.6m ±  0%   216.9m ±  0%   -1.25% (p=0.000 n=8)
Throughput/MaxPacket/1MB/TLSv12-4                   519.1m ±  1%   148.1m ±  2%  -71.47% (p=0.000 n=8)
Throughput/MaxPacket/1MB/TLSv13-4                   537.9m ±  0%   164.5m ±  1%  -69.42% (p=0.000 n=8)
Throughput/MaxPacket/2MB/TLSv12-4                  1028.5m ±  0%   279.3m ±  1%  -72.85% (p=0.000 n=8)
Throughput/MaxPacket/2MB/TLSv13-4                  1063.3m ±  0%   313.1m ±  1%  -70.56% (p=0.000 n=8)
Throughput/MaxPacket/4MB/TLSv12-4                  2036.4m ±  0%   552.1m ±  1%  -72.89% (p=0.000 n=8)
Throughput/MaxPacket/4MB/TLSv13-4                  2106.4m ±  0%   614.5m ±  1%  -70.83% (p=0.000 n=8)
Throughput/MaxPacket/8MB/TLSv12-4                    4.064 ±  0%    1.080 ±  4%  -73.43% (p=0.000 n=8)
Throughput/MaxPacket/8MB/TLSv13-4                    4.198 ±  0%    1.212 ±  7%  -71.12% (p=0.000 n=8)
Throughput/MaxPacket/16MB/TLSv12-4                   8.115 ±  1%    2.202 ±  7%  -72.87% (p=0.000 n=8)
Throughput/MaxPacket/16MB/TLSv13-4                   8.383 ±  0%    2.403 ±  1%  -71.33% (p=0.000 n=8)
Throughput/MaxPacket/32MB/TLSv12-4                  16.198 ±  0%    4.283 ±  0%  -73.56% (p=0.000 n=8)
Throughput/MaxPacket/32MB/TLSv13-4                  16.763 ±  0%    4.792 ±  1%  -71.42% (p=0.000 n=8)
Throughput/MaxPacket/64MB/TLSv12-4                  32.388 ±  0%    8.603 ±  2%  -73.44% (p=0.000 n=8)
Throughput/MaxPacket/64MB/TLSv13-4                  33.502 ±  0%    9.636 ±  1%  -71.24% (p=0.000 n=8)
Throughput/DynamicPacket/1MB/TLSv12-4               514.2m ±  1%   146.3m ±  1%  -71.55% (p=0.000 n=8)
Throughput/DynamicPacket/1MB/TLSv13-4               531.9m ±  1%   162.4m ±  2%  -69.47% (p=0.000 n=8)
Throughput/DynamicPacket/2MB/TLSv12-4              1019.8m ±  3%   279.2m ±  3%  -72.62% (p=0.000 n=8)
Throughput/DynamicPacket/2MB/TLSv13-4              1056.9m ±  0%   311.2m ±  1%  -70.56% (p=0.000 n=8)
Throughput/DynamicPacket/4MB/TLSv12-4              2031.2m ±  1%   547.3m ±  1%  -73.06% (p=0.000 n=8)
Throughput/DynamicPacket/4MB/TLSv13-4              2102.5m ±  0%   608.2m ±  1%  -71.07% (p=0.000 n=8)
Throughput/DynamicPacket/8MB/TLSv12-4                4.053 ±  0%    1.082 ±  1%  -73.31% (p=0.000 n=8)
Throughput/DynamicPacket/8MB/TLSv13-4                4.193 ±  0%    1.216 ±  1%  -70.99% (p=0.000 n=8)
Throughput/DynamicPacket/16MB/TLSv12-4               8.104 ±  1%    2.151 ±  2%  -73.46% (p=0.000 n=8)
Throughput/DynamicPacket/16MB/TLSv13-4               8.388 ±  0%    2.406 ±  1%  -71.32% (p=0.000 n=8)
Throughput/DynamicPacket/32MB/TLSv12-4              16.202 ±  0%    4.287 ±  1%  -73.54% (p=0.000 n=8)
Throughput/DynamicPacket/32MB/TLSv13-4              16.761 ±  0%    4.869 ±  2%  -70.95% (p=0.000 n=8)
Throughput/DynamicPacket/64MB/TLSv12-4              32.394 ±  0%    8.589 ±  2%  -73.49% (p=0.000 n=8)
Throughput/DynamicPacket/64MB/TLSv13-4              33.500 ±  0%    9.610 ±  3%  -71.31% (p=0.000 n=8)
Latency/MaxPacket/200kbps/TLSv12-4                  719.9m ±  0%   712.3m ±  0%   -1.06% (p=0.000 n=8)
Latency/MaxPacket/200kbps/TLSv13-4                  722.7m ±  0%   714.5m ±  0%   -1.13% (p=0.000 n=8)
Latency/MaxPacket/500kbps/TLSv12-4                  303.9m ±  0%   296.1m ±  0%   -2.57% (p=0.000 n=8)
Latency/MaxPacket/500kbps/TLSv13-4                  304.5m ±  0%   296.3m ±  0%   -2.68% (p=0.000 n=8)
Latency/MaxPacket/1000kbps/TLSv12-4                 165.5m ±  0%   157.5m ±  0%   -4.85% (p=0.000 n=8)
Latency/MaxPacket/1000kbps/TLSv13-4                 165.0m ±  0%   156.6m ±  0%   -5.07% (p=0.000 n=8)
Latency/MaxPacket/2000kbps/TLSv12-4                 96.05m ±  0%   88.17m ±  0%   -8.21% (p=0.000 n=8)
Latency/MaxPacket/2000kbps/TLSv13-4                 95.48m ±  0%   87.23m ±  0%   -8.65% (p=0.000 n=8)
Latency/MaxPacket/5000kbps/TLSv12-4                 54.42m ±  1%   46.43m ±  0%  -14.68% (p=0.000 n=8)
Latency/MaxPacket/5000kbps/TLSv13-4                 54.75m ±  0%   46.36m ±  0%  -15.33% (p=0.000 n=8)
Latency/DynamicPacket/200kbps/TLSv12-4              152.4m ±  0%   149.2m ±  0%   -2.13% (p=0.000 n=8)
Latency/DynamicPacket/200kbps/TLSv13-4              153.8m ±  0%   151.6m ±  0%   -1.48% (p=0.000 n=8)
Latency/DynamicPacket/500kbps/TLSv12-4              73.47m ±  0%   69.92m ±  1%   -4.84% (p=0.000 n=8)
Latency/DynamicPacket/500kbps/TLSv13-4              72.63m ±  1%   70.06m ±  0%   -3.54% (p=0.000 n=8)
Latency/DynamicPacket/1000kbps/TLSv12-4             47.15m ±  0%   43.59m ±  0%   -7.55% (p=0.000 n=8)
Latency/DynamicPacket/1000kbps/TLSv13-4             45.26m ±  1%   42.60m ±  1%   -5.88% (p=0.000 n=8)
Latency/DynamicPacket/2000kbps/TLSv12-4             33.88m ±  0%   30.25m ±  0%  -10.70% (p=0.000 n=8)
Latency/DynamicPacket/2000kbps/TLSv13-4             31.90m ±  1%   29.36m ±  0%   -7.96% (p=0.000 n=8)
Latency/DynamicPacket/5000kbps/TLSv12-4             25.60m ±  0%   21.99m ±  1%  -14.12% (p=0.000 n=8)
Latency/DynamicPacket/5000kbps/TLSv13-4             24.41m ±  0%   21.93m ±  1%  -10.19% (p=0.000 n=8)
geomean                                             346.1m         188.9m        -45.43%

                                       │    oldtls     │                newtls                 │
                                       │      B/s      │      B/s       vs base                │
Throughput/MaxPacket/1MB/TLSv12-4        1.926Mi ±  1%   6.752Mi ± 13%  +250.50% (p=0.000 n=8)
Throughput/MaxPacket/1MB/TLSv13-4        1.860Mi ±  1%   6.080Mi ±  4%  +226.92% (p=0.000 n=8)
Throughput/MaxPacket/2MB/TLSv12-4        1.945Mi ±  0%   7.162Mi ±  3%  +268.14% (p=0.000 n=8)
Throughput/MaxPacket/2MB/TLSv13-4        1.884Mi ±  4%   6.390Mi ± 21%  +239.24% (p=0.000 n=8)
Throughput/MaxPacket/4MB/TLSv12-4        1.965Mi ±  2%   7.243Mi ±  4%  +268.69% (p=0.000 n=8)
Throughput/MaxPacket/4MB/TLSv13-4        1.898Mi ±  1%   6.509Mi ±  5%  +242.96% (p=0.000 n=8)
Throughput/MaxPacket/8MB/TLSv12-4        1.969Mi ±  0%   7.405Mi ±  6%  +276.03% (p=0.000 n=8)
Throughput/MaxPacket/8MB/TLSv13-4        1.907Mi ±  0%   6.599Mi ± 18%  +246.00% (p=0.000 n=8)
Throughput/MaxPacket/16MB/TLSv12-4       1.974Mi ±  1%   7.262Mi ±  8%  +267.87% (p=0.000 n=8)
Throughput/MaxPacket/16MB/TLSv13-4       1.907Mi ±  4%   6.657Mi ±  2%  +249.00% (p=0.000 n=8)
Throughput/MaxPacket/32MB/TLSv12-4       1.974Mi ±  2%   7.467Mi ±  3%  +278.26% (p=0.000 n=8)
Throughput/MaxPacket/32MB/TLSv13-4       1.907Mi ±  1%   6.680Mi ±  1%  +250.25% (p=0.000 n=8)
Throughput/MaxPacket/64MB/TLSv12-4       1.974Mi ±  2%   7.439Mi ±  3%  +276.81% (p=0.000 n=8)
Throughput/MaxPacket/64MB/TLSv13-4       1.912Mi ±  1%   6.642Mi ±  2%  +247.38% (p=0.000 n=8)
Throughput/DynamicPacket/1MB/TLSv12-4    1.945Mi ± 12%   6.838Mi ±  8%  +251.47% (p=0.000 n=8)
Throughput/DynamicPacket/1MB/TLSv13-4    1.879Mi ±  1%   6.156Mi ±  3%  +227.66% (p=0.000 n=8)
Throughput/DynamicPacket/2MB/TLSv12-4    1.965Mi ± 11%   7.167Mi ± 16%  +264.81% (p=0.000 n=8)
Throughput/DynamicPacket/2MB/TLSv13-4    1.893Mi ±  1%   6.428Mi ±  2%  +239.55% (p=0.000 n=8)
Throughput/DynamicPacket/4MB/TLSv12-4    1.969Mi ±  1%   7.310Mi ±  6%  +271.19% (p=0.000 n=8)
Throughput/DynamicPacket/4MB/TLSv13-4    1.903Mi ±  0%   6.576Mi ±  2%  +245.61% (p=0.000 n=8)
Throughput/DynamicPacket/8MB/TLSv12-4    1.974Mi ±  1%   7.396Mi ± 10%  +274.64% (p=0.000 n=8)
Throughput/DynamicPacket/8MB/TLSv13-4    1.907Mi ±  1%   6.576Mi ±  3%  +244.75% (p=0.000 n=8)
Throughput/DynamicPacket/16MB/TLSv12-4   1.974Mi ±  3%   7.439Mi ±  3%  +276.81% (p=0.000 n=8)
Throughput/DynamicPacket/16MB/TLSv13-4   1.907Mi ±  1%   6.647Mi ±  4%  +248.50% (p=0.000 n=8)
Throughput/DynamicPacket/32MB/TLSv12-4   1.974Mi ±  0%   7.463Mi ± 10%  +278.02% (p=0.000 n=8)
Throughput/DynamicPacket/32MB/TLSv13-4   1.907Mi ±  1%   6.576Mi ±  2%  +244.75% (p=0.000 n=8)
Throughput/DynamicPacket/64MB/TLSv12-4   1.974Mi ±  0%   7.448Mi ±  3%  +277.29% (p=0.000 n=8)
Throughput/DynamicPacket/64MB/TLSv13-4   1.912Mi ±  1%   6.661Mi ±  4%  +248.38% (p=0.000 n=8)
geomean                                  1.931Mi         6.878Mi        +256.13%
goos: linux
goarch: mips64le
pkg: crypto/md5
                      │    oldmd5    │               newmd5               │
                      │    sec/op    │   sec/op     vs base               │
Hash8Bytes-4             2.712µ ± 0%   2.514µ ± 0%   -7.28% (p=0.000 n=8)
Hash64-4                 3.387µ ± 0%   2.999µ ± 0%  -11.46% (p=0.000 n=8)
Hash128-4                4.115µ ± 0%   3.527µ ± 0%  -14.30% (p=0.000 n=8)
Hash256-4                5.569µ ± 0%   4.583µ ± 0%  -17.71% (p=0.000 n=8)
Hash512-4                8.492µ ± 0%   6.709µ ± 0%  -21.00% (p=0.000 n=8)
Hash1K-4                 14.31µ ± 0%   10.94µ ± 0%  -23.57% (p=0.000 n=8)
Hash8K-4                 95.82µ ± 0%   70.18µ ± 0%  -26.76% (p=0.000 n=8)
Hash1M-4                11.933m ± 0%   8.674m ± 0%  -27.31% (p=0.000 n=8)
Hash8M-4                 95.45m ± 0%   69.40m ± 0%  -27.29% (p=0.000 n=8)
Hash8BytesUnaligned-4    2.784µ ± 0%   2.588µ ± 0%   -7.04% (p=0.000 n=8)
Hash1KUnaligned-4        14.31µ ± 0%   10.95µ ± 0%  -23.48% (p=0.000 n=8)
Hash8KUnaligned-4        95.76µ ± 0%   70.23µ ± 0%  -26.66% (p=0.000 n=8)
geomean                  38.51µ        30.88µ       -19.82%

                      │    oldmd5     │                newmd5                │
                      │      B/s      │      B/s       vs base               │
Hash8Bytes-4            2.813Mi ±  0%    3.033Mi ± 0%   +7.80% (p=0.000 n=8)
Hash64-4                18.02Mi ±  0%    20.35Mi ± 0%  +12.91% (p=0.000 n=8)
Hash128-4               29.66Mi ±  0%    34.61Mi ± 0%  +16.69% (p=0.000 n=8)
Hash256-4               43.85Mi ±  0%    53.27Mi ± 0%  +21.50% (p=0.000 n=8)
Hash512-4               57.50Mi ±  0%    72.78Mi ± 0%  +26.59% (p=0.000 n=8)
Hash1K-4                68.25Mi ±  0%    89.30Mi ± 0%  +30.84% (p=0.000 n=8)
Hash8K-4                81.53Mi ±  0%   111.33Mi ± 0%  +36.54% (p=0.000 n=8)
Hash1M-4                83.80Mi ± 28%   115.29Mi ± 0%  +37.58% (p=0.000 n=8)
Hash8M-4                83.82Mi ±  0%   115.27Mi ± 0%  +37.52% (p=0.000 n=8)
Hash8BytesUnaligned-4   2.737Mi ±  0%    2.947Mi ± 0%   +7.67% (p=0.000 n=8)
Hash1KUnaligned-4       68.24Mi ±  0%    89.19Mi ± 0%  +30.69% (p=0.000 n=8)
Hash8KUnaligned-4       81.59Mi ±  0%   111.24Mi ± 0%  +36.34% (p=0.000 n=8)
geomean                 33.84Mi          42.21Mi       +24.72%
goos: linux
goarch: mips64le
pkg: crypto/sha1
                   │   oldsha1   │              newsha1               │
                   │   sec/op    │   sec/op     vs base               │
Hash8Bytes/New-4     5.341µ ± 0%   4.863µ ± 0%   -8.95% (p=0.000 n=8)
Hash8Bytes/Sum-4     5.456µ ± 0%   4.983µ ± 0%   -8.68% (p=0.000 n=8)
Hash320Bytes/New-4   16.69µ ± 0%   13.85µ ± 0%  -17.00% (p=0.000 n=8)
Hash320Bytes/Sum-4   16.81µ ± 0%   13.97µ ± 0%  -16.92% (p=0.000 n=8)
Hash1K/New-4         42.90µ ± 0%   34.81µ ± 0%  -18.87% (p=0.000 n=8)
Hash1K/Sum-4         43.02µ ± 0%   34.94µ ± 0%  -18.80% (p=0.000 n=8)
Hash8K/New-4         309.6µ ± 0%   248.3µ ± 0%  -19.78% (p=0.000 n=8)
Hash8K/Sum-4         309.5µ ± 0%   248.5µ ± 0%  -19.71% (p=0.000 n=8)
geomean              33.11µ        27.75µ       -16.20%

                   │   oldsha1    │               newsha1               │
                   │     B/s      │     B/s       vs base               │
Hash8Bytes/New-4     1.431Mi ± 1%   1.574Mi ± 1%  +10.00% (p=0.000 n=8)
Hash8Bytes/Sum-4     1.402Mi ± 1%   1.535Mi ± 1%   +9.52% (p=0.000 n=8)
Hash320Bytes/New-4   18.29Mi ± 0%   22.04Mi ± 0%  +20.49% (p=0.000 n=8)
Hash320Bytes/Sum-4   18.15Mi ± 0%   21.85Mi ± 0%  +20.39% (p=0.000 n=8)
Hash1K/New-4         22.76Mi ± 1%   28.06Mi ± 0%  +23.25% (p=0.000 n=8)
Hash1K/Sum-4         22.70Mi ± 0%   27.95Mi ± 0%  +23.13% (p=0.000 n=8)
Hash8K/New-4         25.24Mi ± 0%   31.46Mi ± 0%  +24.64% (p=0.000 n=8)
Hash8K/Sum-4         25.24Mi ± 0%   31.44Mi ± 0%  +24.54% (p=0.000 n=8)
geomean              11.03Mi        13.16Mi       +19.35%
goos: linux
goarch: mips64le
pkg: math/bits
                  │   oldbits    │              newbits               │
                  │    sec/op    │   sec/op     vs base               │
LeadingZeros-4      20.505n ± 1%   6.780n ± 0%  -66.93% (p=0.000 n=8)
LeadingZeros8-4     10.040n ± 0%   9.039n ± 0%   -9.98% (p=0.000 n=8)
LeadingZeros16-4    19.085n ± 0%   9.038n ± 0%  -52.64% (p=0.000 n=8)
LeadingZeros32-4     24.13n ± 0%   10.55n ± 0%  -56.28% (p=0.000 n=8)
LeadingZeros64-4    19.660n ± 0%   6.776n ± 0%  -65.54% (p=0.000 n=8)
TrailingZeros-4     13.055n ± 0%   9.037n ± 0%  -30.77% (p=0.000 n=8)
TrailingZeros8-4     7.364n ± 0%   7.364n ± 0%        ~ (p=0.449 n=8)
TrailingZeros16-4    17.07n ± 0%   10.05n ± 0%  -41.14% (p=0.000 n=8)
TrailingZeros32-4   17.405n ± 0%   8.534n ± 0%  -50.97% (p=0.000 n=8)
TrailingZeros64-4   13.050n ± 0%   9.037n ± 0%  -30.75% (p=0.000 n=8)
OnesCount-4          21.09n ± 0%   21.10n ± 0%        ~ (p=0.054 n=8)
OnesCount8-4         6.024n ± 0%   6.024n ± 0%        ~ (p=0.533 n=8)
OnesCount16-4        13.05n ± 0%   13.05n ± 0%        ~ (p=1.000 n=8)
OnesCount32-4        20.08n ± 0%   20.08n ± 0%        ~ (p=0.367 n=8)
OnesCount64-4        23.10n ± 0%   23.11n ± 0%        ~ (p=0.407 n=8)
RotateLeft-4         9.037n ± 0%   4.418n ± 0%  -51.11% (p=0.000 n=8)
RotateLeft8-4        9.537n ± 0%   9.208n ± 0%   -3.45% (p=0.000 n=8)
RotateLeft16-4       9.208n ± 0%   9.375n ± 0%   +1.82% (p=0.000 n=8)
RotateLeft32-4      10.380n ± 0%   4.021n ± 0%  -61.26% (p=0.000 n=8)
RotateLeft64-4       8.034n ± 0%   4.016n ± 0%  -50.01% (p=0.000 n=8)
Reverse-4            62.26n ± 0%   18.08n ± 0%  -70.96% (p=0.000 n=8)
Reverse8-4           5.020n ± 0%   5.021n ± 0%        ~ (p=1.000 n=8)
Reverse16-4          9.036n ± 0%   9.039n ± 0%        ~ (p=0.098 n=8)
Reverse32-4          29.13n ± 0%   23.11n ± 0%  -20.68% (p=0.000 n=8)
Reverse64-4          27.50n ± 0%   21.10n ± 0%  -23.27% (p=0.000 n=8)
ReverseBytes-4      13.970n ± 1%   3.044n ± 1%  -78.21% (p=0.000 n=8)
ReverseBytes16-4     4.297n ± 1%   4.329n ± 1%   +0.74% (p=0.050 n=8)
ReverseBytes32-4    12.050n ± 0%   5.021n ± 0%  -58.34% (p=0.000 n=8)
ReverseBytes64-4    17.220n ± 2%   3.030n ± 0%  -82.40% (p=0.000 n=8)
Add-4                8.178n ± 0%   8.188n ± 0%        ~ (p=0.661 n=8)
Add32-4              8.284n ± 0%   8.285n ± 0%        ~ (p=0.292 n=8)
Add64-4              7.890n ± 1%   7.876n ± 0%        ~ (p=0.522 n=8)
Add64multiple-4      17.08n ± 0%   17.08n ± 0%        ~ (p=0.297 n=8)
Sub-4                9.543n ± 0%   9.540n ± 0%        ~ (p=0.312 n=8)
Sub32-4              13.07n ± 0%   13.05n ± 0%   -0.08% (p=0.011 n=8)
Sub64-4              10.30n ± 0%   10.29n ± 0%        ~ (p=0.080 n=8)
Sub64multiple-4      19.09n ± 0%   19.08n ± 0%   -0.05% (p=0.008 n=8)
Mul-4                5.100n ± 0%   5.097n ± 0%        ~ (p=0.338 n=8)
Mul32-4              7.371n ± 0%   7.363n ± 0%   -0.11% (p=0.000 n=8)
Mul64-4              5.242n ± 0%   5.020n ± 0%   -4.24% (p=0.000 n=8)
Div-4                133.6n ± 0%   118.4n ± 0%  -11.38% (p=0.000 n=8)
Div32-4              15.65n ± 1%   15.41n ± 0%   -1.53% (p=0.000 n=8)
Div64-4              132.7n ± 0%   117.3n ± 1%  -11.53% (p=0.000 n=8)
geomean              13.85n        9.917n       -28.41%
goos: linux
goarch: mips64le
pkg: crypto/sha256
                    │  oldsha256  │             newsha256              │
                    │   sec/op    │   sec/op     vs base               │
Hash8Bytes/New-4      6.689µ ± 0%   6.094µ ± 0%   -8.89% (p=0.000 n=8)
Hash8Bytes/Sum224-4   7.106µ ± 0%   6.507µ ± 0%   -8.43% (p=0.000 n=8)
Hash8Bytes/Sum256-4   7.217µ ± 0%   6.623µ ± 0%   -8.24% (p=0.000 n=8)
Hash1K/New-4          62.66µ ± 0%   52.35µ ± 0%  -16.45% (p=0.000 n=8)
Hash1K/Sum224-4       62.91µ ± 0%   52.75µ ± 0%  -16.16% (p=0.000 n=8)
Hash1K/Sum256-4       63.03µ ± 0%   52.86µ ± 0%  -16.14% (p=0.000 n=8)
Hash8K/New-4          450.8µ ± 0%   373.5µ ± 0%  -17.15% (p=0.000 n=8)
Hash8K/Sum224-4       451.0µ ± 0%   373.9µ ± 0%  -17.10% (p=0.000 n=8)
Hash8K/Sum256-4       451.5µ ± 0%   374.0µ ± 0%  -17.16% (p=0.000 n=8)
geomean               58.34µ        50.14µ       -14.05%

                    │  oldsha256   │              newsha256               │
                    │     B/s      │      B/s       vs base               │
Hash8Bytes/New-4      1.144Mi ± 1%   1.249Mi ±  0%   +9.17% (p=0.000 n=8)
Hash8Bytes/Sum224-4   1.078Mi ± 1%   1.173Mi ±  0%   +8.85% (p=0.000 n=8)
Hash8Bytes/Sum256-4   1.059Mi ± 1%   1.154Mi ±  0%   +9.01% (p=0.000 n=8)
Hash1K/New-4          15.58Mi ± 0%   18.65Mi ± 12%  +19.71% (p=0.000 n=8)
Hash1K/Sum224-4       15.53Mi ± 1%   18.51Mi ±  0%  +19.23% (p=0.000 n=8)
Hash1K/Sum256-4       15.49Mi ± 0%   18.47Mi ±  0%  +19.24% (p=0.000 n=8)
Hash8K/New-4          17.33Mi ± 0%   20.91Mi ±  0%  +20.69% (p=0.000 n=8)
Hash8K/Sum224-4       17.32Mi ± 0%   20.90Mi ±  0%  +20.65% (p=0.000 n=8)
Hash8K/Sum256-4       17.30Mi ± 0%   20.89Mi ±  0%  +20.73% (p=0.000 n=8)
geomean               6.649Mi        7.729Mi        +16.24%
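
The sec/op and B/s tables above describe the same measurements from two sides: throughput is inversely proportional to time per operation, so a time change of δ corresponds to a throughput change of 1/(1+δ) − 1. The following is a minimal sketch of that arithmetic, using the Throughput/MaxPacket/1MB/TLSv12 row as input. The helper name `throughputGain` is made up for this example, and the result only approximates the B/s column because benchstat derives each column from its own samples.

```go
package main

import "fmt"

// throughputGain converts a relative change in time per operation
// (for example -0.7147 for "-71.47%") into the matching relative
// change in throughput. Throughput is inversely proportional to time
// per operation, so the new/old throughput ratio is 1/(1+timeDelta).
// The function name is hypothetical and only used in this sketch.
func throughputGain(timeDelta float64) float64 {
	return 1/(1+timeDelta) - 1
}

func main() {
	// Throughput/MaxPacket/1MB/TLSv12: 519.1ms -> 148.1ms per op.
	oldSec, newSec := 0.5191, 0.1481
	delta := newSec/oldSec - 1
	fmt.Printf("time delta:      %+.2f%%\n", delta*100)
	fmt.Printf("throughput gain: %+.2f%%\n", throughputGain(delta)*100)
}
```

This is why a time reduction of about 71% shows up as a throughput increase of about 250% rather than 71%.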
HeliC829 commented 1 year ago

cc @cherrymui

randall77 commented 1 year ago

See my comment over at https://github.com/golang/go/issues/59415#issuecomment-1540256806

ianlancetaylor commented 1 year ago

@randall77 I think that your comment has been addressed: the proposal here is permitting setting GOMIPS64 to direct the compiler to generate a few special purpose instructions.

@HeliC829 The GOMIPS64 variable already exists, of course. I think that you are suggesting that we permit a comma-separated list of options in GOMIPS64. The options can be

I added r1 because there has to be a way to specify the default. I added the others because compilers support them. I don't know what happened to r4.

Do you have any reference to what the different ISA levels mean? I couldn't find one.

HeliC829 commented 1 year ago

> Do you have any reference to what the different ISA levels mean? I couldn't find one.

Here is a reference for the MIPS ISA levels, on page 24 of 148: https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00083-2B-MIPS64INT-AFP-06.01.pdf

Go currently supports MIPS III on MIPS64. Note that MIPS III is different from MIPS R1, so we could consider 3 as the default level.

To handle the letters in the ISA level names, I think we can use the enum mips_isa levels defined in GCC in the rules.

ianlancetaylor commented 1 year ago

I know that the situation is very confusing, but it doesn't seem ideal to treat 3 as the default level while also permitting r1. Can we come up with a list of strings that makes sense today and also for the future?

HeliC829 commented 1 year ago

OK, so shall we use the Roman numeral iii to mean the default level, MIPS III? The values for the ISA levels would be as follows:

iii: MIPS III (default, also the current MIPS64 ISA level)
r1: MIPS R1
r2: MIPS R2
r5: MIPS R5
r6: MIPS R6

rsc commented 1 year ago

From the doc linked above:

[Screenshot of the document linked above, taken 2023-06-07]

It sounds like GOMIPS64 is a comma-separated list of choices: hardfloat, softfloat, iii, r1, r2, r5, r6. Probably we should define them all: iii, iv, v, r1, r2, r3, r5, r6. We may not use them today but they'll be defined.

Do I have that right?
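
The comma-separated form described above can be sketched as a small parser. This is only an illustration under assumptions: `mipsConfig` and `parseGOMIPS64` are hypothetical names invented here, the defaults (hardfloat, iii) are taken from the discussion, and the actual toolchain code may validate and store the value differently.

```go
package main

import (
	"fmt"
	"strings"
)

// mipsConfig is a hypothetical holder for the two independent choices
// that a GOMIPS64 value would carry.
type mipsConfig struct {
	Float string // "hardfloat" or "softfloat"
	ISA   string // one of the ISA level names listed in the thread
}

// parseGOMIPS64 is a hypothetical helper written for this sketch; it is
// not the parser used by the Go toolchain.
func parseGOMIPS64(v string) (mipsConfig, error) {
	// Defaults assumed from the discussion: hardfloat and MIPS III.
	cfg := mipsConfig{Float: "hardfloat", ISA: "iii"}
	if v == "" {
		return cfg, nil
	}
	seenFloat, seenISA := false, false
	for _, opt := range strings.Split(v, ",") {
		switch opt {
		case "hardfloat", "softfloat":
			if seenFloat {
				return mipsConfig{}, fmt.Errorf("GOMIPS64: more than one float mode in %q", v)
			}
			cfg.Float, seenFloat = opt, true
		case "iii", "iv", "v", "r1", "r2", "r3", "r5", "r6":
			if seenISA {
				return mipsConfig{}, fmt.Errorf("GOMIPS64: more than one ISA level in %q", v)
			}
			cfg.ISA, seenISA = opt, true
		default:
			return mipsConfig{}, fmt.Errorf("GOMIPS64: unknown option %q", opt)
		}
	}
	return cfg, nil
}

func main() {
	for _, v := range []string{"", "softfloat,r2", "r5", "r4"} {
		cfg, err := parseGOMIPS64(v)
		fmt.Printf("%q -> %+v, err=%v\n", v, cfg, err)
	}
}
```

Treating the float mode and the ISA level as two separate slots keeps existing values such as `softfloat` valid while allowing combinations like `softfloat,r2`.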

rsc commented 1 year ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

Rongronggg9 commented 1 year ago

> It sounds like GOMIPS64 is a comma-separated list of choices: hardfloat, softfloat, iii, r1, r2, r5, r6.

Right.

> Probably we should define them all: iii, iv, v, r1, r2, r3, r5, r6. We may not use them today but they'll be defined.

Just FYI: in practice no MIPS IV hardware runs a Linux distribution, and there is no MIPS V hardware implementation at all. Besides, in user space, the differences between III, IV and V are tiny. R3 is a significant release, but it only adds privileged instructions and makes no visible user-space change compared to R2. Thus, as a minimum requirement, we think that defining only iii, r1, r2, r5 and r6 should be enough. It is okay to define other ISA levels as reserved, of course, if there is such a demand.

rsc commented 1 year ago

In practice since we don't emit code that cares about the difference, GOMIPS32=iii and GOMIPS32=iv and GOMIPS32=v will all mean the same thing, but they exist(ed) and it's easy to include them, so we might as well recognize the full set.

rsc commented 1 year ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

Rongronggg9 commented 1 year ago

> Based on the discussion above, this proposal seems like a likely accept.

Exciting news! Thanks for your review.

> GOMIPS32=iii and GOMIPS32=iv and GOMIPS32=v

Did you mean GOMIPS64?

> they exist(ed) and it's easy to include them, so we might as well recognize the full set.

Let me summarize:

| ISA level | GOMIPS32 | GOMIPS64 |
|---|---|---|
| i | defined, ? | N/A |
| ii | defined, ? | N/A |
| iii | N/A | valid, implemented (current default) |
| iv | N/A | valid, equivalent to iii |
| v | N/A | valid, equivalent to iii |
| r1 | valid, implemented (current default) | valid, ?[^1] |
| r2 | valid, to be implemented | valid, to be implemented |
| r3 | valid, equivalent to r2 | valid, equivalent to r2 |
| r4[^2] | N/A | N/A |
| r5 | valid, to be implemented | valid, to be implemented |
| r6 | valid, to be implemented | valid, to be implemented |

[^1]: I think we can make GOMIPS64=r1 equivalent to GOMIPS64=iii for the time being, or separate the r1-compatible optimizations from the GOMIPS64=r2 patchset if that is not too complex.

[^2]: Does not exist.
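
The GOMIPS64 column of the summary can be read as a canonicalization step: several accepted names collapse onto one level that the compiler actually distinguishes. A minimal sketch of that reading follows, assuming the equivalences listed above (iv and v behave like iii, r3 behaves like r2). The function name `canonicalMIPS64Level` is hypothetical, and the open question about r1 from the first footnote is deliberately left out, so r1 stays distinct here.

```go
package main

import "fmt"

// canonicalMIPS64Level is a hypothetical helper for this sketch. It maps
// an accepted GOMIPS64 ISA level name to the level it is described as
// equivalent to in the summary, and reports false for names that the
// summary marks as N/A for GOMIPS64 (i, ii, r4) or that are unknown.
func canonicalMIPS64Level(level string) (string, bool) {
	switch level {
	case "iii", "iv", "v":
		return "iii", true // iv and v are listed as equivalent to iii
	case "r1":
		return "r1", true // kept distinct; the summary leaves this open
	case "r2", "r3":
		return "r2", true // r3 is listed as equivalent to r2
	case "r5", "r6":
		return level, true
	}
	return "", false
}

func main() {
	for _, l := range []string{"iii", "v", "r3", "r6", "r4"} {
		c, ok := canonicalMIPS64Level(l)
		fmt.Printf("%-3s -> %q (valid=%v)\n", l, c, ok)
	}
}
```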

gopherbot commented 1 year ago

Change https://go.dev/cl/493816 mentions this issue: cmd/internal/obj/mips: add REBH/REBHV/REHVV instructions

gopherbot commented 1 year ago

Change https://go.dev/cl/485595 mentions this issue: math/bits: optimize BitLens64/32 on mips64x

HeliC829 commented 1 year ago

> Exciting news! Thanks for your review.
>
> > GOMIPS32=iii and GOMIPS32=iv and GOMIPS32=v
>
> Did you mean GOMIPS64?
>
> > they exist(ed) and it's easy to include them, so we might as well recognize the full set.
>
> Let me summarize:

It's such a good summary. Besides, each newer ISA level is a superset of the previous one, except for R6 (R6 removed and adjusted some outdated instructions due to changes in microarchitecture design).
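
The superset relation described above suggests how a "does this target provide level X" check might look. The sketch below is an assumption-laden illustration: the name `levelProvides` and the rank table are invented here, and treating R6 as compatible only with itself is a deliberately conservative choice, since the comment says only that R6 removed and adjusted some instructions, not which ones.

```go
package main

import "fmt"

// rank orders the levels that form a superset chain according to the
// comment above. R6 is intentionally absent because it is not a plain
// superset of the earlier levels.
var rank = map[string]int{"iii": 0, "r1": 1, "r2": 2, "r5": 3}

// levelProvides is a hypothetical helper: it reports whether code that
// needs ISA level need can be assumed to run on a target built for
// ISA level target. R6 is conservatively treated as matching only itself.
func levelProvides(target, need string) bool {
	if target == "r6" || need == "r6" {
		return target == need
	}
	t, okT := rank[target]
	n, okN := rank[need]
	return okT && okN && t >= n
}

func main() {
	fmt.Println(levelProvides("r5", "r2"))  // r5 is a superset of r2
	fmt.Println(levelProvides("iii", "r2")) // iii lacks r2 instructions
	fmt.Println(levelProvides("r6", "r2"))  // not assumed in this sketch
}
```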

rsc commented 1 year ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

gopherbot commented 1 year ago

Change https://go.dev/cl/508095 mentions this issue: internal/buildcfg: add support for accepting different MIPS ISA level on mips64

HeliC829 commented 1 year ago

Can someone take a look at CL 508095, so that I can rework CL 485635 and CL 485595?

gopherbot commented 1 year ago

Change https://go.dev/cl/515475 mentions this issue: cmd/internal/obj/mips: add SEB/SEH instructions

HeliC829 commented 11 months ago

@cherrymui Hi, PTAL on CL 508095, thanks.

gopherbot commented 7 months ago

Change https://go.dev/cl/578175 mentions this issue: cmd/go: add GOMIPS32, GOMIPS64 ISA levels

stffabi commented 5 months ago

Is there any progress on the implementation of this proposal? I'm in the process of contributing ChaCha20 and Poly1305 assembly implementations for MIPSLE to x/crypto. For ChaCha20 it would be great to use ROTR for Mips32r2 and newer.
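
To illustrate why a rotate instruction matters for ChaCha20: every quarter round performs four 32-bit rotations, and without a rotate instruction each one costs two shifts and an OR. The sketch below shows the shift-based form next to `math/bits.RotateLeft32` and a quarter round built on it. The helper names `rotl32Shifts` and `quarterRound` are made up for this example, and whether the compiler lowers `bits.RotateLeft32` to ROTR on a given MIPS target depends on the ISA level work tracked in this issue, which this sketch does not assume.

```go
package main

import (
	"fmt"
	"math/bits"
)

// rotl32Shifts rotates x left by k bits using two shifts and an OR,
// which is the sequence needed when no rotate instruction is available.
// In Go, shifting a uint32 by 32 yields 0, so k == 0 is handled correctly.
func rotl32Shifts(x uint32, k uint) uint32 {
	k &= 31
	return x<<k | x>>(32-k)
}

// quarterRound is the ChaCha20 quarter round, written with
// bits.RotateLeft32 for its four rotations (16, 12, 8 and 7 bits).
func quarterRound(a, b, c, d uint32) (uint32, uint32, uint32, uint32) {
	a += b
	d = bits.RotateLeft32(d^a, 16)
	c += d
	b = bits.RotateLeft32(b^c, 12)
	a += b
	d = bits.RotateLeft32(d^a, 8)
	c += d
	b = bits.RotateLeft32(b^c, 7)
	return a, b, c, d
}

func main() {
	x := uint32(0x01234567)
	fmt.Printf("%08x %08x\n", rotl32Shifts(x, 16), bits.RotateLeft32(x, 16))

	// Input values from the quarter round example in RFC 8439, section 2.1.1.
	a, b, c, d := quarterRound(0x11111111, 0x01020304, 0x9b8d6f43, 0x01234567)
	fmt.Printf("%08x %08x %08x %08x\n", a, b, c, d)
}
```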

HeliC829 commented 5 months ago

> Is there any progress on the implementation of this proposal? I'm in the process of contributing ChaCha20 and Poly1305 assembly implementations for MIPSLE to x/crypto. For ChaCha20 it would be great to use ROTR for Mips32r2 and newer.

Unfortunately, I'm still waiting for Go to accept my CL that adds ISA levels to the GOMIPS{,64} environment variables. After that, I will submit more CLs implementing the new instructions in mips{32,64}r1 and mips{32,64}r2. In the meantime, I have created a fork branch based on the Go release branch with support for the newer MIPS instructions (the mips32 implementation will be added soon).

stffabi commented 5 months ago

Thanks for your status update. I'm going to link your CL from mine, that is going to add ChaCha20 and Poly1305 assembly implementations.

clktmr commented 4 months ago

I'm working on a mips64 target with ISA level MIPS-III. The current compiler doesn't emit valid MIPS-III code anymore. At least the following commits must be reverted to get back to MIPS-III level (there are probably more):

24f83ed4e29495d5b8b6375aeaa2d34d14629c7d 83c4e533bcf71d86437a5aa9ffc9b5373208628c 68fea523fda227ca5fe7a1eadb7542be4b0a840c 918d4d46cd17192a81a6aced57d09827560ad9f0

It would be greatly appreciated if MIPS-III could still be supported, or if a CL adding that were accepted, once the ISA level can be selected via GOMIPS.

HeliC829 commented 4 months ago

> I'm working on a mips64 target with ISA level MIPS-III. The current compiler doesn't emit valid MIPS-III code anymore. At least the following commits must be reverted to get back to MIPS-III level (there are probably more):

For mips64, the current compiler won't emit code higher than MIPS-III. The ssa front end won't emit code higher than MIPS-III for now.

24f83ed 83c4e53 68fea52

The commits mentioned above are for the SSA backend (aka the assembler). They don't emit code directly.

918d4d4

Although this commit uses MIPS MSA instructions, the runtime detects whether MSA is available and executes them only if it is.
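
The runtime check described above follows a common pattern: a feature flag is set once during startup and consulted before taking the specialized path. The sketch below shows the pattern in isolation. Everything in it is hypothetical: `hasMSA` stands in for whatever flag the CPU detection code sets, and `sumMSA` is a placeholder that simply reuses the generic code, since real MSA code would be written in assembly.

```go
package main

import "fmt"

// hasMSA is a hypothetical feature flag. In a real program it would be
// filled in by CPU detection during startup; if that detection is wrong
// for a platform, the dispatch below picks the wrong path.
var hasMSA = false

// sumGeneric is the portable fallback that works on every ISA level.
func sumGeneric(xs []uint32) uint32 {
	var s uint32
	for _, x := range xs {
		s += x
	}
	return s
}

// sumMSA is a placeholder for a vectorized implementation. It reuses
// the generic code so that the sketch stays runnable everywhere.
func sumMSA(xs []uint32) uint32 {
	return sumGeneric(xs)
}

// sum dispatches on the feature flag, so the specialized path runs only
// when the flag says the instructions are available.
func sum(xs []uint32) uint32 {
	if hasMSA {
		return sumMSA(xs)
	}
	return sumGeneric(xs)
}

func main() {
	fmt.Println(sum([]uint32{1, 2, 3}))
}
```

With this structure, the presence of newer instructions in the binary is harmless on older hardware as long as the flag is initialized correctly.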

I don't know what happened on your mips64 target, can you provide more information or logs?

clktmr commented 4 months ago

It was my fault. I'm running on a custom OS and didn't initialize CPU correctly in src/internal/cpu/. Tested again and everything is fine.

Sorry for the noise!