AugustNagro / utf8.java

Vectorized UTF-8 Validation for Java

chore: update to more recent jdk (20/21) #8

Closed lemire closed 2 months ago

lemire commented 2 months ago

On my server (AVX-512 capable Ice Lake), the 512-bit routine achieves 18 GB/s on twitter.json. That is much slower than C code, which reaches 70 GB/s on the same file and the same hardware, but 18 GB/s is still quite nice.

Benchmark                          (branchPrediction)              (testFile)   Mode  Cnt       Score        Error  Units
BenchJDK.jdk                                      N/A           /twitter.json  thrpt    5    1730.981 ±     10.969  ops/s
BenchJDK.jdk                                      N/A          /utf8-demo.txt  thrpt    5   64817.209 ±   1610.839  ops/s
BenchJDK.jdk                                      N/A  /utf8-demo-invalid.txt  thrpt    5   59242.711 ±    561.664  ops/s
BenchJDK.jdk                                      N/A                /20k.txt  thrpt    5   55290.128 ±    500.081  ops/s
BenchJDK.vector_128                               N/A           /twitter.json  thrpt    5    5312.508 ±     59.222  ops/s
BenchJDK.vector_128                               N/A          /utf8-demo.txt  thrpt    5  139016.925 ±   1540.087  ops/s
BenchJDK.vector_128                               N/A  /utf8-demo-invalid.txt  thrpt    5  138589.594 ±   1400.524  ops/s
BenchJDK.vector_128                               N/A                /20k.txt  thrpt    5  245011.164 ±   2312.438  ops/s
BenchJDK.vector_256                               N/A           /twitter.json  thrpt    5    8186.286 ±     47.844  ops/s
BenchJDK.vector_256                               N/A          /utf8-demo.txt  thrpt    5  199265.540 ±   1904.101  ops/s
BenchJDK.vector_256                               N/A  /utf8-demo-invalid.txt  thrpt    5  185047.925 ± 108642.203  ops/s
BenchJDK.vector_256                               N/A                /20k.txt  thrpt    5  381098.653 ±    399.680  ops/s
BenchJDK.vector_512                               N/A           /twitter.json  thrpt    5   31273.521 ±    227.835  ops/s
BenchJDK.vector_512                               N/A          /utf8-demo.txt  thrpt    5  450883.515 ±    145.834  ops/s
BenchJDK.vector_512                               N/A  /utf8-demo-invalid.txt  thrpt    5  451001.095 ±   1649.742  ops/s
BenchJDK.vector_512                               N/A                /20k.txt  thrpt    5  366363.454 ±    987.867  ops/s
OneBranchTooMany.scalar                         ZEROS                     N/A  thrpt    5  923994.010 ±   2530.672  ops/s
OneBranchTooMany.scalar                       TRIVIAL                     N/A  thrpt    5  924372.676 ±   4943.354  ops/s
OneBranchTooMany.scalar                        HARDER                     N/A  thrpt    5  923325.406 ±   2881.948  ops/s
OneBranchTooMany.scalar                       HARDEST                     N/A  thrpt    5  636452.437 ±    929.763  ops/s
OneBranchTooMany.vectorBranchFree               ZEROS                     N/A  thrpt    5   60271.606 ±     64.431  ops/s
OneBranchTooMany.vectorBranchFree             TRIVIAL                     N/A  thrpt    5   59439.347 ±     14.737  ops/s
OneBranchTooMany.vectorBranchFree              HARDER                     N/A  thrpt    5  107033.755 ±     44.392  ops/s
OneBranchTooMany.vectorBranchFree             HARDEST                     N/A  thrpt    5  106783.965 ±     63.782  ops/s
OneBranchTooMany.vectorBranchy                  ZEROS                     N/A  thrpt    5   60288.960 ±     25.676  ops/s
OneBranchTooMany.vectorBranchy                TRIVIAL                     N/A  thrpt    5   60278.749 ±     35.453  ops/s
OneBranchTooMany.vectorBranchy                 HARDER                     N/A  thrpt    5   51182.073 ±    211.252  ops/s
OneBranchTooMany.vectorBranchy                HARDEST                     N/A  thrpt    5   38518.926 ±   2054.683  ops/s
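
For reference, the ops/s scores in the twitter.json rows above can be turned into a throughput figure by multiplying by the file size, since each operation validates the whole file once. The sketch below assumes twitter.json is 631,515 bytes (the size of the copy distributed with simdjson); the thread does not state the size of the file in this repository, so treat the constant and the class name `ThroughputEstimate` as illustrative. Under that assumption the vector_512 score works out to roughly 18 to 20 GB/s, depending on whether a gigabyte is read as 2^30 or 10^9 bytes.

```java
public final class ThroughputEstimate {
    // ASSUMPTION: size of twitter.json in bytes (simdjson's copy is 631,515 bytes).
    // The thread does not give the size of the file actually benchmarked.
    static final long ASSUMED_TWITTER_JSON_BYTES = 631_515L;

    /** Bytes validated per second, given that one op validates the whole file once. */
    static double bytesPerSecond(double opsPerSecond, long fileSizeBytes) {
        return opsPerSecond * fileSizeBytes;
    }

    /** Decimal gigabytes (10^9 bytes). */
    static double toGB(double bytes) {
        return bytes / 1e9;
    }

    /** Binary gigabytes (2^30 bytes). */
    static double toGiB(double bytes) {
        return bytes / (double) (1L << 30);
    }

    public static void main(String[] args) {
        // Scores copied from the twitter.json rows of the JMH output.
        String[] names = {"jdk", "vector_128", "vector_256", "vector_512"};
        double[] opsPerSecond = {1730.981, 5312.508, 8186.286, 31273.521};

        for (int i = 0; i < names.length; i++) {
            double bps = bytesPerSecond(opsPerSecond[i], ASSUMED_TWITTER_JSON_BYTES);
            System.out.printf("%-11s %6.2f GB/s  %6.2f GiB/s%n",
                    names[i], toGB(bps), toGiB(bps));
        }
    }
}
```

The same conversion applies to the other test files once their sizes are known.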

I recommend making the benchmarks faster to run for quality of life. The benchmarks currently take 25 minutes to run on my server. A single warmup run and a single measurement run are enough. Such a change would speed up the benchmarks by a factor of five, and even that would still be too long (5 minutes).
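
The factor of five can be checked with a quick calculation. The sketch below assumes the current setup uses 5 warmup and 5 measurement iterations per benchmark (the `Cnt` column shows 5 measurements; the warmup count is a guess), that every iteration takes about the same time, and that JVM startup overhead is negligible. The class and method names are hypothetical.

```java
public final class BenchTimeEstimate {
    /**
     * Scales a measured wall-clock time by the ratio of iterations per benchmark.
     * Assumes every iteration (warmup or measurement) takes the same time.
     */
    static double scaledMinutes(double currentMinutes,
                                int currentWarmup, int currentMeasurement,
                                int newWarmup, int newMeasurement) {
        int currentIterations = currentWarmup + currentMeasurement;
        int newIterations = newWarmup + newMeasurement;
        return currentMinutes * newIterations / (double) currentIterations;
    }

    public static void main(String[] args) {
        // 25 minutes today; ASSUMED 5 warmup + 5 measurement iterations,
        // reduced to 1 warmup + 1 measurement iteration.
        double estimate = scaledMinutes(25.0, 5, 5, 1, 1);
        System.out.printf("Estimated run time: %.1f minutes%n", estimate);
    }
}
```

If the benchmarks are launched through JMH's command-line runner, the iteration counts can be overridden without touching the code via the `-wi` (warmup iterations) and `-i` (measurement iterations) options, for example `-wi 1 -i 1`.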

lemire commented 2 months ago

PR https://github.com/AugustNagro/utf8.java/pull/9 is more invasive but shows that the performance of vector128 and vector256 can be made better.