bloomberg / blazingmq-sdk-java

Java SDK for BlazingMQ, a modern high-performance open source message queuing system.
https://bloomberg.github.io/blazingmq
Apache License 2.0

Eliminate array copies #31

Open wolfchimneyrock opened 1 year ago

wolfchimneyrock commented 1 year ago

Issue number of the reported bug or feature request: #30

Describe your changes

  1. BrokerSession passes a Collection instead of a PutMessageImpl[] to avoid making copies.
  2. ApplicationData calculates the CRC32C on the fly to avoid making an expensive copy of the whole payload.
  3. ByteBufferOutputStream:
    • adds a peek() call that returns an independent view of the underlying data.
    • no longer copies and repacks ByteBuffers written to the ByteBufferOutputStream.
  4. NettyTcpConnection still makes a copy of the data read, but lets Netty make the copy instead of us.
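
As a rough illustration of items 2 and 3, the sketch below computes a CRC32C incrementally over a list of ByteBuffer chunks without first copying them into one array. It is not the SDK's code: the class name `ChunkedCrc32c` and its `checksum` method are hypothetical, and it assumes Java 9 or later for `java.util.zip.CRC32C`. Each chunk is read through `duplicate()`, which shares the bytes but has its own position and limit, so the caller's buffers are left untouched.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32C;

// Hypothetical helper for illustration only; not the SDK's ApplicationData.
final class ChunkedCrc32c {
    private ChunkedCrc32c() {}

    // Feeds every chunk into one running CRC32C instead of concatenating
    // the chunks into a single byte[] first.
    static long checksum(List<ByteBuffer> chunks) {
        CRC32C crc = new CRC32C();
        for (ByteBuffer chunk : chunks) {
            // duplicate() shares the content but not position/limit, so
            // consuming the view does not disturb the original buffer.
            crc.update(chunk.duplicate());
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        List<ByteBuffer> chunks = List.of(
            ByteBuffer.wrap("1234".getBytes(StandardCharsets.US_ASCII)),
            ByteBuffer.wrap("56789".getBytes(StandardCharsets.US_ASCII)));
        System.out.printf("crc32c = 0x%08X%n", checksum(chunks));
        System.out.println("first chunk position = " + chunks.get(0).position());
    }
}
```

The checksum over the chunks matches what a single pass over the concatenated payload would produce, which is what makes it possible to skip the copy.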

Testing performed

Unit and integration tests have been updated to reflect the changes.

Additional context

As discussed with @sgalichkin

wolfchimneyrock commented 1 year ago

I was able to run the JMH benchmarks with this latest commit. Here are the results:

Before:

Benchmark                                               Mode  Cnt     Score     Error  Units
ApplicationDataBenchmark.testZlibStreamInOut           thrpt   10     8.664 ±   0.379  ops/s
SessionBenchmark.sendReceive512B                       thrpt    5  1690.186 ±  76.546  ops/s
SessionBenchmark.sendReceive512B_Zlib                  thrpt    5  1686.011 ± 115.191  ops/s
SessionBenchmark.sendReceive512KiB                     thrpt    5     6.356 ±   0.110  ops/s
SessionBenchmark.sendReceive512KiB_Zlib                thrpt    5     6.360 ±   0.184  ops/s
SessionBenchmark.sendReceive5MiB                       thrpt    5    12.180 ±   0.367  ops/s
SessionBenchmark.sendReceive5MiB_Zlib                  thrpt    5    15.087 ±   0.116  ops/s
SessionBenchmark.sendReceive60MiB                      thrpt    5     1.013 ±   0.037  ops/s
SessionBenchmark.sendReceive60MiB_Zlib                 thrpt    5     1.358 ±   0.021  ops/s
SessionBenchmark.sendReceiveBatch1000ZlibConfirmLater  thrpt    5     0.169 ±   0.003  ops/s
SessionBenchmark.sendReceiveBatch1000ZlibConfirmNow    thrpt    5     0.225 ±   0.014  ops/s
SessionBenchmark.sendReceiveBatch100ConfirmLater       thrpt    5     1.539 ±   0.093  ops/s
SessionBenchmark.sendReceiveBatch100ConfirmNow         thrpt    5     2.155 ±   0.056  ops/s
SessionBenchmark.sendReceiveBatch100ZlibConfirmNow     thrpt    5     2.120 ±   0.088  ops/s
SessionBenchmark.sendReceiveBatch800ZlibConfirmLater   thrpt    5     0.197 ±   0.003  ops/s

After:

Benchmark                                               Mode  Cnt     Score     Error  Units
ApplicationDataBenchmark.testZlibStreamInOut           thrpt   10     6.088 ±   0.569  ops/s
SessionBenchmark.sendReceive512B                       thrpt    5  1881.041 ± 178.831  ops/s
SessionBenchmark.sendReceive512B_Zlib                  thrpt    5  1868.423 ± 186.251  ops/s
SessionBenchmark.sendReceive512KiB                     thrpt    5    18.013 ±   0.627  ops/s
SessionBenchmark.sendReceive512KiB_Zlib                thrpt    5    16.877 ±   0.652  ops/s
SessionBenchmark.sendReceive5MiB                       thrpt    5    34.717 ±   0.714  ops/s
SessionBenchmark.sendReceive5MiB_Zlib                  thrpt    5    16.922 ±   0.130  ops/s
SessionBenchmark.sendReceive60MiB                      thrpt    5     3.089 ±   0.121  ops/s
SessionBenchmark.sendReceive60MiB_Zlib                 thrpt    5     1.503 ±   0.031  ops/s
SessionBenchmark.sendReceiveBatch1000ZlibConfirmLater  thrpt    5     0.181 ±   0.003  ops/s
SessionBenchmark.sendReceiveBatch1000ZlibConfirmNow    thrpt    5     0.238 ±   0.009  ops/s
SessionBenchmark.sendReceiveBatch100ConfirmLater       thrpt    5     8.246 ±   0.714  ops/s
SessionBenchmark.sendReceiveBatch100ConfirmNow         thrpt    5     8.279 ±   0.657  ops/s
SessionBenchmark.sendReceiveBatch100ZlibConfirmNow     thrpt    5     2.242 ±   0.074  ops/s
SessionBenchmark.sendReceiveBatch800ZlibConfirmLater   thrpt    5     0.226 ±   0.005  ops/s

Except for the individual 512B messages, it looks like roughly 2x to 4x higher throughput for uncompressed payloads. For Zlib, I think there is still some bottleneck, as the increase isn't as pronounced.
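
For anyone who wants to check the estimate, the per-benchmark ratio is just the "After" score divided by the "Before" score. A minimal sketch for the uncompressed rows follows; the class name `SpeedupTable` is made up for illustration, the scores are copied from the two tables above, and the error margins are ignored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: computes after/before throughput ratios from the
// JMH scores quoted in the tables (uncompressed benchmarks, ops/s).
final class SpeedupTable {
    private SpeedupTable() {}

    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        // name -> {before, after}
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("sendReceive512B", new double[] {1690.186, 1881.041});
        scores.put("sendReceive512KiB", new double[] {6.356, 18.013});
        scores.put("sendReceive5MiB", new double[] {12.180, 34.717});
        scores.put("sendReceive60MiB", new double[] {1.013, 3.089});
        scores.put("sendReceiveBatch100ConfirmLater", new double[] {1.539, 8.246});
        scores.put("sendReceiveBatch100ConfirmNow", new double[] {2.155, 8.279});

        scores.forEach((name, s) ->
            System.out.printf("%-34s %.2fx%n", name, speedup(s[0], s[1])));
    }
}
```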