Closed by theferrit32 2 years ago
Increasing the size of the SSD persistent disk from 50GiB to 100GiB also roughly doubled the observed messages-per-second write rate. This crossed one of the per-size throughput tiers in Google Cloud (64GiB), so that makes sense: https://cloud.google.com/compute/docs/disks/performance#performance_by_disk_size
Will check in the current state of the code and then evaluate PreparedStatement/addBatch: https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/PreparedStatement.html#addBatch()

Statement/addBatch accepts a new SQL statement string on each call, but since our stream messages are well grouped by entity type, PreparedStatement/addBatch will work fine, reusing the same insert SQL on each call until the next entity type is seen.
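A minimal sketch of that reuse pattern, in Java. All names here (`writeBatched`, `insertSqlFor`, the "gene"/"variant" entity types) are illustrative, not the project's actual code: one `PreparedStatement` is kept per run of same-typed messages, and the batch is flushed with `executeBatch()` whenever the entity type changes.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class AddBatchWriter {
    // Hypothetical sketch: reuse one PreparedStatement while consecutive
    // messages share an entity type; flush and re-prepare on a type change.
    // `insertSqlFor` would map an entity type to its table's insert SQL.
    static void writeBatched(Connection conn, List<String> entityTypes,
                             List<Object[]> rows,
                             java.util.function.Function<String, String> insertSqlFor)
            throws SQLException {
        String currentType = null;
        PreparedStatement ps = null;
        for (int i = 0; i < rows.size(); i++) {
            String type = entityTypes.get(i);
            if (!type.equals(currentType)) {
                if (ps != null) { ps.executeBatch(); ps.close(); }
                ps = conn.prepareStatement(insertSqlFor.apply(type));
                currentType = type;
            }
            Object[] row = rows.get(i);
            for (int c = 0; c < row.length; c++) ps.setObject(c + 1, row[c]);
            ps.addBatch();  // queued; same insert SQL reused within the run
        }
        if (ps != null) { ps.executeBatch(); ps.close(); }
    }

    // Number of executeBatch() calls the loop above makes: one per run of
    // consecutive equal entity types.
    static int countFlushes(List<String> entityTypes) {
        int flushes = 0;
        String prev = null;
        for (String t : entityTypes) {
            if (!t.equals(prev)) { flushes++; prev = t; }
        }
        return flushes;
    }

    public static void main(String[] args) {
        System.out.println(countFlushes(List.of("gene", "gene", "variant"))); // prints 2
    }
}
```

The well-grouped stream matters here: if entity types interleaved randomly, nearly every message would force a flush and re-prepare, erasing the benefit of batching.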
I implemented a batch-insert query builder in the combiner which iterates over the messages from Kafka, partitioned by release sentinels, partitioned again by table, and partitioned again into batches of 500, and it increased the write performance to sqlite from ~400 to ~910 messages per second.
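For illustration, a Java sketch of the two pieces that approach needs: chunking a message list into fixed-size batches, and assembling one multi-row `INSERT ... VALUES (?, ?), (?, ?), ...` per chunk. The table and column names are made up; the real schema and builder live in the combiner.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchInsertBuilder {
    // Build one multi-row parameterized INSERT for `rowCount` rows.
    static String buildInsert(String table, List<String> columns, int rowCount) {
        String placeholders = "(" + String.join(", ",
                columns.stream().map(c -> "?").toList()) + ")";
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table)
                .append(" (").append(String.join(", ", columns)).append(") VALUES ");
        for (int i = 0; i < rowCount; i++) {
            if (i > 0) sql.append(", ");
            sql.append(placeholders);
        }
        return sql.toString();
    }

    // Partition a list into chunks of at most `batchSize` rows
    // (500 in the run described above).
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            chunks.add(items.subList(i, Math.min(i + batchSize, items.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // prints: INSERT INTO variant (id, type) VALUES (?, ?), (?, ?)
        System.out.println(buildInsert("variant", List.of("id", "type"), 2));
    }
}
```

Each chunk then becomes a single statement execution, so the per-statement overhead is paid once per 500 rows instead of once per row.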
sqlite does have a default maximum query size, which means there needs to be a numeric limit on the number of rows included in a single batch insert. However, it may be possible to use the java.sql.PreparedStatement/addBatch functionality to achieve a similar I/O throughput improvement from batching writes without building large SQL strings.