amazon-ion / ion-java

Java streaming parser/serializer for Ion.
https://amazon-ion.github.io/ion-docs/
Apache License 2.0

Ion 1.1 managed writer should flush and reset buffers when a symbol table is manually written #873

Open tgregg opened 1 month ago

tgregg commented 1 month ago

When the user (or, more likely, a system reader passed to writeValues) writes a symbol table or Ion version marker (IVM), the writer should flush any previously buffered data to the output. This is a performance optimization and should bring the Ion 1.1 managed writer to parity with Ion 1.0. Here's some data to support that, produced using the ion-java-benchmark-cli, which writes data using a system reader provided to IonWriter.writeValues.

Ion 1.0: 194 ms / op

Ion 1.1 - all symbols interned: 710 ms / op

Ion 1.1 - all symbols inline: 234 ms / op

Ion 1.1 - all symbols interned - forced flush every 500 values: 175 ms / op

Ion 1.1 - all symbols inline - forced flush every 500 values: 229 ms / op
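
To make the proposed behavior concrete, here is a minimal sketch of a buffering writer that flushes previously buffered data whenever a symbol table is written, and that can optionally force a flush every N values (as in the "forced flush every 500 values" runs). Everything in it is hypothetical and for illustration only: BufferingWriterSketch, writeSymbolTable, and flushInterval are made-up names, values are modeled as plain strings, and none of this is the ion-java managed writer's actual implementation or API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical toy writer for illustration only; this is NOT the ion-java API.
final class BufferingWriterSketch {
    private final OutputStream out;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final int flushInterval; // force a flush after this many values; 0 disables
    private int valuesSinceFlush = 0;
    private int flushCount = 0;

    BufferingWriterSketch(OutputStream out, int flushInterval) {
        this.out = out;
        this.flushInterval = flushInterval;
    }

    // Buffers one value and flushes if the (assumed) interval has been reached.
    void writeValue(String value) throws IOException {
        append(value);
        valuesSinceFlush++;
        if (flushInterval > 0 && valuesSinceFlush >= flushInterval) {
            flush();
        }
    }

    // Assumption: a manually written symbol table starts a new encoding context,
    // so everything buffered before it is flushed first and the buffer is reset.
    void writeSymbolTable(String symbolTable) throws IOException {
        flush();
        append(symbolTable);
    }

    // Copies buffered bytes to the output and resets the buffer.
    void flush() throws IOException {
        valuesSinceFlush = 0;
        if (buffer.size() == 0) {
            return; // nothing buffered, so this does not count as a flush
        }
        buffer.writeTo(out);
        buffer.reset();
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    int bufferedBytes() {
        return buffer.size();
    }

    private void append(String text) {
        byte[] bytes = (text + "\n").getBytes(StandardCharsets.UTF_8);
        buffer.write(bytes, 0, bytes.length);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        BufferingWriterSketch writer = new BufferingWriterSketch(sink, 0);
        writer.writeValue("value-1");
        writer.writeValue("value-2");
        writer.writeSymbolTable("symbol-table-2"); // flushes the two values above
        System.out.println("bytes flushed so far: " + sink.size());
        writer.flush();
        System.out.println("flushes performed: " + writer.flushCount());
    }
}
```

In this sketch the symbol table itself stays in the buffer until the next flush; whether the real writer should emit it immediately is a separate design choice that the sketch does not try to settle.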

tgregg commented 1 month ago

Update: we should still make sure this is being done, but it may not be the cause of the performance discrepancies noted above. The benchmark CLI isn't currently doing a system write during the timed portion of the write benchmark. It does one only during the one-time conversion performed before generating the write instructions, or for read benchmarks that provide a different format as input. More investigation needed.