apache / accumulo

Apache Accumulo
https://accumulo.apache.org
Apache License 2.0

Call hsync(), if supported, immediately prior to closing DFS compaction files. #4455

Open keith-turner opened 4 months ago

keith-turner commented 4 months ago

According to this documentation, it's best to call hsync() prior to close. This could possibly be done for files created for minor and major compactions in Accumulo. Compactions could check StreamCapabilities to see whether hsync() is supported and, if it is, call it immediately before calling close.
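A minimal sketch of that "probe, sync if supported, then close" pattern follows. It does not depend on Hadoop: `CompactionOutput` is a hypothetical stand-in for the few methods of a Hadoop output stream that would be involved (`hasCapability(String)`, `hsync()`, `close()`), and `RecordingOutput` is a fake used only to show the call order. The `"hsync"` string matches the value of Hadoop's `StreamCapabilities.HSYNC` constant; where this would hook into Accumulo's compaction code is not shown and is left open.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Sketch only: probe for hsync support, call it if present, then close. */
final class HsyncBeforeClose {

  /** Same string value as Hadoop's StreamCapabilities.HSYNC. */
  static final String HSYNC = "hsync";

  /** Hypothetical stand-in for the stream methods this pattern needs. */
  interface CompactionOutput extends Closeable {
    boolean hasCapability(String capability);

    void hsync() throws IOException;
  }

  /** Calls hsync() only when the stream advertises it, and always closes. */
  static void syncAndClose(CompactionOutput out) throws IOException {
    // try-with-resources closes the stream even if hsync() throws.
    try (CompactionOutput o = out) {
      if (o.hasCapability(HSYNC)) {
        o.hsync();
      }
    }
  }

  /** In-memory fake that records which calls were made, in order. */
  static final class RecordingOutput implements CompactionOutput {
    final boolean supportsHsync;
    final List<String> calls = new ArrayList<>();

    RecordingOutput(boolean supportsHsync) {
      this.supportsHsync = supportsHsync;
    }

    @Override
    public boolean hasCapability(String capability) {
      return supportsHsync && HSYNC.equals(capability);
    }

    @Override
    public void hsync() {
      calls.add("hsync");
    }

    @Override
    public void close() {
      calls.add("close");
    }
  }
}
```

With a stream that reports the capability, the fake records `hsync` followed by `close`; with one that does not, only `close` is recorded, so file systems without hsync support are unaffected.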

cshannon commented 4 months ago

I would be cautious with a change like this without some benchmarking/testing and probably a way to configure or turn it off.

Fsync operations are generally slow and can easily impact IO performance. Some systems like Kafka don't recommend using it and instead rely on replication for durability. Kafka gets its performance from using the page cache and letting the OS flush to disk asynchronously; its durability comes from the fact that producers get acks from more than one system confirming that a message was received. Kafka (and also ActiveMQ) supports setting a flush interval so you can flush periodically on a timer to tune how much you are syncing.

Hadoop is obviously a bit different, and I'm not sure what durability guarantees it provides with data replication without calling hsync, but it is worth looking into to see if this is necessary. I ultimately think we need to research this and see if it's really a good idea and provides enough benefit before we start calling hsync(). It may be the case that performance is fine if it is only called once per file, but it's hard to say without testing. And lastly, as I said, I think it would be good to make this configurable so it can be disabled.

keith-turner commented 4 months ago

> some benchmarking/testing and probably a way to configure or turn it off.

We could consider using the metadata table's write ahead log sync settings to control this. For example, if the metadata table is configured to use hsync on its wal updates, then we could use hsync on the files being added in that case. That would look like the following.

  1. Metadata table has hsync set for wal updates
  2. File A is created for compaction
  3. hsync() is called on File A because it is set on the metadata table
  4. File A is closed
  5. reference to File A is written to metadata table and that mutation is written to walog
  6. hsync() is called on walog

In the above, the file has the same durability guarantee as the reference to the file. If hsync is not called when adding the file reference, we could omit calling it when closing the file, like the following.

  1. Metadata table does not have hsync set for wal updates
  2. File A is created for compaction
  3. File A is closed
  4. reference to File A is written to metadata table and that mutation is written to walog
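The two sequences above can be sketched as one decision driven by the metadata table's setting. This is only an illustration under stated assumptions: `WalDurability` is a hypothetical enum loosely modeled on a none/log/flush/sync style of durability setting, and `stepsFor` just returns the ordered steps as strings rather than touching any real file or write ahead log.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: gate the file hsync on the metadata table's wal setting. */
final class DurabilityGatedSync {

  /** Hypothetical durability levels; only SYNC is treated as "hsync set". */
  enum WalDurability { NONE, LOG, FLUSH, SYNC }

  /** Returns the ordered steps for adding one compaction file. */
  static List<String> stepsFor(WalDurability metadataDurability) {
    boolean useHsync = metadataDurability == WalDurability.SYNC;
    List<String> steps = new ArrayList<>();
    steps.add("create File A");
    if (useHsync) {
      // The file gets the same guarantee as the reference written later.
      steps.add("hsync File A");
    }
    steps.add("close File A");
    steps.add("write reference to File A to metadata table and walog");
    if (useHsync) {
      steps.add("hsync walog");
    }
    return steps;
  }
}
```

Tying the file sync to the existing wal setting means no new property is needed, and a deployment that has already opted out of hsync for metadata updates pays no extra cost on compaction files.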