CoxAutomotiveDataSolutions / waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Apache License 2.0
75 stars 16 forks

Simplify compaction stages #71

Closed alexjbush closed 5 years ago

alexjbush commented 5 years ago

Investigate whether the storage layer really needs separate hot -> cold and cold -> cold compactions, or whether all hot regions plus the cold regions under the threshold can be compacted in a single pass.

This would reduce complexity and remove the additional round of I/O operations and the extra Spark stage.

https://github.com/CoxAutomotiveDataSolutions/waimak/blob/develop/waimak-storage/src/main/scala/com/coxautodata/waimak/storage/AuditTableFile.scala#L104
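
A minimal sketch of the single-pass idea, to make the proposal concrete. It is not taken from `AuditTableFile.scala`: `Region`, `isHot`, `rowCount`, `smallColdThreshold` and `compactOnce` are hypothetical names, and the code only models which regions would be selected and merged, without any Spark or file system I/O.

```scala
object CompactionSketch {

  // Hypothetical model of a stored region; not Waimak's actual class.
  final case class Region(id: String, isHot: Boolean, rowCount: Long)

  /**
    * Single-pass compaction: merge every hot region and every cold region
    * below the threshold into one new cold region, leaving large cold
    * regions untouched.
    */
  def compactOnce(regions: Seq[Region],
                  smallColdThreshold: Long,
                  newId: String): Seq[Region] = {
    val (toMerge, untouched) =
      regions.partition(r => r.isHot || r.rowCount < smallColdThreshold)

    // Rewriting is pointless when there is nothing to merge, or when the
    // only candidate is a single region that is already cold.
    val nothingToDo = toMerge.size <= 1 && toMerge.forall(!_.isHot)

    if (nothingToDo) regions
    else untouched :+ Region(
      id = newId,
      isHot = false,
      rowCount = toMerge.map(_.rowCount).sum
    )
  }
}
```

With two hot regions of 10 and 20 rows, a small cold region of 5 rows, a large cold region of 1000 rows and a threshold of 100, one call leaves the large cold region as it is and produces a single new cold region of 35 rows, so the data is read and written once rather than twice.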