CoxAutomotiveDataSolutions / waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Apache License 2.0

Consider average row size for compaction and fix recompactAll behaviour #59

Closed alexjbush closed 5 years ago

alexjbush commented 5 years ago

Description

This PR introduces a generic way of calculating the number of partitions to use when generating parquet files during a compaction in the storage layer.

There are two implementations to use:

Also fixes/changes the behaviour of the recompactAll flag so that it now forces a recompaction regardless of whether we are in a compaction window or not.
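
As a rough illustration of the two ideas above (this is not the code in this PR), the sketch below derives a partition count from an average row size and shows a compaction decision in which recompactAll bypasses the window check. Every name in it (`CompactionSketch`, `partitionsFor`, `shouldCompact`, `targetFileBytes`) is hypothetical, and it assumes the output size can be estimated as row count times average row size; any other criteria the storage layer applies are left out.

```scala
// Illustrative sketch only: these names are hypothetical and are not Waimak's API.
object CompactionSketch {

  /**
   * Estimate how many partitions (output files) to write so that each file
   * is roughly `targetFileBytes` in size.
   * Assumption: total size is approximated as rowCount * avgRowBytes.
   */
  def partitionsFor(rowCount: Long, avgRowBytes: Long, targetFileBytes: Long): Int = {
    require(rowCount >= 0, "rowCount must not be negative")
    require(avgRowBytes > 0, "avgRowBytes must be positive")
    require(targetFileBytes > 0, "targetFileBytes must be positive")

    // multiplyExact throws ArithmeticException instead of silently overflowing.
    val estimatedBytes = Math.multiplyExact(rowCount, avgRowBytes)

    // Ceiling division without risking overflow in the numerator.
    val remainder = if (estimatedBytes % targetFileBytes == 0) 0L else 1L
    val partitions = estimatedBytes / targetFileBytes + remainder

    // Always write at least one partition, and stay within Int range.
    math.min(math.max(1L, partitions), Int.MaxValue.toLong).toInt
  }

  /**
   * Decide whether to compact. When `recompactAll` is set, compaction is
   * forced whether or not we are inside a compaction window.
   * Assumption: other criteria (for example size thresholds) are omitted here.
   */
  def shouldCompact(recompactAll: Boolean, inCompactionWindow: Boolean): Boolean =
    recompactAll || inCompactionWindow
}
```

For example, one million rows averaging 200 bytes each against a 128 MiB target (`CompactionSketch.partitionsFor(1000000L, 200L, 128L * 1024 * 1024)`) gives 2 partitions, and `CompactionSketch.shouldCompact(recompactAll = true, inCompactionWindow = false)` is true.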

Fixes #32

Type of change


How Has This Been Tested?

Unit tests. This should also be tested on real data on the release branch.

coveralls commented 5 years ago

Pull Request Test Coverage Report for Build 437


| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| :-- | --: | --: | --: |
| waimak-storage/src/main/scala/com/coxautodata/waimak/storage/StorageActions.scala | 26 | 27 | 96.3% |
<!-- Total: 31 32 96.88% -->

| Files with Coverage Reduction | New Missed Lines | % |
| :-- | --: | --: |
| waimak-storage/src/main/scala/com/coxautodata/waimak/storage/StorageActions.scala | 1 | 93.75% |
| waimak-storage/src/main/scala/com/coxautodata/waimak/storage/AuditTableFile.scala | 1 | 96.49% |
| waimak-rdbm-ingestion/src/main/scala/com/coxautodata/waimak/rdbm/ingestion/RDBMIngestionUtils.scala | 1 | 93.75% |
| waimak-storage/src/main/scala/com/coxautodata/waimak/storage/Storage.scala | 1 | 87.5% |
| waimak-storage/src/main/scala/com/coxautodata/waimak/storage/FileStorageOps.scala | 2 | 90.77% |
<!-- Total: 6 -->

| Totals | Coverage Status |
| :-- | --: |
| Change from base Build 422: | 0.1% |
| Covered Lines: | 1159 |
| Relevant Lines: | 1457 |

💛 - Coveralls