CoxAutomotiveDataSolutions / waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
Apache License 2.0

Clean-up strategy should error if no folders exist after committing #61

Open alexjbush opened 5 years ago

alexjbush commented 5 years ago

It is currently hard to tell when the configuration for the clean-up strategy is incorrect, which results in untracked data growth.

Expected Behavior

An exception should be thrown if the clean-up configuration is incorrect. This can be done by checking that at least one snapshot folder exists. The check should be safe because clean-up runs directly after a snapshot is made, so at least one matching folder must be present when the configuration is correct.
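A minimal sketch of the proposed check, written against the local filesystem with `java.io.File` rather than Waimak's own storage layer. The object `SnapshotCleanupCheck`, its method names, and the `snapshotPrefix` parameter are all hypothetical and only illustrate the idea; they are not part of the Waimak API.

```scala
import java.io.File

// Hypothetical helper, not part of Waimak: illustrates failing fast when the
// clean-up configuration matches no snapshot folders.
object SnapshotCleanupCheck {

  /** Names of the direct sub-folders of `tableDir` that start with `snapshotPrefix`, sorted. */
  def snapshotFolders(tableDir: File, snapshotPrefix: String): Seq[String] =
    Option(tableDir.listFiles()) // listFiles returns null if tableDir is missing or not a directory
      .map(_.toSeq)
      .getOrElse(Seq.empty)
      .filter(f => f.isDirectory && f.getName.startsWith(snapshotPrefix))
      .map(_.getName)
      .sorted

  /**
    * Returns the matching snapshot folders, or throws if there are none.
    * Intended to run directly after a snapshot has been written, when at
    * least one folder must match if the configuration is correct.
    */
  def requireSnapshotFolders(tableDir: File, snapshotPrefix: String): Seq[String] = {
    val found = snapshotFolders(tableDir, snapshotPrefix)
    if (found.isEmpty)
      throw new IllegalStateException(
        s"No snapshot folders with prefix '$snapshotPrefix' found under " +
          s"${tableDir.getPath}; the clean-up configuration is probably incorrect")
    found
  }
}
```

With a check like this in place, a misconfigured folder prefix would fail the job on the first run instead of silently skipping clean-up.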

Actual Behavior

If the configuration is incorrect, no warning is raised, so data volume grows unchecked.

Steps to Reproduce the Problem

  1. Specify an incorrect folder structure for withDateBasedSnapshotCleanup
  2. Run the job
  3. Check whether the folders are cleaned up

Specifications