The logic of the script is quite straightforward:
1) Specify a comma-separated list of table names (= Trekkie topics) for which you want partitions to be added.
2) If the table does not yet exist, create the table in the Hive metastore. I want to put the file that contains the table schema in Chef and not in this public repo.
3) Then, run MSCK REPAIR TABLE to detect which folders are missing partitions.
4) Parse the output of that command and generate an ALTER TABLE statement that adds those partitions.
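
As a rough illustration of step 4, here is a minimal sketch, not the code from the script itself. It assumes that Hive reports missing partitions on a line starting with "Partitions not in metastore:" followed by whitespace-separated `table:key=value/key=value` entries, and that quoting every partition value as a string is acceptable for the table's partition columns. The function names `parse_missing_partitions` and `build_alter_table` are hypothetical.

```python
def parse_missing_partitions(msck_output, table):
    """Return partition paths (e.g. 'year=2013/month=10') reported as missing.

    Assumption: missing partitions appear on a line that starts with
    'Partitions not in metastore:' as whitespace-separated 'table:path' entries.
    """
    prefix = "Partitions not in metastore:"
    missing = []
    for line in msck_output.splitlines():
        if not line.startswith(prefix):
            continue
        for entry in line[len(prefix):].split():
            name, _, path = entry.partition(":")
            if name == table and path:
                missing.append(path)
    return missing


def build_alter_table(table, partition_paths):
    """Build one ALTER TABLE statement that adds all given partitions.

    Returns None when there is nothing to add.
    Assumption: every partition value is quoted as a string literal.
    """
    if not partition_paths:
        return None
    specs = []
    for path in partition_paths:
        # 'year=2013/month=10' -> [['year', '2013'], ['month', '10']]
        pairs = [kv.split("=", 1) for kv in path.split("/")]
        spec = ", ".join("{}='{}'".format(key, value) for key, value in pairs)
        specs.append("PARTITION ({})".format(spec))
    return "ALTER TABLE {} ADD IF NOT EXISTS {};".format(table, " ".join(specs))
```

The generated statement uses IF NOT EXISTS so that re-running it for partitions that were added in the meantime does not fail. Partition values containing whitespace are not handled by this sketch.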
I took the Python script from https://github.com/wikimedia/kraken/tree/master/kraken-etl and made some adjustments, including creating the Hive table if it does not exist and adding more logging in verbose mode.
@dterror @honkfestival