Shopify / camus

Kafka->HDFS pipeline from LinkedIn. It is a MapReduce job that does distributed data loads out of Kafka.

Add script to create Hive partitions for Camus dropped data #69

Closed drdee closed 8 years ago

drdee commented 8 years ago

I took the Python script from https://github.com/wikimedia/kraken/tree/master/kraken-etl and made some adjustments, including creating the Hive table if it does not exist and adding more logging when verbose mode is enabled.

The logic of the script is quite straightforward:

1. Specify a comma-separated list of table names (= Trekkie topics) for which you want partitions to be added.
2. If a table does not yet exist, create it in the Hive metastore. I want to put the file that contains the table schema in Chef and not in this public repo.
3. Run `MSCK REPAIR TABLE` to detect which folders are missing partitions.
4. Parse that output and generate an `ALTER TABLE` statement that adds the missing partitions.
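For reference, here is a rough sketch of that flow. This is not the script in this PR, just an illustration of the steps above; the flag names (`--tables`, `--schema-file`, `--verbose`), the schema-file placeholder, and the exact format of the `MSCK` output parsing are assumptions.

```python
#!/usr/bin/env python
"""Hypothetical sketch of the partition-repair flow described above."""
import argparse
import logging
import subprocess


def hive(query):
    """Run a HiveQL statement via the hive CLI and return its stdout."""
    return subprocess.check_output(['hive', '-e', query], universal_newlines=True)


def ensure_table(table, schema_file):
    """Create the Hive table from a schema file kept outside this repo (e.g. in Chef)."""
    with open(schema_file) as f:
        # Assumption: the schema file is a CREATE TABLE IF NOT EXISTS template
        # with a %(table)s placeholder for the table name.
        hive(f.read() % {'table': table})


def missing_partitions(table):
    """Use MSCK to report partition directories that exist on HDFS but not in the metastore."""
    out = hive('MSCK REPAIR TABLE %s' % table)
    partitions = []
    for line in out.splitlines():
        # Assumed output shape: "Partitions not in metastore: table:year=2015/month=06/day=01 ..."
        if line.startswith('Partitions not in metastore:'):
            for spec in line.split(':', 1)[1].split():
                partitions.append(spec.split(':', 1)[-1])
    return partitions


def add_partitions(table, partitions):
    """Generate one ALTER TABLE statement that adds all missing partitions."""
    clauses = []
    for spec in partitions:
        # "year=2015/month=06/day=01" -> PARTITION (year='2015', month='06', day='01')
        pairs = ["%s='%s'" % tuple(kv.split('=', 1)) for kv in spec.split('/')]
        clauses.append('PARTITION (%s)' % ', '.join(pairs))
    if clauses:
        hive('ALTER TABLE %s ADD IF NOT EXISTS %s' % (table, ' '.join(clauses)))


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--tables', required=True, help='comma-separated Hive table names')
    parser.add_argument('--schema-file', help='HiveQL template used to create missing tables')
    parser.add_argument('--verbose', action='store_true')
    args = parser.parse_args()
    logging.basicConfig(level=logging.DEBUG if args.verbose else logging.INFO)
    for table in args.tables.split(','):
        if args.schema_file:
            ensure_table(table, args.schema_file)
        parts = missing_partitions(table)
        logging.info('%s: adding %d partition(s)', table, len(parts))
        add_partitions(table, parts)
```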

@dterror @honkfestival

honkfestival commented 8 years ago

🚀

dterror-zz commented 8 years ago

👍