Configure input format and split size in extreme_weather.sql

Issue #, if available:

Description of changes:

The extreme_weather.sql file in the Hive examples contains a Hive query that reads 10843 small text data files, each with a header. EMR versions >= 6.6.0 support splitting text files that have a header/footer (Ref: HIVE-21924). With this support and the default input format (org.apache.hadoop.hive.ql.io.HiveInputFormat), a single thread in the Tez AM reads all the data files during split computation. As a result, split computation takes ~1.5 hrs for this query on EMR versions >= 6.6.0. Switching to CombineHiveInputFormat and configuring the split size solves this problem.
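For illustration, the kind of session settings described above might look like the sketch below, placed at the top of the query file. The split-size values are assumptions chosen for the example (128 MB maximum, 64 MB minimum), not necessarily the values used in this change, and should be tuned to the data set.

```sql
-- Combine many small files into fewer splits instead of one split per file.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;

-- Upper bound on the size of a combined split, in bytes.
-- 134217728 (128 MB) is an illustrative value, not taken from this PR.
SET mapreduce.input.fileinputformat.split.maxsize=134217728;

-- Lower bound on the size of a split, in bytes.
-- 67108864 (64 MB) is likewise an illustrative value.
SET mapreduce.input.fileinputformat.split.minsize=67108864;
```

A smaller maximum split size yields more, smaller tasks, while a larger one yields fewer, larger tasks.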

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

aws-samples / emr-serverless-samples

Configure input format and split size in extreme_weather.sql #30