This PR adds a new flag, GroupFilesByDay. When it is true, Hauser combines all the bundles for the same day into a single CSV file before loading that file into the warehouse. Grouping the bundles provides two benefits, particularly when there are many small bundles:
- Loading data for many days at once is much faster, because the number of round trips from Hauser to the warehouse is greatly reduced.
- For BigQuery specifically, there is a quota of 1000 load jobs per day. It is not uncommon for larger clients to generate a bundle every 30 minutes, which means 48 bundles per day. At one load job per bundle, you can load only about 20 days of data (1000 / 48 ≈ 20.8) before reaching the limit. Grouping the bundles reduces this to one load job per day of data, so you can load much more before using up the quota.