fullstorydev / hauser

Service for moving your Fullstory export files to a data warehouse
MIT License

Optionally group bundles by day for upload #3

Closed jameremo closed 7 years ago

jameremo commented 7 years ago

This PR adds a new flag, GroupFilesByDay. When it is true, Hauser combines all the bundles for the same day into a single CSV file before loading that file into the warehouse. Grouping the bundles provides two benefits, particularly when there are many small bundles:

  1. Loading data for many days at once is much faster, because far fewer round trips from Hauser to the warehouse are needed.
  2. BigQuery specifically has a quota of 1000 load jobs per day. It is not uncommon for larger clients to generate a bundle every 30 minutes, which means 48 bundles per day. Without grouping, you can therefore load only about 20 days of data (1000 / 48 ≈ 20.8) before you reach the limit. Grouping the bundles lets you load far more data before using up the quota.