ets / tap-spreadsheets-anywhere

GNU Affero General Public License v3.0

:sparkles: Add function to evaluate modified_since criteria at a row level #72

Closed nicholasvk closed 1 year ago

nicholasvk commented 1 year ago

Add new functionality to tap-spreadsheets-anywhere so that we can evaluate the max date in each S3 inventory report and import only the reports that contain new activity. The update builds on the tap's existing file timestamp evaluation. For each file written to the scrape bucket since the last ELT run, we read the file, take the max value in its last_modified_date column, and process the file only if that max date is also later than the last ELT run.

Looping through and evaluating the newly created S3 inventory reports will still take time, but this should streamline our process and avoid needlessly upserting inventory reports for buckets where there has been no activity.
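The two-stage check described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names (`parse_timestamp`, `max_row_date`, `should_process_file`) are hypothetical, rows are assumed to arrive as dictionaries keyed by column name, and `last_modified_date` values are assumed to be ISO 8601 strings, with naive values treated as UTC.

```python
from datetime import datetime, timezone
from typing import Iterable, Mapping, Optional


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; naive values are assumed to be UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def max_row_date(
    rows: Iterable[Mapping[str, str]],
    column: str = "last_modified_date",
) -> Optional[datetime]:
    """Return the latest timestamp in `column`, or None if no row has a value."""
    dates = [parse_timestamp(row[column]) for row in rows if row.get(column)]
    return max(dates) if dates else None


def should_process_file(
    file_modified: datetime,
    rows: Iterable[Mapping[str, str]],
    modified_since: datetime,
    column: str = "last_modified_date",
) -> bool:
    """Hypothetical two-stage filter: file timestamp first, then row-level max date."""
    # Stage 1: skip files that were not written since the last ELT run.
    if file_modified <= modified_since:
        return False
    # Stage 2: only process the file if some row is newer than the last run.
    newest = max_row_date(rows, column)
    return newest is not None and newest > modified_since
```

Checking the file timestamp first means the more expensive row scan only runs for files that already passed the existing file-level filter, which matches the ordering described in the PR.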