jeancochrane closed this 2 weeks ago
@dfsnow I finished a pretty major refactor of the PR, so I think it's worth taking a fresh look at everything. Instead of the no-filter default being a set of active towns determined by a schedule, the script will now default to exporting reports for all towns, using pandas to speed up the township filtering operation. Testing this locally, it takes about 30 minutes to run an export for all towns, and we scan the same amount of data in Athena since our tables are not partitioned by township anyway.
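The pandas-based township filtering mentioned above could look roughly like this. This is a minimal sketch: the `township_code` column name comes from the PR description, but the frame contents and helper name are illustrative assumptions, not the script's actual implementation.

```python
import pandas as pd

# Hypothetical sketch: pull one bulk result set from Athena, then split
# it into per-township frames in pandas instead of issuing one query
# per town. Only `township_code` is taken from the PR; everything else
# here is made up for illustration.
data = pd.DataFrame({
    "township_code": ["10", "10", "23", "35"],
    "pin": ["001", "002", "003", "004"],
})

def split_by_township(df: pd.DataFrame) -> dict:
    """Split one bulk query result into per-township frames."""
    return {code: group for code, group in df.groupby("township_code")}

reports = split_by_township(data)
print(sorted(reports.keys()))  # → ['10', '23', '35']
```

Since the Athena tables are not partitioned by township, this approach scans the same amount of data as per-town queries while avoiding repeated round trips.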
This PR makes a few quality-of-life improvements to the QC town close export script to enable automation via any distribution mechanism (VM, OneDrive, or S3). The changes include:
- Adds a `Township` class that owns a few different configurations for each township
- Adds a `scripts/utils/town_active_schedule.csv` config file that can optionally configure the list of available `Township`s
- Updates the `--township` option so that it accepts one or more township codes
- Allows users to filter the `township_code` column for the township they are interested in
- Updates the `--township` option so that it is optional; when omitted, it falls back to all towns that are active per the `town_active_schedule`
- Adds an `--output-dir` option that controls the directory where the script will save reports

With these changes, we can follow these steps to define an automated process to export QC reports on a regular schedule:
1. Store the `scripts/utils/town_active_schedule.csv` config file, with a defined activity schedule, on the machine that will run the process (or download it from S3 in an ephemeral environment like a GitHub workflow)
2. Schedule the `export_qc_town_close.py` script to run at regular intervals, omitting the `--township` argument so that the script will export all active towns
3. Set the `--output-dir` option on environments like the VM that need to write to specific locations but cannot chain an `mv` call to the scheduled `export_qc_town_close.py` call
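The steps above might be wired up with a cron entry like the following. This is a hedged sketch: the `--township` and `--output-dir` flags come from this PR, but the script path, schedule, and output directory are assumptions for illustration.

```shell
# Hypothetical crontab fragment (config sketch, not the actual deployment).
# Run the export every Monday at 6am; --township is omitted so all active
# towns are exported, and --output-dir writes directly to the share the VM
# needs, avoiding a chained `mv` step:
#
#   0 6 * * 1  python scripts/export_qc_town_close.py --output-dir /mnt/shared/qc-reports
#
# Ad hoc run for one or more specific townships:
#
#   python scripts/export_qc_town_close.py --township 10 23 --output-dir ./reports
```

On ephemeral runners (e.g. a GitHub workflow), the same invocation would follow a step that downloads `town_active_schedule.csv` from S3.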