Thanks for the feedback @ian-r-rose, it's now implemented. I believe the following manual actions will need to take place post-merge:
1. Edit nightly build schedule (I can do this)
2. Deploy dag to airflow (Ian?)
3. Re-run dag for "yesterday", since we will be skipping a day when this is merged/deployed (Ian?); one way to handle this is sketched below
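On step 3: if the dag is deployed with `catchup=True`, the Airflow scheduler should re-run any daily intervals missed since the dag's `start_date` on its own; otherwise a one-off `airflow dags backfill -s <yesterday> -e <yesterday> <dag_id>` does it manually. A minimal sketch of the catchup route, assuming Airflow 2.x (the dag_id, start date, and task below are all hypothetical, not the real ones):

```python
# Minimal sketch, assuming Airflow 2.x; dag_id, start_date, and the task are hypothetical.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="s3_loader",                # assumed name for the loading dag
    start_date=datetime(2021, 1, 1),   # placeholder
    schedule_interval="@daily",
    catchup=True,                      # scheduler backfills any skipped intervals on deploy
) as dag:
    BashOperator(
        task_id="load_to_s3",
        bash_command="echo 'stand-in for the real s3 load'",
    )
```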
Yep, agreed on all points! I just need to remember how to deploy the dag :)
Update here, @JamesSLogan: this is now deployed to Airflow, and we are now caught up (data from this morning is in Snowflake!)
Also, I made a mistake above. This script doesn't take 60 minutes, it takes 60 seconds. So actually I think it would be quite safe to schedule the "nightly" job for 6:30 AM.
Awesome, thank you! I re-updated the dbt job for 6:30. 🤞 for tomorrow's run
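For reference, 6:30 AM daily as a cron expression, the format both dbt Cloud's custom job schedules and Airflow's `schedule_interval` accept (the constant name is just for illustration):

```python
# Daily at 06:30: minute 30, hour 6, any day of month, any month, any weekday.
NIGHTLY_CRON = "30 6 * * *"
```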
Fixes #455 (well, mostly fixes). This change schedules the s3 loading 2 hours prior to our nightly build, which will decrease daily lag by 1 day: from 3 days to 2. To get down to 1 day, we would need to load data as soon as it becomes available in the clearinghouse.
Currently, "yesterday's" data arrives between 4:00 and 5:30 AM the next day. To capture all of it with the current loading process, we would want to start the s3 load around 6-7 AM to be safe, which would push the nightly build to roughly 7-8 AM. That is arguably too late to guarantee data availability before people start work (on some days, at least). I think it's worth keeping the current implementation, especially since the data relay server should improve this latency in the future.
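To make the timing concrete, here is an illustrative sketch of how the two schedules relate under this design. The times follow from the thread above (6:30 AM nightly build, s3 load 2 hours prior); the names are made up and nothing here is copied from the actual config:

```python
# Illustrative only; derived from the discussion, not from the repo's config.
#
#   04:00-05:30  "yesterday's" data arrives in the clearinghouse
#   04:30        s3 load runs (2 hours before the nightly build), picking up the
#                newest fully-arrived day, hence the 2-day lag
#   06:30        nightly dbt build runs against the freshly loaded data
#
S3_LOAD_CRON       = "30 4 * * *"   # 04:30 AM
NIGHTLY_BUILD_CRON = "30 6 * * *"   # 06:30 AM
```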
@mmmiah fyi