thekaveman opened 7 years ago
See also #56
@allejo: this is related to what I mentioned when closing #56. When I made ed16d7d and removed an old site, the aggregate WebJob started failing.
Removing the site from the `_websites` collection removes the key from the Jekyll-generated `reports/variables.json` file. Subsequent runs of the other WebJobs won't generate new data for the removed site. This is all expected 👍
However, data generated before the site was removed is never cleaned up. When aggregate runs after the removal, it uses both the contents of the `data` directory (one subdirectory per agency) and the keys in `reports/variables.json`; since the two no longer match, we get the error.
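To make the mismatch concrete, here is a minimal sketch that compares the two sources aggregate reads. It assumes (not confirmed above) that the top-level keys of `reports/variables.json` are the same agency names used for the subdirectories of `data`; the function name `find_mismatches` is hypothetical.

```python
import json
from pathlib import Path


def find_mismatches(data_dir: Path, variables_file: Path) -> tuple[set[str], set[str]]:
    """Compare agency subdirectories of data_dir with the top-level keys of variables_file.

    Assumes variables_file holds a JSON object keyed by agency name (an assumption).
    Returns (orphaned, missing):
      orphaned: subdirectories with no matching key, e.g. data left by a removed site
      missing:  keys that have no subdirectory yet
    """
    keys = set(json.loads(variables_file.read_text()).keys())
    dirs = {p.name for p in data_dir.iterdir() if p.is_dir()}
    return dirs - keys, keys - dirs
```

After removing a site, its leftover directory would show up in `orphaned`, which is the kind of mismatch that trips up aggregate.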
Two options I can think of: either use the keys from `reports/variables.json` exclusively, or have a separate cleanup WebJob that continuously deletes subdirectories of `data` that don't exist as keys in `reports/variables.json`. (I kind of like the former approach better than the latter.) Your thoughts?
Ahhh, that would make a lot of sense... Yea, I'm in favor of using `reports/variables.json` exclusively in the aggregate WebJob.
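A rough sketch of what "key off `reports/variables.json` exclusively" could look like, assuming the same layout as above (JSON object keyed by agency, one `data/<agency>` directory each). The function `agency_dirs_to_aggregate` is hypothetical and only selects which directories to read; the actual aggregation logic isn't shown here.

```python
import json
from pathlib import Path


def agency_dirs_to_aggregate(data_dir: Path, variables_file: Path) -> list[Path]:
    """Return the data subdirectories to aggregate, driven only by variables.json keys.

    Stale subdirectories from removed sites are ignored because they are never
    looked up. Keys with no data directory yet are skipped rather than raising.
    """
    keys = json.loads(variables_file.read_text()).keys()
    dirs = []
    for key in sorted(keys):
        candidate = data_dir / key
        if candidate.is_dir():
            dirs.append(candidate)
    return dirs
```

Skipping keys without data is a design choice for the sketch; raising instead would surface a newly added site whose WebJobs haven't produced anything yet.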
As for cleaning up old data, we could have a manual WebJob that deletes any old data and that we run every so often? Or we could tie that WebJob/script to run on deployment as well.
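A possible shape for the cleanup step, whether run manually or on deployment. This is a sketch under the same layout assumptions; `remove_stale_data` and its `dry_run` flag are hypothetical, and nothing here reflects how the deployment is actually wired up.

```python
import json
import shutil
from pathlib import Path


def remove_stale_data(data_dir: Path, variables_file: Path, dry_run: bool = False) -> list[str]:
    """Delete subdirectories of data_dir whose names are not keys in variables_file.

    Returns the sorted names of the removed directories (or, with dry_run=True,
    the ones that would be removed, without deleting anything).
    """
    keys = set(json.loads(variables_file.read_text()))
    stale = sorted(p.name for p in data_dir.iterdir() if p.is_dir() and p.name not in keys)
    if not dry_run:
        for name in stale:
            # Recursively removes the agency directory and everything in it.
            shutil.rmtree(data_dir / name)
    return stale
```

A deploy hook could call `remove_stale_data(Path("data"), Path("reports/variables.json"))` after Jekyll regenerates `variables.json` (paths assumed); running it with `dry_run=True` first shows what would be deleted.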
Oh, I like the idea of doing a clean on deployment! That, plus moving aggregate to key off the `reports/variables.json` file, should solve our current issue with removing sites and prevent stagnant data from sitting around forever.
Should the change go into the `feature/aggregate-script-46` branch so that it can be revived/merged? Or should we do it in both branches (rewrite + master)?
Let's revive that thing and get it merged! I think I was supposed to review your changes, right?
Yea, and I just need to confirm that the generated data is the same as with the current script.
We launched www.santamonica.gov on September 22, which includes the newsroom functionality. On that date, we began redirecting newsroom URLs to the corresponding URLs on the newer site.
In the short term, we can disable realtime reporting for newsroom.smgov.net.
In the long term, we can completely remove newsroom.smgov.net. Since our longest reporting period is 90 days, the timeframe here is sometime after December 21, 2017.
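A quick check of the date arithmetic: 90 days after the September 22, 2017 launch lands on December 21, 2017, the point after which a 90-day report should no longer include pre-redirect newsroom traffic.

```python
from datetime import date, timedelta

LAUNCH = date(2017, 9, 22)   # www.santamonica.gov launch; newsroom redirects begin
LONGEST_REPORT_DAYS = 90     # longest reporting period

# Day on which the 90-day window has fully rolled past the launch date.
safe_after = LAUNCH + timedelta(days=LONGEST_REPORT_DAYS)
print(safe_after.isoformat())  # 2017-12-21
```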