CityofSantaMonica / analytics.smgov.net

Website analytics for the City of Santa Monica
https://analytics.smgov.net

Remove newsroom.smgov.net from tracked websites #57

Open thekaveman opened 7 years ago

thekaveman commented 7 years ago

We launched www.santamonica.gov on September 22, which includes the newsroom functionality. On that date, we began redirecting newsroom URLs to the corresponding URLs on the newer site.

In the short term, we can disable realtime reporting for newsroom.smgov.net.

In the long term, we can remove newsroom.smgov.net completely. Since our longest reporting period is 90 days, the timeframe here is sometime after December 21, 2017.
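For reference, the December 21 date is just the launch date plus the 90-day reporting window:

```python
from datetime import date, timedelta

# www.santamonica.gov launched (and newsroom redirects began) on 2017-09-22;
# the longest reporting period offered is 90 days.
launch = date(2017, 9, 22)
print(launch + timedelta(days=90))  # 2017-12-21
```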

thekaveman commented 7 years ago

See also #56

thekaveman commented 7 years ago

@allejo: related to what I mentioned on the closure of #56. When I made ed16d7d and removed an old site, the aggregate WebJob started failing.

Removing the site from the _websites collection removes the key from the Jekyll-generated reports/variables.json file. Subsequent runs of the other WebJobs won't generate new data for the removed site. This is all expected 👍

However, data generated prior to removing the site is never cleaned up. When aggregate runs following the removal, it uses the contents of the data directory (a subdirectory for each agency) and the keys in reports/variables.json; since there is a mismatch, we get the error.

Two options I can think of: either have aggregate use the keys from reports/variables.json exclusively, or add a separate cleanup WebJob that continuously deletes subdirectories of data that don't exist as keys in reports/variables.json. (I kind of like the former approach better than the latter.) Your thoughts?
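The first option could look roughly like this. This is only a sketch: the `data_dir`/`variables_file` paths, the one-subdirectory-per-site layout, and the `aggregate` function shape are assumptions based on this thread, not the actual script.

```python
import json
from pathlib import Path

def aggregate(data_dir, variables_file):
    """Build the combined report from the tracked keys in
    reports/variables.json only, so leftover subdirectories in
    data/ from removed sites are simply ignored."""
    tracked = json.loads(Path(variables_file).read_text()).keys()
    combined = {}
    for site in tracked:
        site_dir = Path(data_dir) / site
        if not site_dir.is_dir():
            continue  # site is tracked but has no generated data yet
        # Collect this site's report files (layout assumed for illustration)
        combined[site] = sorted(p.name for p in site_dir.glob("*.json"))
    return combined
```

Iterating over the variables.json keys instead of the data/ subdirectories means a stale subdirectory can never cause a mismatch error; at worst it just sits unused on disk.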

allejo commented 7 years ago

Ahhh, that would make a lot of sense... Yeah, I'm in favor of using reports/variables.json exclusively in the aggregate WebJob.

As for cleaning up old data, we could have a manual WebJob that deletes any old data, which we could run every so often. Or we could tie that WebJob/script to run on deployment as well.
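A deploy-time cleanup script could be as small as this. Again a sketch only: the paths and the flat subdirectory-per-site layout under data/ are assumptions from this thread, and `clean_stale_site_data` is a hypothetical name.

```python
import json
import shutil
from pathlib import Path

def clean_stale_site_data(data_dir, variables_file):
    """Delete any data subdirectory whose name is no longer a key in
    the Jekyll-generated variables file; return what was removed."""
    tracked = set(json.loads(Path(variables_file).read_text()).keys())
    removed = []
    for subdir in Path(data_dir).iterdir():
        if subdir.is_dir() and subdir.name not in tracked:
            shutil.rmtree(subdir)  # removes the directory and its contents
            removed.append(subdir.name)
    return sorted(removed)
```

Returning the list of deleted directories would also give the WebJob something useful to log on each deployment.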

thekaveman commented 7 years ago

Oh I like the idea of doing a clean on deployment! That plus moving aggregate to key off the reports/variables.json file should solve our current issue with removing sites and prevent stagnant data from sitting around forever.

allejo commented 7 years ago

Should the change go into the feature/aggregate-script-46 branch so that it can be revived/merged? Or should we do it in both branches (rewrite + master)?

thekaveman commented 7 years ago

Let's revive that thing and get it merged! I think I was supposed to review your changes, right?

allejo commented 7 years ago

Yeah, and I just need to confirm that the generated data is the same as with the current script.