bensheldon opened this issue 9 years ago
Re-implement Monitors, Incidents and MonitorEvents data model in Postgres
And Snapshots.
> Can we point Pingometer to a second webhook?
Yep. In Pingometer’s UI, go to Alerts > Contacts and add a new contact for the webhook. Then in Alerts > Groups, add that contact to the “hooks” group (that group is subscribed to all the monitors).
> I haven't gone through the Rake tasks fully. Is there any functionality in those that is being actively used?
`snapshot_states.rb` (not a rake task) is actively used in production. The other three are actively used in development, but are just one-time use in production (unless you need to clear out data because of corruption and rebuild it). The live database isn’t really canonical storage for anything.
> `snapshot_states.rb` (not a rake task) is actively used in production

If I'm understanding that task correctly, it takes screenshots of all of the monitored sites separate/distinct from the screenshot that's taken when an incident/event takes place. Does that run on a cron, or do you just trigger it manually?
I currently have screenshots as part of Events, but if there's also a need for on-demand screenshots, I'll separate that out into a polymorphic model that can be attached to both Monitors and Events.
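For reference, a minimal sketch of what such a polymorphic model might look like (ActiveRecord; `Snapshot` and `subject` are hypothetical names, not code from the branch):

```ruby
# Minimal sketch of a polymorphic screenshot model (ActiveRecord).
# `Snapshot` and `subject` are hypothetical names for illustration.
class Snapshot < ActiveRecord::Base
  # The snapshots table would need subject_id and subject_type columns.
  belongs_to :subject, polymorphic: true
end

class Monitor < ActiveRecord::Base
  has_many :snapshots, as: :subject
end

class MonitorEvent < ActiveRecord::Base
  has_many :snapshots, as: :subject
end
```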
As some context for how I'm architecting it: I have a rake task that fetches all the monitors and their data and syncs/stores them in the database. I plan on running this once an hour (?). Then there are Events that are triggered by the webhook, and Incidents that join the monitors' events together. I named those models `WebService`, `MonitorEvent` and `MonitorIncident` respectively, though we could always rename them for clarity. The `WebService` also exposes the static State metadata (timezone, etc.). Maybe `WebService` should just be renamed to `Monitor`, but that's a pretty easy search-and-replace when the time comes :smile:
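To make that shape concrete, here's a rough sketch of the associations as described above (ActiveRecord; names are illustrative, not the actual branch code):

```ruby
# Rough sketch of the data model described above (ActiveRecord;
# association names are illustrative, not from the actual branch).
class WebService < ActiveRecord::Base
  # One row per Pingometer monitor, synced by an hourly rake task.
  has_many :monitor_events
  has_many :monitor_incidents
end

class MonitorEvent < ActiveRecord::Base
  # Created when Pingometer fires the webhook.
  belongs_to :web_service
  belongs_to :monitor_incident
end

class MonitorIncident < ActiveRecord::Base
  # Ties together a web service's related events (e.g. down, then back up).
  belongs_to :web_service
  has_many :monitor_events
end
```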
> If I'm understanding that task correctly, it takes screenshots of all of the monitored sites separate/distinct from the screenshot that's taken when an incident/event takes place.
Yep.
> Does that run on a cron, or do you just trigger it manually?
Cron. 3x/night at 1am, 2am, 3am (in the time zone of the site being monitored).
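For illustration, a single hourly cron entry could drive those per-timezone runs. A sketch, assuming each monitor exposes an ActiveSupport-recognized `time_zone` name (this is not the actual `snapshot_states.rb` logic):

```ruby
require 'active_support/time'

# Sketch: given one hourly cron run, pick out the monitors whose *local*
# time is 1am, 2am, or 3am. Monitors are assumed to respond to `time_zone`
# with a zone name ActiveSupport knows; this is an illustration, not the
# real snapshot_states.rb logic.
def snapshot_due_monitors(monitors, now: Time.now)
  monitors.select do |monitor|
    [1, 2, 3].include?(now.in_time_zone(monitor.time_zone).hour)
  end
end
```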
> I named those models WebService, MonitorEvent and MonitorIncident respectively, though we could always rename them for clarity.
I originally named the `MonitorXXX` ones that way because they directly reflect the monitor data in Pingometer (maybe `Pingometer` would have been a better prefix, but I think I was trying to at least sort of keep it monitoring-service-agnostic). So `Incident` isn’t prefixed because it’s not directly reflective of data coming from an outside service. Not saying we have to keep with that, but I think it’s useful to signify that somehow.

I’m not super-keen on `WebService`, since that name says API to me. That said, I struggle to think of something better.
As a side note, I don’t think it makes sense to have a `raw_monitor_data` field. First, is that raw data from Pingometer or the stuff that’s currently in `data/pingometer_monitors.json`? The schema/contents of those two things are very different (e.g. our local metadata has a `hostname`; Pingometer’s data may or may not; for some monitors it needs to be synthesized from the commands). There’s also metadata we store locally that simply does not exist in any fashion in Pingometer (state, time zone(s)). I think the model for web services should have that info directly accessible (though I don’t think we need to redundantly store three forms of state, like we do now).
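To illustrate "directly accessible", that could mean plain columns instead of a JSON blob. A migration sketch (column names are assumptions, not the actual schema):

```ruby
# Sketch of a migration that promotes the locally-stored metadata to real
# columns rather than a raw JSON field; names are assumptions, not the
# actual schema.
class AddLocalMetadataToWebServices < ActiveRecord::Migration
  def change
    add_column :web_services, :hostname,  :string  # may need synthesizing from commands
    add_column :web_services, :state,     :string  # stored once, not in three forms
    add_column :web_services, :time_zone, :string  # e.g. "America/New_York"
  end
end
```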
> As a side note, I don’t think it makes sense to have a raw_monitor_data field. First, is that raw data from Pingometer…
That's raw monitor data from Pingometer. It's just dumped into the database as JSON because I didn't entirely know what was in it, but I also wanted to cache it so that I could construct the URL being monitored.
I'm not planning to include the `pingometer_monitors.json` data in the database (at least not yet). Instead I'll just load that file into memory and provide an accessor for it in the individual models (using the hostname, as you suggested, as the join ID between the monitor and the additional info).
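Roughly like this sketch (the file path comes from this thread; the accessor name and the JSON's structure are assumptions):

```ruby
require 'json'

# Sketch: load data/pingometer_monitors.json once and join it to each
# record by hostname. `local_metadata` and the JSON structure here are
# assumptions for illustration.
class WebService < ActiveRecord::Base
  LOCAL_METADATA = JSON.parse(
    File.read(Rails.root.join('data', 'pingometer_monitors.json'))
  ).freeze

  def local_metadata
    # Assumes each entry carries a "hostname" key; adjust if the file is
    # keyed differently.
    LOCAL_METADATA.find { |entry| entry['hostname'] == hostname }
  end
end
```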
Maybe I should rename `WebService` to just `Monitor` since it seems like it's a 1:1 mapping with what's in Pingometer. I'm thinking maybe I could namespace them all under Pingometer (`Pingometer::Monitor`, `Pingometer::Event`), but it honestly doesn't matter too much, I don't think :-)
@bensheldon @Mr0grog let's discuss where this effort stands synchronously, as part of a larger discussion about reasonable goals for this project going forward. Since this thread was last updated, we've uncovered (and articulated how to address) concerns about the fidelity of our data, drafted and given presentations about the work, and set out a plan for a published write-up.
@alanjosephwilliams @Mr0grog Would love to re-address this. I shelved the Rails work because I didn't want to distract from the immediate presentation needs. I think we have a pretty good spec now for how to manage data fidelity and accuracy (ETLing event data as canonical and using webhooks for more timely updates), as well as adding bad-data periods for hiding bad data.
If there is a continuing need for this data, I will restart development on the Rails branch.
I think we've identified some tech benefits (Postgres, background jobs) and process benefits (me helping) to changing the architecture. This is just a brief summary of what I'm planning to do this weekend:
I think that functionality is sufficient to deploy alongside the Sinatra app (snap-status-rails.herokuapp.com ... until it has full parity). Can we point Pingometer to a second webhook?
Once I have that up and ready to catch webhooks, I'll work on pulling in the existing front-end reports.
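For context, the webhook-catching piece could be as small as this sketch (route, controller name, and payload handling are all assumptions; Pingometer's exact payload isn't spelled out in this thread):

```ruby
# config/routes.rb (sketch)
post '/webhooks/pingometer' => 'webhooks#pingometer'

# app/controllers/webhooks_controller.rb (sketch; the `payload` column and
# event creation are assumptions about the eventual schema)
class WebhooksController < ApplicationController
  skip_before_action :verify_authenticity_token

  def pingometer
    # Store the raw POST body now; parse it into MonitorEvent fields later.
    MonitorEvent.create!(payload: request.raw_post)
    head :ok
  end
end
```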
I haven't gone through the Rake tasks fully. Is there any functionality in those that is being actively used?
Please don't let my re-architecting block any feature work. I take full responsibility for backporting any work on the Sinatra app until the Rails piece has full parity, but I suggest we put any backend or code-cleanup improvements on hold.