Snapshots triggered by event hook should happen in a worker thread/process/dyno

Mr0grog commented 9 years ago

Now that @bensheldon’s set up Sentry, I’m getting occasional notices about the event hook timing out. I remember seeing this occasionally in logs in the past, as well. I’m 99.9% certain this is caused by sites being slow to load when we screenshot them (not exactly surprising if the site reports as “down”).

In order to give the snapshots more time to complete and avoid causing errors, we should probably queue the snapshots and perform them in some sort of worker process or thread. @bensheldon suggests using Que.
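Here is a minimal sketch of the pattern proposed above. It uses an in-process Ruby thread with the core `Queue` and the standard library's `Timeout`, not Que, so none of Que's actual API is shown. The event hook only enqueues a URL and returns immediately. A worker then takes each screenshot under its own, longer time limit and records a timeout instead of failing the request. `SnapshotWorker`, `enqueue`, `shutdown`, and the block standing in for the screenshot call are hypothetical names used only for illustration.

```ruby
require "timeout"

# Hypothetical worker: pulls snapshot jobs off an in-memory queue and runs
# each one with its own time limit, outside the web request.
class SnapshotWorker
  attr_reader :results

  def initialize(timeout_seconds:, &snapshot)
    @timeout_seconds = timeout_seconds
    @snapshot = snapshot            # stands in for the real screenshot call
    @jobs = Queue.new               # thread-safe FIFO from Ruby core
    @results = []                   # only written by the worker thread
    @thread = Thread.new { run }
  end

  # Called from the event hook: returns immediately instead of blocking
  # the request until a (possibly slow) site finishes loading.
  def enqueue(url)
    @jobs << url
  end

  # Lets the worker finish already-queued jobs, then waits for it to exit.
  def shutdown
    @jobs << :stop
    @thread.join
  end

  private

  def run
    loop do
      url = @jobs.pop               # blocks until a job arrives
      break if url == :stop
      begin
        Timeout.timeout(@timeout_seconds) { @snapshot.call(url) }
        @results << [url, :ok]
      rescue Timeout::Error
        @results << [url, :timed_out] # record the slow site instead of crashing
      rescue StandardError
        @results << [url, :error]
      end
    end
  end
end

# Usage: a fake "screenshot" that is slow for one URL.
worker = SnapshotWorker.new(timeout_seconds: 0.5) do |url|
  sleep(url.include?("slow") ? 1 : 0.01)
end
worker.enqueue("http://fast.example")
worker.enqueue("http://slow.example")
worker.shutdown
p worker.results
# => [["http://fast.example", :ok], ["http://slow.example", :timed_out]]
```

An in-process thread keeps the sketch short, but any queued jobs are lost if the web process restarts. A separate worker process reading from a persistent queue (such as the Postgres-backed Que suggested above) would avoid that.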

bensheldon commented 9 years ago

I realized that the reason I really like Que is that it only uses Postgres, without an additional Redis dependency. But with that, I also realized that we're using Mongo....

Which makes me wonder if it would simplify moving forward to just throw this project into Rails at this point and use Postgres instead of Mongo. We'd get a bit more structure, some free tooling, and honestly not much more overhead if we're now talking about setting up background jobs too.

Mr0grog commented 9 years ago

So, not to sound pissy, but I feel like I need to lay this out: the point of this thing (and the reason I started working on it) comes down to just a few key things:

1. Gather data (stats, screenshots, whatever's relevant) on the uptime and availability of SNAP Web services.
2. Visualize and analyze that data.
3. Explain to people who aren't aware of the problems what they actually are and why they're a big deal.

I feel like we are accomplishing (1) and @bengolder is focusing on (3). The thing I want to be working on is (2), but between my very limited time and everyone wanting the code in this repo cleaner (it's messy because I just want to get to 2!), that isn't happening. At this point, I have a drastically diminishing appetite for architectural work (even though I agree; the architecture here is terrible). It also seems insane to me that this task requires all of Rails, but I am also not at all a Ruby person and don't know all the tools well.

So. I'm not remotely excited to do that kind of refactor. But if you want to do this (and related stuff), I would ABSOLUTELY LOVE it, because this should be better.

bensheldon commented 9 years ago

I hear you! Screw cleanliness. It seemed like a big focus, but if it's not in service to creating value, it's just polishing useless blocks of wood.

I'm happy to help separate out the issues into "yak waxing" and feature work, 'cause I want architecture discussions to be in service of feature work and not a distraction from it.


alanjosephwilliams commented 9 years ago

Follow your heart toward data viz, @Mr0grog!

After we tell the story we have, we'll either be able to get support to wax this block of wood, or we won't.


Mr0grog commented 9 years ago

Thanks for understanding my above emotional discharge :P

The big thing here is just data reliability and robustness (i.e. (1) above). It doesn’t look like we are timing out on screenshots very often, but we certainly are on some, and it would be good not to have that happening (or at least giving it more than Heroku’s 30 seconds). We probably don’t need a big refactor to do that. I’d really appreciate it if you want to take this and work on it, @bensheldon.

In general, actually, it’d be utterly fantastic if you have the time and interest to own data reliability and robustness. Any visual or written analysis is worthless if based on unreliable data, so if this could be better in that area, it’s important.

If you’d like to do that, just say the word. Seriously, it’d make my day and let me focus on finding interesting ways to look at the data and present it.

alanjosephwilliams commented 9 years ago

Also, @bensheldon, not sure if you are up to date on our rollout plan. We're looking to do a few things:

Author a narrative that illustrates the impact of downtime through some real stories we have, and gives a high-level overview of the work we've done. This is being owned by @bengolder, as Rob mentioned.
Author a technical write up for government and civic tech folks looking to implement monitoring on services they care about—yaks included. Not sure who owns this, but you, Rob, and @lippytak should all be contributors, probably.
Both of those content pieces would hopefully include some of the visualization, which @mr0grog owns.
Present the story at a health & technology conference in Boston on April 1, which I own. We'll probably want to publish the two pieces of content at the same time, if possible. There is also some art direction and presentation work that will be done for those content pieces, which I'm talking to @mollymcleod and @davidleonard about.


alanjosephwilliams commented 9 years ago

I forgot the most important part of that comment: please involve yourself in any of those pieces as you are so inclined!

bensheldon commented 9 years ago

Do we have a sense right now of whether this app will be a part of the collateral/outcome of this project, or is it still a means to just collecting the data for some other form of delivery? If I commit to maintaining the backend services, are we thinking this is just gonna be a 6 month project and then we kill it, or will it be a platform for further work and public consumption? And if you're not sure, where would you say your current thinking is on a scale of 1 (throw it away) to 10 (future platform)?

@Mr0grog I commit to owning the backend services (infrastructure, api, platform, reliability) in service of front-end visualizations and data analysis "product". Can we start elevating our thinking around what you need to create that analysis/data-viz separate from backend services?

Mr0grog commented 9 years ago

Author a technical write up for government and civic tech folks looking to implement monitoring on services they care about—yaks included.

Oh, yeah, @bensheldon you should definitely collaborate with @lippytak on this, since your work on Open311Status is really relevant. This started from a separate base because the needs are slightly different, but both are very valid as patterns in the larger “cheap, light, outside monitoring for gov’t web services” discussion.

Mr0grog commented 9 years ago

where would you say your current thinking is on a scale of 1 (throw it away) to 10 (future platform)?

My perspective:

In terms of re-use for other purposes, I don’t think I care or would want to put too much emphasis on the code here—I’d see it more as a pattern than a concrete, reusable object (though there may be a bigger project combining general ideas from Open311Status and this that could be built).
In terms of how long this lives, I would like to see it continue on for a while. The near-term goal is awareness/presentation, where the software is just a research tool for a narrative. However, longer-term goals we’ve talked about (but not necessarily committed to) include this being (a) activism, a way to continually hold SNAP service implementors/owners accountable, (b) a way for people to see site status and avoid issues like the sign-up drive that went south when it turned out the site was down, and (c) a repository of ongoing data about the issue that is freely provided to others to do their own analysis, investigation, or visualization. To be clear, though, those goals are all subservient to the near-term write-up/presentation goal. They depend on interest from people who’d support them.

alanjosephwilliams commented 9 years ago

are we thinking this is just gonna be a 6 month project and then we kill it, or will it be a platform for further work and public consumption? And if you're not sure, where would you say your current thinking is on a scale of 1 (throw it away) to 10 (future platform)?

+1 to @Mr0grog. I was about to say:

We aren't sure. I think it depends on our ability to secure support, which I think is in turn dependent on how effectively we present the work and thinking to date. However, we have definitely thought about what 10 looks like, which would be something akin to "a sufficiently funded team dedicated to monitoring public services broadly".

Mr0grog commented 9 years ago

TL;DR I’d call it a 4 with naive hopes of 10. (A solid 10 would almost certainly, I think, be a new codebase with some deeper technical thinking.)

Mr0grog commented 9 years ago

Also: THANKS, BEN!

Can we start elevating our thinking around what you need to create that analysis/data-viz separate from backend services?

YES. Branched this off into #40.

codeforamerica / snap-it-up

Snapshots triggered by event hook should happen in a worker thread/process/dyno #39
