codeforamerica / project-ideas

A place to collect ideas for CfA health projects

SNAP Status Board #43

Open junosuarez opened 10 years ago

junosuarez commented 10 years ago

One sentence description

Yesterday, 10% of SNAP websites across the country were down or inaccessible; let's track and show that.

Link (more details/brain dump/alpha)

something like https://status.github.com/ showing a big red / green for whether the individual state sites are accessible (maybe tracking other metrics, like response time or error rate and maybe with a sparkline showing these metrics over time)
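A rough sketch of what one row of such a board could compute, assuming each check is recorded as a hypothetical `(ok, response_ms)` pair. The function names and the row layout are illustrative only, not an existing tool:

```python
BARS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Render a sequence of numbers as a one-line Unicode sparkline."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values equal: draw a flat line at the lowest bar.
        return BARS[0] * len(values)
    # Scale each value into an index between 0 and len(BARS) - 1.
    return "".join(BARS[int((v - lo) / span * (len(BARS) - 1))] for v in values)

def status_row(state, checks):
    """Summarize checks (oldest first) as: state, red/green, error rate, sparkline.

    checks is a list of (ok, response_ms) tuples; this shape is an assumption.
    """
    # The light reflects only the most recent check.
    light = "GREEN" if checks and checks[-1][0] else "RED"
    error_rate = sum(1 for ok, _ in checks if not ok) / len(checks) if checks else 0.0
    trend = sparkline([ms for _, ms in checks])
    return f"{state}: {light}  errors {error_rate:.0%}  {trend}"
```

For example, `status_row("CA", [(True, 100.0), (False, 300.0)])` gives a RED row with a 50% error rate and a two-bar sparkline of the response times.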

Project Needs (dev/design/resources):

alanjosephwilliams commented 10 years ago

@jden I think this is a fantastic idea. Uptime and availability are the absolute baselines of service. We have a list of all of the sites compiled here over at the CitizenOnboard repo, where we hope to do UX critiques of the SNAP enrollment process for all 50 states.

migurski commented 10 years ago

:+1: to this. It’d be fascinating to see how deep into the application process the tests could go, down to the level of seeing if any provider has a way to apply with a fake name for a smoke test.

lippytak commented 10 years ago

Completely awesome idea. Feature request: Email mayor/congressmen/activist groups when critical services go down.

daguar commented 9 years ago

Additional benefit of this: if we can show real uptime (downtime) rates for these sites, the old argument of "we need a giant vendor for stable uptime!" loses a bit of its steam.

lippytak commented 9 years ago

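^ yes to that. For motivation, the main social services website in CA is currently down (3+ hours so far): [image: error] https://cloud.githubusercontent.com/assets/2533112/5112128/5734fcd8-6fda-11e4-9779-eb5c12d0739e.png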

(http://www.mybenefitscalwin.org/)

alanjosephwilliams commented 9 years ago

I'm down to take on the design and front end work.

status.citizenonboard.com?


lippytak commented 9 years ago

Inception sound at CfA office when services go down.

Twitter bot tweet at vendor.

alanjosephwilliams commented 9 years ago

I'm getting started with Uptime now:

http://www.redotheweb.com/uptime/


daguar commented 9 years ago

How was "down" determined in the initial survey of sites? HTTP status code? Was there a "sorry we're down for maintenance" note?

Operationalizing "a site is down" in this way will be necessary, so looking for details on what "down" meant in this case.

alanjosephwilliams commented 9 years ago

@daguar manually. Each was different, and yes typically there was a "maintenance" note.

Also here's our first day of MBCW uptime checking. :(

[image: screenshot 2014-11-20 12 08 38]

alanjosephwilliams commented 9 years ago

v0.1 here: http://stats.pingdom.com/29wya4enlbs2

Monitors the uptime of each of the 50 states' food assistance application web services, or, failing that, the primary page hosting program information and downloadable forms.

We are checking for the presence of certain strings in order to ensure that we are not just getting an error page. However, we encountered some services, like Tennessee's, that are indeed down, but have semi-permanent pages in place offering guidance that the service is unavailable.
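One way to operationalize this, as a minimal sketch: classify each fetched page as up, in maintenance, or down from its status code and body text. The function name, the three labels, and the example strings are assumptions for illustration; this is not how Pingdom implements its keyword checks.

```python
def classify(status_code, body, required_strings, maintenance_markers=()):
    """Return 'up', 'maintenance', or 'down' for one fetched page.

    status_code is None when the request failed or timed out.
    required_strings must all appear in a healthy page.
    maintenance_markers are phrases seen on "service unavailable" pages.
    Matching is case-insensitive.
    """
    text = body.lower()
    # Check maintenance notices first: such pages can return 200 and
    # would otherwise look healthy to a plain status-code check.
    if any(marker.lower() in text for marker in maintenance_markers):
        return "maintenance"
    if status_code is None or status_code >= 400:
        return "down"
    if all(s.lower() in text for s in required_strings):
        return "up"
    # A 2xx/3xx response without the expected strings is probably an error page.
    return "down"
```

Whether "maintenance" counts as downtime on the board is a separate decision; from an applicant's point of view it probably should.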

lippytak commented 9 years ago

Update on United Status of America (ehh??) cross-validation: currently running identical keyword status checks for 8 states + FB, Goog, and Twitter using 3 services:

Pingdom and StatusCake have 1 min check resolution with 30 sec timeout.

UptimeRobot is 5 min check resolution with 120 sec timeout (neither configurable).

So far it looks like there are some pretty big differences between these services...hmmm, glad we're doing validation!
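A sketch of how the comparison could be made fair despite the different check resolutions: bucket every service's checks into 5-minute windows (the coarsest resolution above), mark a window as down if any check in it failed, then compare windows across services. The log format, a list of hypothetical `(unix_timestamp, ok)` pairs per service, is an assumption and not any provider's export format.

```python
from collections import defaultdict
from itertools import combinations

def bucket(checks, window_s=300):
    """Map window index -> True only if every check in that window passed."""
    windows = defaultdict(lambda: True)
    for ts, ok in checks:
        windows[ts // window_s] &= ok
    return dict(windows)

def uptime_pct(windows):
    """Percentage of observed windows that were fully up."""
    return 100.0 * sum(windows.values()) / len(windows) if windows else 0.0

def disagreement(a, b):
    """Fraction of windows seen by both monitors where their verdicts differ."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return sum(a[w] != b[w] for w in shared) / len(shared)

def compare(logs, window_s=300):
    """logs: {service_name: [(unix_ts, ok), ...]} -> (uptimes, pairwise disagreement)."""
    buckets = {name: bucket(checks, window_s) for name, checks in logs.items()}
    uptimes = {name: uptime_pct(w) for name, w in buckets.items()}
    pairs = {
        (x, y): disagreement(buckets[x], buckets[y])
        for x, y in combinations(sorted(buckets), 2)
    }
    return uptimes, pairs
```

A high disagreement rate between two services on the same site would point at the monitors (timeouts, probe locations) rather than the site itself.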

lippytak commented 9 years ago

Another update!

lippytak commented 9 years ago

Idea: Screenshot + Tweet img whenever a site goes down using Websnapr (or something else http://stackoverflow.com/questions/1981670/programmatically-get-a-screenshot-of-a-page)

Mr0grog commented 9 years ago

Well, Websnapr might be a non-standard unless you can turn off caching:

The added benefit is that they have most popular URLs cached, so you will get very fast response times.

But you could probably write something up with Phantom really easily. Its API lets you grab an image.

Mr0grog commented 9 years ago

*non-starter, not non-standard.

Mr0grog commented 9 years ago

Anyway, simple snapshotting service: https://github.com/Mr0grog/PageSnap

(Also running for now, unprotected, at http://pagesnap.herokuapp.com/[url].png, e.g. http://pagesnap.herokuapp.com/http%3A%2F%2Fheroku.com.png to snapshot heroku.com)
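For anyone scripting against it, the URL scheme above amounts to percent-encoding the target URL and appending `.png`. A small sketch (the helper name is made up, and the Heroku instance may not stay up):

```python
from urllib.parse import quote

PAGESNAP_BASE = "http://pagesnap.herokuapp.com/"

def pagesnap_url(target_url, base=PAGESNAP_BASE):
    """Build a snapshot URL: base + fully percent-encoded target + '.png'."""
    # safe="" makes quote() encode ':' and '/' too, so the whole target
    # URL fits into a single path segment.
    return f"{base}{quote(target_url, safe='')}.png"
```

`pagesnap_url("http://heroku.com")` reproduces the example URL above.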

lippytak commented 9 years ago

Ahh that's awesome. If you want to keep playing around here I can give you our StatusCake API key (https://www.statuscake.com/api/). It basically gives you access to the full history of data as shown on the alpha dashboard: http://uptime.statuscake.com/?TestID=pQYuAW4tAi

Maybe a Pinterest gallery of all the SNAP applications home screens that are down?


Mr0grog commented 9 years ago

Hmmm… actually thinking about this now, what’s the value in screenshotting here? I haven’t been following this thread deeply; I mostly just reacted when I saw the note about caching and thought “hey, that’d be easy and quick to do better for our use case.” Does anybody really need to know what “down” looks like? (why?)

Happy to keep poking at this a bit (fair warning, I’m bouncing all around various parts of New England right now, so my time with an internet connection can be limited), but it seems like charting uptime or aggregating statistics or visualizing it better than the StatusCake page would be more useful. This comes back around to @alanjosephwilliams volunteering to design: what are we actually trying to do with this?

Anyway, there may be in-person/office/other-channel conversations I haven’t been in that give more context that I’m missing. But I don’t want to dive into more work here without feeling like I know what we’re actually trying to do. Otherwise I feel like I’m wasting time feeling around in the dark. (Which is not to say I can’t come up with my own ideas and thoughts here; I’m just miles and miles away from really knowing the SNAP space at all and other people on this thread are much more clued in to what would be needed/useful here.)

Mr0grog commented 9 years ago

Sorry if that was a big pile of questions… I don’t think all of them necessarily need to be answered right now, I’m just wary of putting in much more work when I’m not at all sure a) what would be most valuable to do right now and b) where we’re going with this for now.

lippytak commented 9 years ago

NO don't apologize for actually caring about why we are doing this in the first place! Thoughtful questions deserve thoughtful responses so sorry for the delay...I'll offer some initial thoughts now but I'm in Sacramento all day so will think this through in more detail tomorrow.

The Big Hairy Problem here is that a lot of social service websites are down all the time for a bunch of different reasons. So the goal for this project is to increase uptime for critical digital government services and/or reduce the pain of downtime. Our theory of change here is admittedly a bit handwavy but here are a few pieces:

Phew! Longer than I thought. I know this doesn't get us down to the level of specific features but hopefully this gets us closer. More tomorrow and beyond...

Mr0grog commented 9 years ago

Cool, that helps a lot. I ought to have some time to poke at this if you want to send me the StatusCake API key. Maybe I’ll experiment with a few different approaches, though I think I’ll hold off on screenshot-related stuff for the moment.

lippytak commented 9 years ago

@Mr0grog what's your email? (Or follow me and I'll DM you)

Mr0grog commented 9 years ago

rob@robbrackett.com or rob@codeforamerica.org

daguar commented 9 years ago

Updated top to add status + current URL.

lippytak commented 9 years ago

Synthetics (New Relic's monitoring service) has some interesting features like SLA reports: [image: screen shot 2015-01-02 at 11 17 28 pm]

...and the ability to write scripted browser checks that run in virtualized Selenium browsers.

Definitely too much complexity to start but could be useful in the future.

daguar commented 9 years ago

Super rad — thanks so much for the offer @statuscake!

alanjosephwilliams commented 9 years ago

Just a heads up that this work is being pursued in the CitizenOnboard repo.