Open lippytak opened 9 years ago
I made a Google Doc here. I'll copy-paste it into this issue thread.
This is about getting reliable services in the US. It should help people work towards monitoring solutions for services, and they should understand the potential impact of unexpected service interruptions.
Goals for Audiences:
Article Outline:
The meme I've always liked for this is:
"When HealthCare.gov crashed for middle class Americans it was a crisis. But in the social safety net, the status quo is crisis."
Part of an old email I wrote to myself in the middle of the night:
More and more I'm starting to feel that our entire social safety net is failing just like HC.gov, except nobody's watching and those who are failed have no voice. What's worse, it's been failing so badly and for so long that what should be considered an emergency has become the status quo. The fact that California's main benefits website doesn't support mobile AT ALL is an emergency when your target demo doesn't have computers. The fact that veterans in SF jails get released with 4 nights of shelter housing and <$100 cash is an emergency. I watched a CoveredCA webinar last week and jotted down this note: "Far and away, the worst user experience is for those who need it most and are least empowered (MediCal eligibles). This is our opportunity from a social justice/advocacy standpoint. Concretely, there is no coherent way to enroll in MediCal through CoveredCA, and if anything starting with CoveredCA is a long, arduous, confusing detour that just ends with a cold referral to a phone number that begins the process anew. People don’t know what MediCal is, why they are suddenly not allowed to “shop” for plans, and left wondering why they went through this process to begin with." These are emergencies, but there is no urgency anywhere...
I love the phrase "emergencies with no urgency"
Sorry to butt in (damn GitHub auto-subscribe to new repos), but could you use/reuse/inspire Open311status for this? http://open311status.org
I'm also working on a Ruby version. It's probably a little more reliable to use a 3rd party service, but ETLing uptime time series data can suck (I wrote that for my day job for platform uptime and it starts to suck at the 500+ site/minute granularity level).
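For anyone weighing the ETL side mentioned above, here is a minimal sketch of rolling per-check results up into hourly uptime ratios, which is one common way to keep time series data manageable as the number of sites grows. It is not the Ruby version or any third-party service's format: the `(site, timestamp, is_up)` sample shape and the `hourly_uptime` name are assumptions made purely for illustration.

```python
from collections import defaultdict
from datetime import datetime


def hourly_uptime(samples):
    """Roll individual checks up into an uptime ratio per (site, hour).

    `samples` is an iterable of (site, timestamp, is_up) tuples. That shape
    is hypothetical; adapt it to whatever the monitoring source emits.
    """
    up = defaultdict(int)
    total = defaultdict(int)
    for site, ts, is_up in samples:
        # Truncate the timestamp to the start of its hour to form the bucket.
        bucket = (site, ts.replace(minute=0, second=0, microsecond=0))
        total[bucket] += 1
        if is_up:
            up[bucket] += 1
    # Fraction of checks in each bucket that found the site up.
    return {bucket: up[bucket] / total[bucket] for bucket in total}


if __name__ == "__main__":
    checks = [
        ("ca", datetime(2015, 3, 21, 2, 0), True),
        ("ca", datetime(2015, 3, 21, 2, 15), False),
        ("ca", datetime(2015, 3, 21, 2, 30), True),
        ("ca", datetime(2015, 3, 21, 2, 45), True),
    ]
    for (site, hour), ratio in hourly_uptime(checks).items():
        print(site, hour.isoformat(), ratio)
```

Storing one row per site per hour instead of one row per check trades detail for volume, so whether that is acceptable depends on how fine-grained the outage reporting needs to be.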
Love this. In medicine the corollary is that our society has decided that we have to intervene if somebody is at risk of dying that day in front of us (Emergency Rooms/EMTALA), but it's somehow OK if it happens over years, even if the path dependency is fully acknowledged.
@bengolder is steppin up and steppin in to own this issue :) (aka v1)
:thumbsup:
@bensheldon Sorry I didn’t see any of this thread; I think it happened during the day that I didn’t realize GitHub unsubscribed me from the repo when I transferred it to CfA.
> could you use/reuse/inspire Open311status for this? http://open311status.org
I actually suggested this when I first heard about the idea of monitoring SNAP services from @lippytak and @alanjosephwilliams. My understanding is that there were a few reasons they didn’t go that way:
Further, the only reason this repo really came into existence was because, just after I got back from traveling, I was chatting with @alanjosephwilliams, who noted it would be nice just to have a simple map showing what service portals were down. Incredibly straightforward and obvious. So I did that :)
Relevant: http://www.cbpp.org/cms/?fa=view&id=618
Two notes that may be worth covering in any writeup:
Was looking at the now-autogenerated screenshots and noticed this gem from New Mexico:
Where their upgrade is not going to include any applications that have been started but not submitted. The description also makes it sound like there’s a two-day window before the upgrade when actual, submitted applications won’t be valid, either. Oof.
This is what deployments look like:
:sob:
dear lord on both fronts
We could call it One Nines (http://en.wikipedia.org/wiki/High_availability)
"Your website shouldn't have more days off than you do."
> Your website shouldn't have more days off than you do.
That is a fantastic line.
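For context on the "nines" joke: an availability of N nines leaves a 10^-N fraction of the year as allowed downtime, so "one nine" (90%) works out to about 36.5 days off per year. A small sketch of that arithmetic, with `allowed_downtime_days` as a made-up helper name:

```python
def allowed_downtime_days(nines: int, days_per_year: float = 365.0) -> float:
    """Days of downtime per year permitted at an availability of `nines` nines.

    One nine is 90% availability, two nines is 99%, and so on.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    # N nines of availability leaves a 10**-N fraction of the year as downtime.
    unavailability = 10.0 ** (-nines)
    return days_per_year * unavailability


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} nine(s): {allowed_downtime_days(n):.3f} days of downtime per year")
```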
Where can I find screenshots from downtime?
They live here: https://s3-us-west-2.amazonaws.com/snap-snapshots/
The state-pages branch (in PR #29) displays snapshots on the state pages.
Document of anecdotes, assets, and more-meat-on-the-bones version of Ben's outline in progress here: https://docs.google.com/document/d/1JJYceEpF7_eYAr33SdsD4xawoHeNnBZsrnuxnvRpQTw/edit
@bengolder FYI, I just cleared out all the broken snapshots on S3.
How downtime reporting works now: [image: screen shot 2015-03-21 at 2 10 45 pm] https://cloud.githubusercontent.com/assets/2533112/6766852/1e0c245e-cfd4-11e4-8ba9-d193d9d3af4a.png
That gets embedded in an email ^. Then the email gets forwarded around. Looking at email timestamps in this particular case it didn't reach the end user (Liliana) until ~3pm, when the outages started at ~2am according to Pingometer. About a ~12 hour lag...
With no estimated time to resolve
Hah, good point @RebeccaCoelius. Plain language summary of above email: "We don't know what's wrong or when we'll fix it or when we'll be in touch next and we are not sorry for any inconvenience."
Externalizing this work through an awesome article remains a key tactic to capture the value of this work. I've reassigned to myself for the time being, but we should divide labor appropriately when we discuss the scope of outstanding work necessary to realize this goal.
I should document here (as I have elsewhere), though, that the work to date made our presentation of this story in talks at HealthRefactored and Etsy a big success. So much tremendous work has been done by this group. It's awesome.
@bengolder let's start dropping notes here and build out an outline.