Sure, I will update the network_tier. As for telemedicine: only one plan provided it (all false). I can create a benefit file, but it would be short and boring....
On Apr 19, 2016, at 3:40 PM, David X Portnoy notifications@github.com wrote:
@BAP-Jeff, on #56 (comment)
==> PlanOut.pip <== plan_id|marketing_name|summary_url|marketing_url|formulary_url|plan_contact|last_updated_on
==> PlanNetworkOut.pip <== plan_id|network
==> formularyOut.pip <== plan_id|drug_tier|mail_order
==> costOut.pip <== plan_id|drug_tier|pharmacy_type|copay_amount|copay_opt|coinsurance_rate|coninsurance_opt
Looks good! Let's load. A couple observations for future improvement:
Should the field in PlanNetworkOut.pip be network_tier? Consider including a telemedicine field.
@BAP-Jeff, Sounds good. No need to create telemedicine if there's no data.
Since there's no time for slides... From your prior analysis of the data, do you recall any observations that might be interesting to discuss?
@dportnoy, I am finishing up the plan stuff, and unfortunately a few big issuers supplied malformed JSON documents... I'm guessing they rely on the same vendor to produce these, which is why I see the same mistake in a few different places.
In a nutshell:
"network": [ {"network_tier": "STATEWIDE" }], "formulary": {"drug_tier": "HIGHEST-COST-SHARE", "mail_order": true, "cost_sharing": [ {"pharmacy_type": "1-MONTH-IN-MAIL", "copay_amount": 0.00, "copay_opt": "NO-CHARGE-AFTER-DEDUCTIBLE",.........
They are missing the "[" bracket after "formulary". Oh well.
So the first observation is: validate your JSON files! I'll come back to you with some observations, but let me get the files squared away first...
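(For reference, a minimal validation sketch using Python's standard library; the directory layout here is hypothetical:)
```python
import json
import sys
from pathlib import Path

# Check every downloaded *.json document before trying to flatten it.
# The "downloads" directory is a hypothetical layout.
for path in Path("downloads").glob("*.json"):
    try:
        with open(path, encoding="utf-8") as f:
            json.load(f)
    except json.JSONDecodeError as e:
        # e.lineno/e.colno point at the first offending spot, e.g. the
        # object where a "[" is missing after "formulary".
        print(f"{path}: line {e.lineno}, col {e.colno}: {e.msg}", file=sys.stderr)
```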
Here is the full set of plans.json files.
I'll get the formulary files up later, along with observations.
@dportnoy I "hearted" a comment from earlier today. Can you take a look to see if you still see an issue? I am holding on producing the final files until I hear.
I will add in issuer and marketplace from the old PUF file at that time too.
@BAP-Jeff, Nice job on plans.zip! These files are pleasantly small! We don't even need separate sample files. I'm assuming they will be eclipsed by PROVIDERS files.
Row counts:
22,295 cost.pip
36,226 formulary.pip
13,969 network.pip
12,233 plan.pip
Some unexpected line breaks in plan.pip might cause problems with loading. For example:
31609PA0150004
16204OH0020010
@BAP-Jeff, is it this one: https://github.com/demand-driven-open-data/ddod-intake/issues/56#issuecomment-211861941
Regarding your questions:
- "I just double checked ServiceAreaID and FormularyID. I don't see them as being equal. The third character is an 'S' for ServiceArea or 'F' for Formulary. These items come from the Plan Attributes PUF."
A: Sure. Keep both.
- "I will change the naming. I was trying to be consistent with the actual Data Dictionaries on the PUF site."
A: Already done.
- "Which PUF file has the marketplace_category and issuer_name?"
A: See https://github.com/demand-driven-open-data/ddod-intake/issues/56#issuecomment-211976114 for the answer.
Uh oh, let me look.
Is someone getting the physician file done for you?
Regarding line breaks... oh fun, check out the source:
"plan_id_type": "HIOS-PLAN-ID",
"plan_id": "31609PA0150004",
"marketing_name": "Personal Choice PPO Gold Classic\n $1,000 $15/$30/80%",
"summary_url": "https://www.ibx.com/ffm/shop2016",
Nice little \n in the marketing name.
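A sanitizing step when writing the .pip files would guard against this sort of thing. A minimal sketch (the clean helper is hypothetical):
```python
def clean(value):
    """Collapse embedded newlines/whitespace and escape the pipe
    delimiter so each record stays on a single line."""
    if value is None:
        return ""
    return " ".join(str(value).split()).replace("|", "/")

# The marketing_name seen above:
name = "Personal Choice PPO Gold Classic\n $1,000 $15/$30/80%"
print(clean(name))  # -> Personal Choice PPO Gold Classic $1,000 $15/$30/80%
```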
Here is the corrected version plans.zip
@dportnoy, still working on the formulary files to add in issuer and marketplace. I need to kick off a new join and then download; the round trip is a few hours.
Will late in the day tomorrow work, or even Thursday? This daytime job is getting in the way.
@BAP-Jeff, yes, funny about "\n". Sneaky. Tomorrow is OK. We'll let that day job thing slide this time. :-P
@marks, @ftrotter, want to tackle PROVIDERS? @marks, was my feedback on https://github.com/demand-driven-open-data/ddod-intake/issues/56#issuecomment-211932545 sufficient? @ftrotter, if I recall correctly all things provider networks are your specialty!
@dportnoy, hey I have added in marketplace and issuer name to the file, but we lost around 500K records (some issuer wasn't around in December I guess). Whatcha think? Use this new file or leave the old one?
Or we could just leave those columns blank...
@BAP-Jeff, do an outer join, so that you don't lose any records. For missing values, perhaps leave blank. It can be figured out and backfilled later.
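A minimal sketch of that join, assuming pandas and the column names used in the headers above (the PUF extract file is hypothetical; HIOS plan IDs start with the 5-digit issuer ID, which supplies the key):
```python
import pandas as pd

formulary = pd.read_csv("formulary.pip", sep="|", dtype=str)
puf = pd.read_csv("puf_issuers.pip", sep="|", dtype=str)  # hypothetical extract of the old PUF

# HIOS plan IDs (e.g. 31609PA0150004) begin with the 5-digit issuer ID.
formulary["issuer_id"] = formulary["plan_id"].str[:5]

# A left join keeps every formulary record; issuers missing from the PUF
# get blank issuer_name/marketplace_category to be backfilled later.
merged = formulary.merge(
    puf[["issuer_id", "issuer_name", "marketplace_category"]],
    on="issuer_id",
    how="left",
)
merged.to_csv("formulary_with_issuer.pip", sep="|", index=False)
```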
And in case this is helpful...
- FFM vs SPMs for plan year 2016:
- 19 FFM states: Alabama, Alaska, Arizona, Florida, Georgia, Indiana, Louisiana, Mississippi (individual market), Missouri, New Jersey, North Carolina, North Dakota, Oklahoma, Pennsylvania, South Carolina, Tennessee, Texas, Wisconsin, Wyoming
- 15 SPM states: Arkansas (individual market), Delaware, Iowa, Illinois, Kansas, Maine, Michigan, Montana, Nebraska, New Hampshire, Ohio, South Dakota, Utah (individual market), Virginia, West Virginia
Here are the updated formulary files.... first one is the sample:
https://drive.google.com/file/d/0B9yZheZrBn54Y3ZEME5LcnA0S0E/view?usp=sharing
https://drive.google.com/file/d/0B9yZheZrBn54M2RuTW1xU2RWOXc/view?usp=sharing
@BAP-Jeff, great! Remind me the differences from the prior ones please.
Added in Issuer name and marketplace
I'll take the new Formulary file and upload it to Socrata in case anyone wants to work off of that as opposed to having to deal with the 6+ GB file :)
Great, @marks. I appreciate it. I'm working on a page with all the links.
That said, we still need PROVIDERS files.
I could maybe get to it tomorrow morning if you are desperate.
That would be beyond fantastic!
I just kicked off a process to download all the provider.json files. I have gotten through the first 10, and my initial impression is that these files are MASSIVE.
This could take a while to process.
@BAP-Jeff - that's why I initially focused on formulary ;) - Is there any way I can help, by running some scripts locally for you and/or on some cloud servers?
@marks, ha! Thanks for the offer. I can grab the files, but based on some back-of-the-envelope calculations this is going to be multi-terabytes of source data.
I'll check in later.
@BAP-Jeff, @marks, I was afraid of that. A few options: one is to map issuer_id to providers_url, so the "providers" file doesn't need to repeat entries that come from the same providers_url. None of these are ideal. Perhaps to get something going, we can focus on loading a subset of the data: pick either one interesting large state (like TX or FL) or a small state (like AK or SD) to load. This would at least allow for use with consumer apps and certain types of analytics.
@BAP-Jeff, @dportnoy - the latest (https://github.com/demand-driven-open-data/ddod-intake/issues/56#issuecomment-212898336) formulary files are API enabled at the following link. Let me know how else I can help, of course.
https://healthdata.demo.socrata.com/view/xc22-8t66
Thanks! That leaves just the providers files.
I downloaded all the provider json files last night. Looks like maybe the "big" ones were at the top of the list and they got better after that. As we discussed/suggested, I am breaking them into a number of tables - indiv_languages, fac_addresses, etc... Still going to be big files, but seems like things are running....
FYI, we have some healthy sized files. I just exhausted the memory on my box. It will take me a few to spin up a big honker on AWS to run these big guys...
Maybe I should just give up on these for now and deal with them later. @dportnoy what is the timing on all this? Today right?
-rw-rw-r-- 1 ubuntu ubuntu 2267461069 Apr 21 21:11 PROVJSON3_201604212013.json
-rw-rw-r-- 1 ubuntu ubuntu 2170298888 Apr 21 22:29 PROVJSON2903_201604212013.json
-rw-rw-r-- 1 ubuntu ubuntu 1544176291 Apr 21 22:18 PROVJSON2891_201604212013.json
-rw-rw-r-- 1 ubuntu ubuntu 1456651939 Apr 21 23:08 PROVJSON2915_201604212013.json
-rw-rw-r-- 1 ubuntu ubuntu 1081970470 Apr 21 22:44 PROVJSON2910_201604212013.json
-rw-rw-r-- 1 ubuntu ubuntu 1073854978 Apr 22 01:46 PROVJSON3425_201604212013.json
-rw-rw-r-- 1 ubuntu ubuntu 1029692658 Apr 22 00:21 PROVJSON2987_201604212013.json
-rw-rw-r-- 1 ubuntu ubuntu 1014403038 Apr 21 23:15 PROVJSON2916_201604212013.json
-rw-rw-r-- 1 ubuntu ubuntu 976208707 Apr 21 22:36 PROVJSON2909_201604212013.json
-rw-rw-r-- 1 ubuntu ubuntu 965357940 Apr 21 21:58 PROVJSON2833_201604212013.json
@BAP-Jeff, for now, could you create a sample file and pick one full state to crawl? (See the bottom of my note above.) Since the size would be manageable, you could create the simplest layout possible.
@dportnoy, actually that is tougher than it sounds. Getting a full state would imply loading all the files and then querying out the state we want. I can work towards that, but it will take me some time to get a machine that can parse the 2 GB files. What is the timing?
Here is what the headers will look like for the eight files (anything obvious I am missing?):
plan_id_type|plan_id|network_tier|npi
npi|prefix|first|middle|last|suffix|accepting|gender|last_updated_on
npi|language
npi|specialty
npi|address|address_2|city|state|zip|phone (I could combine with facility address file...)
npi|facility_name|last_updated_on
npi|facility_type
npi|address|address_2|city|state|zip|phone
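As an aside on the memory problem above: a streaming parser can emit these tables without holding a multi-GB file in memory. A minimal sketch, assuming the third-party ijson package and field names from the CMS provider JSON schema as I understand it (only two of the eight tables shown):
```python
import csv
import ijson  # third-party streaming JSON parser (pip install ijson)

def split_providers(json_path):
    """Stream one providers.json into per-table .pip files without
    loading the whole document into memory."""
    with open(json_path, "rb") as src, \
         open("plan_network.pip", "a", newline="") as plan_f, \
         open("indiv_languages.pip", "a", newline="") as lang_f:
        plans = csv.writer(plan_f, delimiter="|")
        langs = csv.writer(lang_f, delimiter="|")
        # providers.json is a top-level JSON array; ijson yields one
        # provider object at a time.
        for rec in ijson.items(src, "item"):
            npi = rec.get("npi", "")
            for plan in rec.get("plans", []):
                plans.writerow([plan.get("plan_id_type", ""),
                                plan.get("plan_id", ""),
                                plan.get("network_tier", ""),
                                npi])
            for lang in rec.get("languages", []):
                langs.writerow([npi, lang])
            # ...addresses, specialties, and the facility tables
            # follow the same pattern.

split_providers("PROVJSON3_201604212013.json")
```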
@BAP-Jeff,
One-state subset: I could be wrong, but I thought you'd start with the state associated with each issuer in the Machine Readable PUF 02-22-2016.txt. Not sure if each shows nationwide networks, but I'm sure some have narrow networks that wouldn't be as big. If that logic is incorrect or doesn't work well from a workload perspective, please do what makes most sense.
Sample file: Besides the one-state option, it would be helpful to have a small sample file to work with -- regardless of what data is in it.
Timing: Ideally by end of day. But I'll start writing up links to the files we already have in parallel. You've done so much already. I really appreciate it!
Looking at fields next...
@dportnoy, I think what you are saying is more or less right. What I am seeing is that an issuer can operate across multiple states, so the url listed for, say, Aetna/Texas is the same url as Aetna/Florida. I could (as I think you suggest) go after the urls listed for a state, and if there is spillover to other states, so be it. But it looks like national plans are just that... they show up everywhere, which makes the task tough. I am making progress on loading everything though.
Do you have a state that you would like me to grab?
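A sketch of that per-state crawl with dedup of shared URLs, assuming a CSV-style extract of the PUF with "state" and "providers_url" columns (both column names and the file name are assumptions):
```python
import csv
from collections import defaultdict

# Map each state to the set of provider-directory URLs its issuers list,
# deduplicating URLs shared by multi-state issuers.
urls_by_state = defaultdict(set)
with open("machine_readable_puf.txt", newline="") as f:
    for row in csv.DictReader(f):
        urls_by_state[row["state"]].add(row["providers_url"])

# Crawl one state's URLs only once each, even if several issuers repeat them.
for url in sorted(urls_by_state["ME"]):
    print(url)  # the fetch/download step would go here
```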
@BAP-Jeff, on fields...
I think we need to add plan_id and network_tier to the INDIVIDUAL and FACILITY files. Otherwise we'll end up with a lot of "duplicate" npi entries, each with slightly different information, including whether they're accepting new patients. We won't know which entry belongs to which plan.
Optional: Consider adding issuer_id (from the PUF), as that would be useful for some of the analytics.
Re https://github.com/demand-driven-open-data/ddod-intake/issues/56#issuecomment-213449685: Which state? I'd love to get CA since BayesHack is there, but alas it's not included. If you want to be on the conservative side, you can pick a state with the fewest entries in the PUF, like AK or SD. When I did my last analysis a couple of months ago, both of these states had only one providers.json url.
Got it. It will be interesting to see whether the plans leverage their own internal address information or just use the NPPES file...
I'll take a look at AK, I might also look at ME.
I am focusing on pulling files for ME. Though it is only a few JSON files, the created tables are enormous. Just for ME we are looking at probably a bit over 5 GB. I am not sure what to do about that. I guess I will just zip them up and post them. Wow.
@BAP-Jeff, since the size is ending up so big anyway, what if we post a subset first? Perhaps pick some arbitrary way to grab a subset that people can easily work with?
@dportnoy, okay I got Maine done. It is posted to here:
https://drive.google.com/file/d/0B9yZheZrBn54UFFBOGlLWmZVejQ/view?usp=sharing
Compressed down to about 130 MB. Very little QA on this guy. If we want to pull out something smaller, let me know. It's not entirely obvious how to filter it... maybe by a couple of PlanIDs.....
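One way to pull that smaller cut, sketched with pandas; the file names and the plan ID are hypothetical, and the layout follows the headers above:
```python
import pandas as pd

# Find the NPIs attached to a couple of plan IDs in the plan-network
# crosswalk, then subset every other table by those NPIs.
network = pd.read_csv("me_plan_network.pip", sep="|", dtype=str)
keep_plans = {"12345ME0010001"}  # hypothetical plan ID(s)
npis = set(network.loc[network["plan_id"].isin(keep_plans), "npi"])

for name in ["indiv", "indiv_languages", "indiv_specialties",
             "fac", "fac_types", "fac_addresses"]:  # hypothetical file names
    df = pd.read_csv(f"me_{name}.pip", sep="|", dtype=str)
    df[df["npi"].isin(npis)].to_csv(f"sample_{name}.pip", sep="|", index=False)
```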
@BAP-Jeff nice! Let's go with it.
Working on BayesImpact-related posts now. (@BAP-Jeff, @marks, could you help me summarize the latest and best links we have for each category?)
Will finish tomorrow morning!
@dportnoy all I've got for ya is https://healthdata.demo.socrata.com/view/xc22-8t66 for the latest version of the Formulary data @BAP-Jeff scraped. Happy to upload anything else too though
@marks, It would be great if you could load the updated data. I'll proceed to publish individual links in the mean time, as well as data dictionaries.
@dportnoy somehow missed Jeff's comment about ME. I'll see what I can do but please confirm that the Formulary file is as you'd expect it to be. Can definitely update title/description with whatever you'd like (perhaps a link back to the right place to see other resources)
@BAP-Jeff / @dportnoy - as you all know, providers are split into 6 files this time. Think it's worth creating one or two combined files for easier analysis? We may need to leave something off, like fac type or language, or concatenate arrays into a string. Just a suggestion for making use easier, like it is for formulary. Regardless, working on uploading to enable viz and APIs for these 6 files.
Good ideas. I am out of pocket for most of the weekend. Maybe we should see if anyone hits the APIs over the next couple of weeks and then look to improve things.
Will be interested to hear from @dportnoy what kind of interest, if any, this generates. Clearly there are not many people using these files.
@dportnoy, the following files are ready:
All US Formulary file: https://healthdata.demo.socrata.com/view/xc22-8t66
Maine Provider Facility Type file: https://healthdata.demo.socrata.com/view/3juv-cnb4
Maine Provider Individual Specialty file: https://healthdata.demo.socrata.com/view/gz7p-vgqg
Maine Provider Individual Language file: https://healthdata.demo.socrata.com/view/fiik-zy3e
Update... @BAP-Jeff, @marks, thank you again! You guys were a huge help! Couldn't have done it without you. We still need to write up the activities at BayesHack, but there were 8 HHS teams there, 4 of them specifically dealing with helping consumers find the right healthcare.
[image: bayeshack - 8 hacker teams for hhs] https://cloud.githubusercontent.com/assets/2925801/14823613/c549e2c8-0b98-11e6-905d-0c3fa9929634.png
Anything useful/shareable come out of it? I would love to see.
Ditto.
Jordan Rau | Senior Correspondent | Kaiser Health News
Putting out a call to those interested in making an impact by contributing to public data projects... Looking for somebody to create a new public dataset (and accompanying source code).
Background
In November 2015, the Centers for Medicare & Medicaid Services (CMS) enacted a new regulatory requirement for health insurers who list plans on insurance marketplaces: they must now publish a machine-readable version of their provider network directory and drug formulary, conforming to a specified JSON standard, and update it at least monthly. This data has only recently become accessible to the public. Some of its uses can be found in the Bayes Impact hackathon "prompts" or in at least 7 DDOD use cases.
Challenge
While these newly available datasets can significantly benefit consumer health applications and be used in a range of healthcare analytics, the current format doesn't lend itself to doing so.
Request
Write code that does the following: downloads each issuer's machine-readable provider network directory and drug formulary JSON files, and aggregates them into tabular format.
Run the code and let us know where to find the resulting files. We should be able to find a good home for them, so that they enjoy widespread use.
If you can do this, you’ll be an official Open Data Hero! (Spandex optional.)