demand-driven-open-data / ddod-intake

"DDOD Intake" tracks DDOD Use Cases using GitHub issues. View the main DDOD site here
http://ddod.us

Aggregate machine-readable provider network directories and drug formularies into tabular format #56

Open · dportnoy opened 8 years ago

dportnoy commented 8 years ago

Putting out a call to those interested in making an impact by contributing to public data projects... Looking for somebody to create a new public dataset (and accompanying source code).

Background

In November 2015, the Centers for Medicare & Medicaid Services (CMS) enacted a new regulatory requirement for health insurers who list plans on insurance marketplaces. They must now publish a machine-readable version of their provider network directory and drug formulary, publish it to a specified JSON standard, and update it at least monthly. This data has just recently become accessible to the public. Some of its uses can be found in the Bayes Impact hackathon "prompts" or in at least 7 DDOD use cases.

Challenge

While these newly available datasets can significantly benefit consumer health applications and a range of healthcare analytics, the current format doesn't lend itself to those uses.

Write code that does the following (a minimal sketch follows the list):

  1. Crawls the URLs starting with the "Machine-readable URL PUF" seed file found on this page: https://www.cms.gov/CCIIO/Resources/Data-Resources/marketplace-puf.html.
    • (As of 3/1/2016, there are 636 plan URLs, 23,711 provider URLs and 1,662 formulary URLs.)
  2. Converts the data from the JSON schema (https://github.com/CMSgov/QHP-provider-formulary-APIs) into a tabular format with the same fields.
    • (There may be some challenges converting multiple independent array fields from JSON to a tabular row.)
  3. Aggregates the results into CSV or delimited text files, such that we have one file each for: Plans, Providers, and Formularies.
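A minimal sketch of these three steps, assuming the seed XLSX has been exported to a local urls.csv with one index.json URL per row (the file names here are hypothetical), and collecting only formulary URLs for brevity:

```python
import csv
import json
import urllib.request

def fetch_json(url):
    """Download and parse one JSON document; return None on failure."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.load(resp)
    except Exception:
        return None  # broken or slow URLs are common in this data; skip them

with open("urls.csv", newline="") as f, \
     open("formularies.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["index_url", "formulary_url"])
    for row in csv.reader(f):
        index = fetch_json(row[0])
        if index is None:
            continue
        # Per the QHP spec, each index.json lists plan, provider, and
        # formulary URLs; only the formulary URLs are collected here.
        for formulary_url in index.get("formulary_urls", []):
            writer.writerow([row[0], formulary_url])
```

A real run would also fetch each listed URL and flatten its contents, and should throttle requests given the tens of thousands of URLs involved.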

Run the code and let us know where to find the resulting files. We should be able to find a good home for them, so that they enjoy widespread use.

If you can do this, you’ll be an official Open Data Hero! (Spandex optional.)

dportnoy commented 8 years ago

@marks, @ftrotter, @cornstein, @BAP-Jeff, @loranstefani, @Jordanrau, @HeatherBrotsos, @lloydbrodsky, Since you're all past contributors, I wanted to see if any of you would take the challenge. Feel free to pass it on to your colleagues as well.

dportnoy commented 8 years ago

In case it's helpful, I started working on the code, but just haven't had enough time to get to it lately. Feel free to use. You can find it here: https://github.com/dportnoy/health-insurance-marketplace-analytics

marks commented 8 years ago

@dportnoy - looks like you got pretty far! Would you mind documenting what it does so far and what areas need attention? It would definitely help others avoid accidentally reinventing the wheel.

dportnoy commented 8 years ago

@marks You give me entirely too much credit. But point taken. I'll write up a summary and post it.

dportnoy commented 8 years ago

Suggestions for JSON to tabular (CSV/TSV) column mapping

Translation from the JSON schema into tabular format should be straightforward...
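As a concrete illustration of one such mapping (a sketch, assuming the drugs.json layout in the schema spec at https://github.com/CMSgov/QHP-provider-formulary-APIs): each record is one drug with a plans array, so a natural tabular form is one row per drug-plan pair.

```python
def drug_rows(drug):
    """Yield one flat dict per plan covering this drugs.json record."""
    for plan in drug.get("plans", []):
        yield {
            "rxnorm_id": drug.get("rxnorm_id"),
            "drug_name": drug.get("drug_name"),
            "plan_id": plan.get("plan_id"),
            "drug_tier": plan.get("drug_tier"),
            "prior_authorization": plan.get("prior_authorization"),
            "step_therapy": plan.get("step_therapy"),
            "quantity_limit": plan.get("quantity_limit"),
        }
```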

marks commented 8 years ago

I put something together really quickly that takes insurance providers' index.json URLs and flattens the data from the formulary_urls array.

Sample dataset: https://healthdata.demo.socrata.com/view/979j-m4qb
Code: https://github.com/marks/health-insurance-marketplace-analytics/commit/3507b4738fbb1ba75126e474bd6c24f0baf9d133

cornstein commented 8 years ago

Really interesting output, but I wonder if there's a way to make this a bit more useful. Perhaps we can have a brief chat?


marks commented 8 years ago

@cornstein - definitely... That's just a quick visualization of the data. You can export it (all or a slice) or use the API as well, of course.

Kind of a hectic day, but happy to jump on a call tomorrow or continue to chat here if you're interested in a specific view.


dportnoy commented 8 years ago

Here are some options for the end product, along with their uses. They are listed in increasing order of difficulty:

  1. AGGREGATE: Aggregate the crawled URLs into a small number of files, keeping the original QHP schema format in JSON: plans, providers, drugs.
    • Eliminates the need to crawl thousands of URLs.
    • Expect to run into file size challenges, since there are thousands of URLs and some files may be hundreds of megabytes.
    • This format may be adequate for loading into a NoSQL/document database for use with non-analytical consumer web apps.
  2. FLATTEN: Make the format tabular, in the easiest way possible (see the sketch after this list).
    • Method: The files produced can line up exactly with the QHP schema entities pulled from URL links: plans, providers, drugs. For arrays, you could put multiple values in a single field.
    • The goal here is to create files that can be hosted ASAP and used for rudimentary analytics.
  3. NORMALIZE: Normalize the work product from item #2 into relational tables, tied with foreign keys.
    • This format is useful for loading into a relational database for use in consumer apps and web services, especially if they require calculations and transformations. Convenient for codathons and challenges.
  4. JOIN & PIVOT: Make a tabular format that’s easiest for analytics by creating one row per entity with all possible attributes.
    • For example, in addition to the fields in the drugs.json schema and its plans subtype, include fields from "Machine-readable URL PUF" XLSX seed file: Marketplace Category, State, Issuer ID, Issuer Name. You can find other joins between the entities of the QHP schema.
    • Expect to run into challenges for entities with multiple arrays that are free-form or have a long tail of values.
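A sketch of the FLATTEN option applied to a providers.json record, packing repeated simple arrays into single delimited fields instead of extra rows (field names here assume the QHP schema; check them against the spec):

```python
def flatten_provider(p):
    """Flatten one providers.json record into a single tabular row."""
    return {
        "npi": p.get("npi"),
        "type": p.get("type"),
        "languages": ";".join(p.get("languages", [])),
        "specialty": ";".join(p.get("specialty", [])),
        "plan_ids": ";".join(plan.get("plan_id", "")
                             for plan in p.get("plans", [])),
    }
```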
marks commented 8 years ago

I've made a bit more progress. The latest code is available at https://github.com/marks/health-insurance-marketplace-analytics/blob/master/flattener/flatten_from_index.py

I have code that starts* to flatten all three file types: drugs.json, plans.json, and providers.json. I have only run providers.json and drugs.json on a few index.json files, as it can take a while to process.

The flattened data can be explored/downloaded/called-with-an-API from the following links:

plans.json
  • Flattened Data Explorer (you can export from here too): https://healthdata.demo.socrata.com/view/yjgn-upfm
  • Data Grid (you can do roll-ups and filtering here): https://healthdata.demo.socrata.com/d/x6bt-9iym
  • API Docs: https://dev.socrata.com/foundry/healthdata.demo.socrata.com/55ch-w329

providers.json
  • Flattened Data Explorer: https://healthdata.demo.socrata.com/view/pqgg-pgsy
  • Data Grid: https://healthdata.demo.socrata.com/d/axbq-xnwy
  • API Docs: https://dev.socrata.com/foundry/healthdata.demo.socrata.com/5ht2-73gj

drugs.json
  • Flattened Data Explorer: https://healthdata.demo.socrata.com/view/979j-m4qb
  • Data Grid: https://healthdata.demo.socrata.com/d/jaa8-k3k2
  • API Docs: https://dev.socrata.com/foundry/healthdata.demo.socrata.com/6ngr-bz3d

* Not all data is flattened at this time, to keep the number of rows down. For example, languages in the provider file are serialized as a JSON array, as are addresses and specialties.

dportnoy commented 8 years ago

@BAP-Jeff, hi again. Welcome to use case #56! To answer your question from issue #52... Having a ready-to-go dataset would be ideal for next weekend's Bayes Hack codathon. It's identified as the central dataset for building consumer healthcare apps. You'll see a comment above that identifies 4 possible file types: https://github.com/demand-driven-open-data/ddod-intake/issues/56#issuecomment-210875780. It's good to have a few people working on this, because some aspects (such as file sizes and cleanup for analytics) are challenging. Having the code would also be a useful resource for the community to refresh the data in the future.

ghost commented 8 years ago

Hello All,

As David knows, I have been playing with the drugs.json set for a while as I build some commercial applications for the formulary space. There are a ton of challenges in using the data as supplied but overall it is a good first year effort by the agency.

If I understand the challenges David laid out, I have been pursuing Option 4 for my app. I am using a couple of simple Python scripts: (1) the first reads a list of URLs from the Machine Readable PUF and retrieves the drugs.json file for each; (2) the second rips apart the JSON documents into a flat pipe-delimited file.
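A hedged sketch of what that second step might look like (Jeff's actual script may differ; the file names and the top-level-array assumption for drugs.json are mine):

```python
import csv
import glob
import json

FIELDS = ["plan_id", "rxnorm_id", "drug_name", "drug_tier",
          "prior_authorization", "step_therapy", "quantity_limit"]

with open("drugs_flat.psv", "w", newline="") as out:
    writer = csv.writer(out, delimiter="|")
    writer.writerow(FIELDS)
    # Files previously downloaded by the first script.
    for path in glob.glob("downloads/drugs/*.json"):
        with open(path) as f:
            for drug in json.load(f):  # drugs.json: a JSON array of drugs
                for plan in drug.get("plans", []):
                    writer.writerow([
                        plan.get("plan_id"), drug.get("rxnorm_id"),
                        drug.get("drug_name"), plan.get("drug_tier"),
                        plan.get("prior_authorization"),
                        plan.get("step_therapy"),
                        plan.get("quantity_limit"),
                    ])
```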

I am happy to share the code and/or the end result (warning: around 35 to 40 million records). What do we want for the hackathon? I have to admit I have not participated in one before, so I am not sure whether people are only interested in the end result (upload the dataset to, say, Socrata or Kaggle) or whether they want to understand the process so they can roll their own.

Let me know which way the group wants to go.

Jeff


marks commented 8 years ago

@BAP-Jeff - awesome. I think both the data and the provenance (how you got it, including any scripts/documentation) would be much appreciated by participants as well as the community in general. Would be happy to look into hosting it on Socrata if you can provide the raw data.

Mark

ghost commented 8 years ago

Okay, thanks. I will put together some things tonight. Much of my time working with this dataset goes to dealing with the data discrepancies (i.e., multiple statuses on a drug for a plan, use of NDCs vs. RxCUIs, missing restrictions, bogus URLs), but showing off the millions of records that are produced will be cool. Plus, hopefully making it more accessible will get more eyes on it... and we can get those things corrected.

Here is the "process" I would propose that we show off (for the drugs.json branch):

David, do you need a "deck" or some sort of visual to present? Help me out if I am going astray...

Jeff


marks commented 8 years ago

I don't want to make more work for you, @BAP-Jeff, but I'd be very interested in what your CSVs look like. I am running some code on the index.json files I found in a previously shared xlsx to flatten all three file types (but they may not have all the important info... I am not a SME in this area).

ghost commented 8 years ago

I actually use pipe delimiting but can easily change it out. I am attaching an example source JSON file and what I am outputting (I tried to find a small one). It is very close to what you had shown off before, Mark. I can easily add some more columns from the Plan Attributes PUF file to make it more complete (IssuerID, Region, etc.), but it sounded like you wanted to see the current state. I also included the Python script I used to parse the file.

The format of the output is:

Archive.zip

HIXFormExtract.py.zip

marks commented 8 years ago

@BAP-Jeff - Yup! Looks quite similar. I would think it would be very helpful to folks to include info from the Plan Attributes PUF. Will stand by for files so that we can get them on Socrata* (and Kaggle and other platforms) for dissemination/exploration/API access.

* Full disclosure: I work for Socrata and we're more than happy to help folks download the data in a number of formats. Can accept pipe delimited, no big deal.

ghost commented 8 years ago

Okay I will be gone most of the day but will get on this later today.


dportnoy commented 8 years ago

@BAP-Jeff, great to have you contribute! Let's see if I can answer most of your questions...

ghost commented 8 years ago

David,

I think I can take on all of the above but may need a little help with the deck. Do you have a template you want to use - PowerPoint, Google, ...?

I can produce the files tonight/tomorrow and upload them probably tomorrow.

@dportnoy, give me the details on the Kaggle stuff. @marks, let me know where to drop the files for Socrata.

David, did you want to reserve any slides for ..."Issues we found in the data" or does that diminish the good vibes?

Jeff

marks commented 8 years ago

Jeff- if I can just get delimited files, I can take care of loading it for ya. Sounds like kaggle is top priority for this event though ;)


ghost commented 8 years ago

Okay I will loop back around to you as time is short.


dportnoy commented 8 years ago

@BAP-Jeff Google Slides. No specific template. Just the default white.

For Kaggle, the upload will be done manually by their staff (since they're still working on making that functionality self-serve), so we need a staging location before the data gets copied there.

ghost commented 8 years ago

Okay I have the file ready to go. It is about 5GBs uncompressed (31 million records). The file has the following table def:

create table Hackathon (
  StateCode varchar(2),
  IssuerID char(5),
  StandardComponentID varchar(14),
  MarketCoverage varchar(18),
  PlanMarketingName varchar(142),
  HIOSProductID varchar(10),
  ServiceAreaID varchar(6),
  FormularyID varchar(6),
  PlanType varchar(9),
  MetalLevel varchar(12),
  DrugStatus varchar(100),
  RxCUI varchar(15),
  DrugName varchar(2000),
  PAStatus char(1),
  STStatus char(1),
  QLStatus char(1)
)

I pulled what I thought was the key information from Plan Attributes. Note: PlanID in the Formulary Files is not the same as PlanID in Plan Attributes; the analogous column is StandardComponentID. Furthermore, I ignored the CSR variations in Plan Attributes and joined the distinct records from Plan Attributes to the Formulary Files. If this didn't make sense... well, I guess trust me...
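A sketch of that join (assuming pandas; the local file names are hypothetical): collapse Plan Attributes to distinct rows, then left-join the formulary extract's PlanID to StandardComponentID.

```python
import pandas as pd

plan_attrs = pd.read_csv("plan_attributes_puf.csv", dtype=str)
formulary = pd.read_csv("formulary_extract.psv", sep="|", dtype=str)

keep = ["StateCode", "IssuerID", "StandardComponentID", "MarketCoverage",
        "PlanMarketingName", "HIOSProductID", "ServiceAreaID",
        "FormularyID", "PlanType", "MetalLevel"]
# Distinct records only, which drops the CSR variations.
plan_attrs = plan_attrs[keep].drop_duplicates()

merged = formulary.merge(plan_attrs, left_on="PlanID",
                         right_on="StandardComponentID", how="left")
merged.to_csv("hackathon.psv", sep="|", index=False)
```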

I have added a header record to the file and am now waiting for the zipping to finish. I assume it will get down to the 2GB range. I can throw it up into an S3 bucket and share it, or GDrive. Let me know. Could be a few days on the slides...

Jeff

dportnoy commented 8 years ago

@BAP-Jeff, definitely throw it up on either S3 or gDrive, whichever works best! Ideally, you could also provide a smaller file that has a more manageable sample subset of this data, so that it can be used to test code.

dportnoy commented 8 years ago

@ftrotter, wanted to ping you and see if you're still working on any aspect of this.

ghost commented 8 years ago

Actually it zipped down to 250 megs so not such a monster. I also did a sample file for all of Humana in Georgia.

marks commented 8 years ago

@BAP-Jeff - I think you missed a pipe in the header row

ghost commented 8 years ago

Thank goodness someone is QAing this! Thanks @marks. Be right back.


ghost commented 8 years ago

DRUGS file sample: https://drive.google.com/file/d/0B9yZheZrBn54RHk1SmszenNnYTg/
DRUGS file complete: https://drive.google.com/file/d/0B9yZheZrBn54MGplUW5BeVRhMDQ/

dportnoy commented 8 years ago

@marks apologies for the delayed reply on your questions in https://github.com/demand-driven-open-data/ddod-intake/issues/56#issuecomment-210890927.

Unfortunately, I can't get your links to work. I'm getting a message "Sign in with your Socrata ID". But when I register a new ID, I get "The current user doesn't have access to this view". For the API links, I'm getting a "Private Dataset" message.

dportnoy commented 8 years ago

@BAP-Jeff, it rocks!

DRUGS file complete Hackathon.zip unzips (using tar xzvf Hackathon.zip):

DRUGS file sample HackathonSample.zip unzips (using tar xzvf HackathonSample.zip):

I verified that the columns in the sample file align and match the header you provided. So we might be good to go with the drug formulary files!

marks commented 8 years ago

@dportnoy mind testing those links again?

dportnoy commented 8 years ago

@BAP-Jeff, I believe you said you were also looking at PLANS files, right?

Some questions and possible improvements for DRUGS:

  • The ServiceAreaID and FormularyID fields are equal. Where are they coming from?
  • Would it be possible to use snake case and naming that's consistent with the schema spec (https://github.com/CMSgov/QHP-provider-formulary-APIs), rather than camel case and descriptive naming? For example:
    • plan_id, instead of StandardComponentID
    • marketing_name, instead of PlanMarketingName
    • drug_tier, instead of DrugStatus
    • rxnorm_id, instead of RxCUI
    • prior_authorization, instead of PAStatus
    • step_therapy, instead of STStatus
    • quantity_limit, instead of QLStatus
    • etc.
  • Would you want to add a couple more fields from the PUF file:
    • marketplace_category (FFM/SPM)
    • issuer_name (in addition to the PlanMarketingName you already have)

So exciting!

dportnoy commented 8 years ago

@marks Progress! The data grid and APIs seem to be working, but the data explorer redirects to https://healthdata.demo.socrata.com/. I can dig into the details by tomorrow morning.

marks commented 8 years ago

@dportnoy think I fixed my typo. Gonna load @BAP-Jeff's file so that I can play with some aggregations etc. via API, as opposed to having to work with the full file, which is a bit large ;)

ghost commented 8 years ago

@dportnoy, I have not been working with the PLANS files yet.

Regarding your questions:


ghost commented 8 years ago

Here is what the modified header would look like:

state_code|issuer_id|plan_id|market_coverage|plan_marketing_name|hios_product_id|service_area_id|formulary_id|plan_type|metal_level|drug_tier|rxnorm_id|drug_name|prior_authorization|step_therapy|quantity_limit

Sorry those pipes are hard to see.

ghost commented 8 years ago

@dportnoy, I am taking a quick look at plans.json. I just downloaded them all and am now playing with the extraction. Probably not a big challenge.

marks commented 8 years ago

@BAP-Jeff's drugs.json extract is now in the Socrata exploration experience, which also links to export options and the API, at https://healthdata.demo.socrata.com/view/7unb-qpyc if anyone wants to take a look. If there is a data dictionary explaining each field, I can put that in to provide a little more context for users. Hope it's helpful!

Mark
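A hedged example of pulling a few rows from the hosted extract via Socrata's SODA API (this assumes the view above is also exposed as a SODA resource under the same id, which may not hold; the field names follow the renamed header):

```python
import json
import urllib.request

url = ("https://healthdata.demo.socrata.com/resource/7unb-qpyc.json"
       "?$limit=5")
with urllib.request.urlopen(url) as resp:
    for row in json.load(resp):
        print(row.get("drug_name"), row.get("plan_id"))
```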

dportnoy commented 8 years ago

@marks, @BAP-Jeff, way to team up and get the DRUGS file built and hosted!

@BAP-Jeff, thanks for the updated header record! To answer your question, marketplace_category and issuer_name can be found in the "Machine-readable URL PUF" seed file (http://download.cms.gov/marketplace-puf/2016/machine-readable-url-puf.zip).

ghost commented 8 years ago

@dportnoy, I see the issuer_name but I don't see the marketplace_category.


dportnoy commented 8 years ago

@marks, I was finally able to get to all the links in https://github.com/demand-driven-open-data/ddod-intake/issues/56#issuecomment-210890927. Great start on those files! I see that you combined arrays and hierarchies into single fields for simplicity. I think the next step should be to generate the complete files, so that we at least have something useful as a starting point for the weekend. We can always improve later as time permits. As with @BAP-Jeff's DRUGS file, we can use the datasets you already built as the sample subsets for app development and testing.

Is there a naming convention for fields that start with an underscore or contain a period?

Not crucial, but consider pulling in a few of the attributes from the "Machine-readable URL PUF" seed file or other cross-references between PLANS, PROVIDERS, and DRUGS that might be useful for analytics.

ghost commented 8 years ago

@dportnoy, I have the plans.json material parsed. What is less obvious is how we want to conflate these into a single record... actually, I don't think it makes sense to do so. Have you looked at it at all? I could create plan, network, formulary, and cost_sharing files if you like, joined by plan_id or something. Looks like the benefit piece is a work in progress.

Let me know. Unlike the formulary, these files are extremely easy to work with (read: small), so I can iterate much more quickly...

dportnoy commented 8 years ago

@BAP-Jeff, on your questions:

dportnoy commented 8 years ago

@BAP-Jeff, @marks, @cornstein, if time allows, it might be useful to figure out how to represent the plans that have missing data. In other words, those that either:

  1. Didn't provide an index.json URL at all
  2. Provided a URL, but it didn't work
  3. Had a working URL, but the content didn't conform to the schema

It seems that capturing these missing plans would be useful for both analytics and consumer applications. The question is where best to capture this info. Thoughts?
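One possible way to distinguish the three cases with a status column (a sketch; the status labels and the expected index.json keys are assumptions, not from the spec text):

```python
import json
import urllib.request

def url_status(url):
    """Classify one index.json URL into the three cases above."""
    if not url:
        return "NO_URL"       # case 1: no index.json URL provided
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            doc = json.load(resp)
    except Exception:
        return "BAD_URL"      # case 2: URL provided but doesn't work
    expected = {"plan_urls", "provider_urls", "formulary_urls"}
    if not expected.issubset(doc):
        return "BAD_SCHEMA"   # case 3: content doesn't match the schema
    return "GOOD"
```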

ghost commented 8 years ago

@dportnoy, I am out for the rest of the week, EOB Wednesday. I am going to prioritize doing another cut of the formulary data with the additional fields. Then I will create the plans data. It's unlikely I'll get to the slides at this moment. Sorry.

As for your piece on who's missing, #1 is easy, #2 takes a bit of manual entry, and #3 (at least for me) ends up in bucket #2.

Should I just add a column to, say, the machine-readable PUF with "GOOD"/"BAD" for the URL provided?

ghost commented 8 years ago

All, here is my quick attempt to stitch something together for plans.json.

Four files, joined on plan_id and/or plan_id + drug_tier. Take a look; just one plan for now.

plans.zip
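For anyone who wants to recombine the four files, a sketch (assuming pandas and the file headers shown later in this thread):

```python
import pandas as pd

plans = pd.read_csv("PlanOut.pip", sep="|", dtype=str)
networks = pd.read_csv("PlanNetworkOut.pip", sep="|", dtype=str)
tiers = pd.read_csv("formularyOut.pip", sep="|", dtype=str)
costs = pd.read_csv("costOut.pip", sep="|", dtype=str)

# Tier-level cost sharing joins on the composite key.
tier_costs = tiers.merge(costs, on=["plan_id", "drug_tier"], how="left")
# Everything else joins on plan_id alone.
full = (plans.merge(networks, on="plan_id", how="left")
             .merge(tier_costs, on="plan_id", how="left"))
print(full.head())
```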

dportnoy commented 8 years ago

@BAP-Jeff,

Should I just add a column to say the machine readable puf with "GOOD", "BAD" for the URL provided?

Sure, that could work. (If it's easier, we can have a separate addendum file with the same header, containing just the plans that couldn't be processed. Analysts could simply append this file to the main one if they want to include the missing data points.)

dportnoy commented 8 years ago

@BAP-Jeff, on https://github.com/demand-driven-open-data/ddod-intake/issues/56#issuecomment-212055405

==> PlanOut.pip <==
plan_id|marketing_name|summary_url|marketing_url|formulary_url|plan_contact|last_updated_on

==> PlanNetworkOut.pip <==
plan_id|network

==> formularyOut.pip <==
plan_id|drug_tier|mail_order

==> costOut.pip <==
plan_id|drug_tier|pharmacy_type|copay_amount|copay_opt|coinsurance_rate|coninsurance_opt

Looks good! Let's load. A couple observations for future improvement: