Identifying dual eligibles in CMS DE-SynPUF

dportnoy commented 9 years ago

As noted by Lloyd Brodsky on 4/22/2015, CMS DE-SynPUF is a good, statistically relevant synthetic dataset, but it lacks indicators for dual eligibles.

lloydbrodsky commented 9 years ago

Let me note that there is presently no good public use file that I am aware of for looking at dual eligibles. That's a shame given the high level of interest in dual eligibles and the great need to find ways to handle their more-severe-than-average problems. CMS clearly knows who its dual eligibles are, because the percentage of dual eligibles is reported in the CMS Geographic Variation public use files. (See http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Geographic-Variation/GV_PUF.html)

So, why not add a field to the DE-SynPUF ( http://www.cms.gov/Research-Statistics-Data-and-Systems/Downloadable-Public-Use-Files/SynPUFs/DE_Syn_PUF.html ) indicating whether the beneficiary is also eligible for Medicaid, preferably with the reason why (generally, because they are over 65, have been on SSDI for 2+ years, or have ESRD or ALS)? Nationwide, dual eligibles are a bit under 20% of beneficiaries, so there shouldn't be a great privacy problem in reporting the flag at the individual level.
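A minimal sketch of what the proposed field could look like, assuming a beneficiary-level record keyed on the DE-SynPUF beneficiary ID (`DESYNPUF_ID`). The class, enum, and field names here are hypothetical and not part of any CMS file layout; the reason codes simply mirror the categories listed in the comment. The helper computes the share of flagged beneficiaries, which is the figure one would compare against the roughly 20% nationwide rate.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class EligibilityReason(Enum):
    # Hypothetical codes mirroring the reasons listed in the comment.
    AGE_65_PLUS = "age_65_plus"
    SSDI_24_MONTHS = "ssdi_24_months"
    ESRD = "esrd"
    ALS = "als"


@dataclass(frozen=True)
class Beneficiary:
    desynpuf_id: str                              # existing beneficiary key
    dual_eligible: bool                           # proposed new flag (hypothetical)
    reason: Optional[EligibilityReason] = None    # proposed optional detail


def dual_share(benes: Iterable[Beneficiary]) -> float:
    """Return the fraction of beneficiaries flagged as dual eligible."""
    items = list(benes)
    if not items:
        raise ValueError("no beneficiaries")
    # True counts as 1 and False as 0 when summed.
    return sum(b.dual_eligible for b in items) / len(items)
```

For example, one flagged beneficiary out of five gives a share of 0.2.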

dportnoy commented 9 years ago

Created page for full use case specifications and solution: http://hhs.ddod.us/wiki/Use_Case_30

betshsu commented 9 years ago

@lloydbrodsky Can you clarify what is deficient about the Medicare-Medicaid Linked Enrollee Analytic Data Source (MMLEADS; https://www.cms.gov/Medicare-Medicaid-Coordination/Medicare-and-Medicaid-Coordination/Medicare-Medicaid-Coordination-Office/Analytics.html) and the Chronic Conditions Data Warehouse (https://www.ccwdata.org/web/guest/home)? Is it a matter of granularity (wanting individual-level rather than aggregated data) or of wanting different slices of data than those presented in the MMLEADS PUF? Or is it an issue of entrepreneur vs. researcher use?

dportnoy commented 9 years ago

@lloydbrodsky, we'd love to make some progress on this one, but need to confirm you're still engaged, since it's been about 5 months. (BTW, apologies about the long delay on our side. There was quite a backlog.)

Could you respond by Fri 10/23? Thanks!

lloydbrodsky commented 9 years ago

Mr. Portnoy, I haven't disappeared. I've been working through how to get around the limitations of the public use files. The short version is that what I originally had in mind won't work, and I'm working with an academic to get data out of ResDAC that we think will. Those discussions have been going on part-time for a few months.

The longer version is, well, longer. As you'll recall, the original idea was to explain utilization or AHRQ PQIs by social variables from the Census (for example, the amputation rate among diabetics being explained by demographics and income). (I found out a bit later that there's a whole field, social epidemiology, that has been doing this kind of thing for a while.) It would also have been nice to be able to focus on the Medicaid population only, but the Geographic Variation PUF just says what percentage is due to Medicaid, while the MMLEADS file just counts enrollments at the state level.

At the time we were talking, CMS had just announced at Health Datapalooza in May that data entrepreneurs would now have access to research identifiable files in some form, with details to arrive sometime in September. Those details have now arrived (they didn't become available until late September), and the bottom line is that to participate you face a buy-in of almost $50K and have to learn a new language. That's a problem because we've figured out we need to aggregate at a finer level of granularity, such as zip code or census tract, which the public use files don't provide.

The costs are an annual fee of $25,000 per person (sharing not allowed) plus a startup cost of $15,000 per project. In addition, there's the complexity of having to learn a new language. Data entrepreneurs overwhelmingly use open source tools (R or Python for data manipulation, MySQL or Postgres for databases). To use the CMS VRDC you need to learn SAS or Stata, not just for statistics but also as a SQL replacement. That rather implies buying it (SAS for PCs is $9,000; to be fair, Stata is a lot less).
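
The comment does not itemize the "almost $50K" figure. One reading that fits the quoted prices is one analyst's annual fee plus one project's startup fee plus a PC SAS license, and the sketch below encodes that reading. The constant names and the `first_year_cost` helper are hypothetical, the prices are the 2015 figures quoted above, and counting SAS as a one-time purchase per analyst is an assumption.

```python
# Hypothetical first-year cost model for VRDC access, using the 2015 prices
# quoted in the comment. Treating the SAS license as a one-time purchase per
# analyst is an assumption, not a documented CMS requirement.
ANNUAL_FEE_PER_ANALYST = 25_000   # per person per year; seats cannot be shared
STARTUP_FEE_PER_PROJECT = 15_000  # one-time fee per project
SAS_PC_LICENSE = 9_000            # quoted PC price (Stata is cheaper)


def first_year_cost(analysts: int, projects: int, buy_sas: bool = True) -> int:
    """Return the estimated first-year outlay in US dollars."""
    if analysts < 1 or projects < 1:
        raise ValueError("need at least one analyst and one project")
    cost = analysts * ANNUAL_FEE_PER_ANALYST + projects * STARTUP_FEE_PER_PROJECT
    if buy_sas:
        cost += analysts * SAS_PC_LICENSE
    return cost


print(first_year_cost(1, 1))                 # 49000
print(first_year_cost(1, 1, buy_sas=False))  # 40000
```

Under these assumptions, a second analyst on the same project adds another $34,000 (fee plus license), for $83,000 in the first year, which shows why per-person pricing with no seat sharing weighs on small teams.
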
I've used SAS and have never touched Stata. This doesn't seem likely to get a lot of takers. The fee situation is better for academics, as they're not required to deal with the VRDC, so I've been talking with one of my classmates who is a professor of health economics. It WOULD be very nice if CMS produced de-identified public use data at the same level of granularity as the Census, but it appears that would require changing the statute that requires cost recovery for ResDAC.


demand-driven-open-data / ddod-intake

Identifying dual eligibles in CMS DE-SynPUF #30