Data4Democracy / drug-spending

Project to understand pharmaceutical spending, currently focused on US government programs.

Tidy, document and submit data from OpenPaymentsData.CMS.gov #45

jenniferthompson closed 6 years ago

jenniferthompson commented 7 years ago

OpenPaymentsData.CMS.gov has data available on payments received by private physicians and teaching hospitals, broken down by type of expense, company, etc. We need someone to download this data, make sure it's in a tidy format, document it, and upload it for project use to our data.world repo. For details on the data contribution process, see our /datadictionaries README and data dictionary template.

sangxia commented 7 years ago

I had a quick look at the data. Right now there are data for 2014, 2015 and part of 2013 (from Aug 1). The files are relatively large, 6 GB for a full year, but the identity information is repeated across records, so if we remove that overlap they could probably be a lot smaller.
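A rough sketch of what removing the overlapping identity information could look like: the repeated identity columns move into a lookup table and each payment row keeps only a key. The column names (`physician_name`, `specialty`, `state`, `amount_usd`), the sample rows, and the surrogate `physician_id` are all assumptions for illustration, not the actual CMS headers.

```python
# Hypothetical identity columns; the real CMS files use different headers.
IDENTITY_COLS = ("physician_name", "specialty", "state")

# Invented sample rows, only to show the shape of the transformation.
SAMPLE_ROWS = [
    {"physician_name": "Jane Doe", "specialty": "Cardiology", "state": "TN", "amount_usd": 120.0},
    {"physician_name": "Jane Doe", "specialty": "Cardiology", "state": "TN", "amount_usd": 75.5},
    {"physician_name": "John Smith", "specialty": "Oncology", "state": "NY", "amount_usd": 300.0},
]

def split_identity(rows):
    """Split flat payment rows into a physician lookup table and slim payment rows."""
    physician_ids = {}  # identity tuple -> surrogate key
    payments = []
    for row in rows:
        identity = tuple(row[col] for col in IDENTITY_COLS)
        # Reuse the key if this identity was seen before, otherwise assign the next one.
        pid = physician_ids.setdefault(identity, len(physician_ids) + 1)
        slim = {k: v for k, v in row.items() if k not in IDENTITY_COLS}
        slim["physician_id"] = pid
        payments.append(slim)
    physicians = [
        dict(zip(IDENTITY_COLS, identity), physician_id=pid)
        for identity, pid in physician_ids.items()
    ]
    return physicians, payments

physicians, payments = split_identity(SAMPLE_ROWS)
```

How much space this actually saves depends on how many columns are pure identity data, which would need to be checked against the real files.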

Wondering if anyone else is looking at this?

jenniferthompson commented 7 years ago

@sangxia Not as far as I know! Let me know if you'd like me to assign you the issue.

sangxia commented 7 years ago

@jenniferthompson I'd love to help with this.

jenniferthompson commented 7 years ago

@sangxia go for it! Thank you!

rkahne commented 7 years ago

How are things going here? I'm interested in this data and am willing to help.

sangxia commented 7 years ago

An update here: For each year there are 3 tables: research agreement related payments, a general table containing all other payments, and physician ownership of manufacturer information. There is also a supplementary table containing all physician information. So far I've been mostly looking at the general table, cleaning up entity names, etc.

@rkahne Great if you can help. Would you be interested in having a look at the research payment or ownership data?

TBusen commented 7 years ago

@rkahne I'll take one of these too. Are you looking at research payment or ownership data?

rkahne commented 7 years ago

@sangxia I'm more interested in the payments data.

jenniferthompson commented 7 years ago

Just checking in to see how this is going. Anything we can help with? @sangxia @rkahne @TBusen

kimkraunz commented 7 years ago

I'd love to help too if there's anything that needs work.

TBusen commented 7 years ago

@jenniferthompson I haven't been able to get to this yet, sorry... life

kimkraunz commented 7 years ago

@TBusen Do you think you'll get to it or should I start working on it?

TBusen commented 7 years ago

@kimkraunz go ahead and start. Thanks.

jenniferthompson commented 7 years ago

I hear you on life, @TBusen :)

Thanks for jumping in, @kimkraunz! I just added you to the D4D organization; once you accept that I can add you as an assignee for this issue.

kimkraunz commented 7 years ago

I'm on it! Just familiarizing myself now

mattgawarecki commented 7 years ago

Let's make sure the assignees on this issue are accurate -- anybody there who shouldn't be, or not there who should be?

jenniferthompson commented 7 years ago

Looks good as far as I know - I tried to assign @kimkraunz yesterday but had to wait for her to be added to the org! @rkahne, are you still planning to work on this? If so we can add you as well.

kimkraunz commented 7 years ago

OK, I've taken a look at the ownership data. I need to do the following:

  1. Use the most recent version of each record (if it changed).
  2. Standardize capitalization of variables (there's a lot of inconsistency).
  3. Clean terms_of_interest and put into tidy format. It looks like it was a text box for inputting data and there's a wide range of values. For example: some are "stock ownership" while others are "SERIES F PREFERRED STOCK".
  4. Verify data type for each variable and change as needed.

2 and 4 are the easiest (once I get feedback on capitalization). I'll do those first and then move on to 3. Should I combine 2013, 2014, and 2015? They're relatively small datasets.
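A minimal sketch of the four steps above on combined multi-year rows, using only the standard library. Everything specific here is an assumption: the column names (`record_id`, `program_year`, `physician_name`, `terms_of_interest`, `last_updated`), the sample rows, and especially the keyword rule in `categorize_terms`, which is only a placeholder for whatever categories we end up agreeing on.

```python
from datetime import date

# Hypothetical sample rows: column names and values are invented for illustration.
SAMPLE_ROWS = [
    {"Record_ID": "1", "Program_Year": "2013", "PHYSICIAN_NAME": "jane DOE",
     "Terms_of_Interest": "stock ownership", "last_updated": "2014-01-15"},
    {"record_id": "1", "program_year": "2013", "physician_name": "JANE DOE",
     "terms_of_interest": "Stock Ownership", "LAST_UPDATED": "2015-03-01"},
    {"RECORD_ID": "2", "PROGRAM_YEAR": "2014", "Physician_Name": "john smith",
     "TERMS_OF_INTEREST": "SERIES F PREFERRED STOCK", "Last_Updated": "2015-06-30"},
]

def normalize_keys(row):
    """Step 2: lower-case column names so every year's rows share the same headers."""
    return {key.strip().lower(): value for key, value in row.items()}

def categorize_terms(text):
    """Step 3 (placeholder rule): bucket free-text terms by keyword."""
    return "stock" if "stock" in text.lower() else "other"

def coerce_types(row):
    """Step 4: convert strings to proper types and tidy name capitalization."""
    return {
        "record_id": int(row["record_id"]),
        "program_year": int(row["program_year"]),
        "physician_name": row["physician_name"].strip().title(),
        "terms_of_interest": row["terms_of_interest"].strip(),  # raw text is kept
        "terms_category": categorize_terms(row["terms_of_interest"]),
        "last_updated": date.fromisoformat(row["last_updated"]),
    }

def latest_versions(rows):
    """Step 1: keep only the most recently updated version of each record."""
    latest = {}
    for row in rows:
        current = latest.get(row["record_id"])
        if current is None or row["last_updated"] > current["last_updated"]:
            latest[row["record_id"]] = row
    return sorted(latest.values(), key=lambda r: r["record_id"])

def tidy_ownership(rows):
    """Clean rows from all years together and drop superseded record versions."""
    return latest_versions([coerce_types(normalize_keys(r)) for r in rows])

tidy = tidy_ownership(SAMPLE_ROWS)
```

Keeping the raw `terms_of_interest` text next to the derived `terms_category` means the categorization can be revised later without re-downloading anything.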

Also, I didn't see it but do we have keys for physicians? Thanks!

jenniferthompson commented 7 years ago

Thanks @kimkraunz! I think combining the years is a good plan - it'll make it easier to look at things over time.

We don't have physician keys yet to my knowledge - someone feel free to correct me if I missed it.

saipranava commented 7 years ago

@jenniferthompson Hello. Is this task complete? I'm hoping to work on the task of visualizing this data.