OHDSI / ETL-CMS

Work products to ETL CMS datasets into the OMOP Common Data Model
Apache License 2.0

My experiences using the unm-improvements branch - Apr 14 #18

Closed: boxysean closed this issue 8 years ago

boxysean commented 8 years ago

Hey folks,

First off, thank you for this resource connecting the synthetic dataset to OMOP. It is very helpful as I evaluate how OMOP can benefit my work, and it has saved me a ton of time.

Below is some unsolicited feedback from using the unm-improvements branch to generate sample patient data for my local OMOP CDM instance. I was referred here from this discussion.

I'm sure there's lots of internal discussion on your end, but I would suggest the following to make this really useful to the general public:

Thanks again! Super helpful.

ChristopheLambert commented 8 years ago (Thu, Apr 14, 2016 at 7:18 PM)

Hi Sean,

Thanks for the feedback, and glad it was helpful. Let me respond to your feedback:

  • We will look into the confusing message.
  • The script get_synpuf_files.py used to be written in Python 3. In our branch, we converted it to Python 2.7 for exactly the reason you mentioned: consistency. Are you sure you retrieved the unm-improvements branch? The change to 2.7 is documented in the header.
  • We didn't know how to get that file either, so we overhauled the program to read the OMOP vocabulary files directly, as they come out of the box. Again, are you sure you retrieved the right branch? I can't even find a reference to that file in our branch.
  • We did not provide instructions on how to create the OMOP CDM v5 database, because we hadn't gotten that far yet, but I agree it would be helpful to have full soup-to-nuts instructions.
  • Great idea to have a script to run it all.
  • I would also like to release the results of running the ETL as a zip file. It will be quite large; any suggestions on where to host it?

Thanks!

Christophe

pbr6cornell commented 8 years ago

Christophe, if you make a version of the dataset using your ETL, the coordinating center can host it on our Amazon instance, and we can expose it via the OHDSI website. Lee Evans can help with those logistics. Thanks for your contribution; this is great!

boxysean commented 8 years ago

Hey @ChristopheLambert, well, false alarm. I was on 94540d02db59bd2ca5c0a3118702a5dcfb3990dc from master, thinking I was on unm-improvements. No wonder you were so confused, oops! :)

Looks like there's a lead on where to put the output, excellent. I'll close the issue, since most of what else I said doesn't seem to apply. Thanks!

ChristopheLambert commented 8 years ago

Patrick, we will be sure to do that.

Sean, glad you reached out anyways. Let us know how it works out!

leeevans commented 8 years ago

Hi @ChristopheLambert how big is the SYNPUF CDMV5 dataset that you would like to share?

Do you have a preferred way to transfer it? An FTP server? I can set up a temporary AWS S3 bucket for you to upload the dataset to, if needed.

You can send me a direct message on the OHDSI forum, or connect and message me on LinkedIn to share the transfer connection details.

Thanks.

ChristopheLambert commented 8 years ago

Hi @leeevans, we are not finished yet, but we estimate it will be 110 GB uncompressed and about 18 GB compressed. SFTP would be fine.