boxysean closed this issue 8 years ago.
Christophe, if you made a version of the dataset using your ETL, the coordinating center can host it on our Amazon instance, and we can expose it via the OHDSI website. Lee Evans can help with those logistics. Thanks for your contribution, this is great!
On Thu, Apr 14, 2016 at 7:18 PM, Christophe Lambert <notifications@github.com> wrote:
Hi Sean,
Thanks for the feedback, and glad it was helpful. Let me respond to your feedback:
- We will look into the confusing message
- The script get_synpuf_files.py used to be in python3 -- in our branch, we converted it to 2.7 for just the reason you mentioned -- consistency. Are you sure you retrieved the unm-improvements branch? The change to 2.7 is documented in the header.
- We didn't know how to get that file either, so we overhauled the program to directly read the OMOP vocabulary files as they come out of the box. Again, are you sure you retrieved the right branch? I can't even find a reference to that file in our branch.
- We did not provide instructions on how to create the OMOP CDM v5 database, as we hadn't got there yet, but I agree it would be helpful to have the full soup-to-nuts instructions.
- Great idea to have a script to run it all.
- I would like to release the results of running the ETL as a zip file as well. It will be quite large -- any suggestions where?
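A run-it-all script along those lines could be little more than a driver chaining the existing steps. A rough sketch -- the two script names are from this repo, but the argument layout shown here is an assumption and should be checked against each script's actual usage:

```python
#!/usr/bin/env python
"""Soup-to-nuts driver sketch: fetch SynPUF samples, then run the ETL.

The script names below come from this repo; the arguments each one takes
are assumed here and must be checked against each script's usage.
"""
import subprocess
import sys

def run(cmd):
    """Run one pipeline step, echoing it first; abort everything on failure."""
    print("-> " + " ".join(cmd))
    if subprocess.call(cmd) != 0:
        sys.exit("step failed: " + " ".join(cmd))

def main(download_dir, output_dir, sample_range):
    # Step 1: fetch the raw DE-SynPUF sample files.
    run(["python", "get_synpuf_files.py", download_dir, str(sample_range)])
    # Step 2: transform them into OMOP CDM v5 load files.
    run(["python", "CMS_SynPuf_ETL_CDM_v5.py", download_dir, output_dir])

if __name__ == "__main__":
    if len(sys.argv) == 4:
        main(sys.argv[1], sys.argv[2], int(sys.argv[3]))
    else:
        print("usage: run_etl.py DOWNLOAD_DIR OUTPUT_DIR SAMPLE_RANGE")
```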
Thanks!
Christophe
Hey @ChristopheLambert, well, false alarm. I was on 94540d02db59bd2ca5c0a3118702a5dcfb3990dc from master, thinking I was on unm-improvements. No wonder you were so confused, oops! :)
Looks like there's a lead as to where to put the output, excellent. I'll close the issue, as most of the rest of what I said doesn't seem to apply. Thanks!
Patrick, we will be sure to do that.
Sean, glad you reached out anyways. Let us know how it works out!
Hi @ChristopheLambert, how big is the SYNPUF CDMV5 dataset that you would like to share?
Do you have a preferred way to transfer it? An FTP server? I can set up a temporary AWS S3 bucket for you to upload the dataset if needed.
You can send me a direct message on the OHDSI forum, or connect and message me on LinkedIn to share the transfer connection details.
Thanks.
Hi @leeevans, we are not finished yet, but estimate it will be 110GB uncompressed, and about 18GB compressed. SFTP would be fine.
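For a transfer that size, splitting the compressed archive into fixed-size parts first can make an interrupted SFTP session much cheaper to resume. A minimal stdlib sketch (function names are illustrative, nothing here is from the repo):

```python
import os

def split_file(path, part_size, chunk_size=1 << 20):
    """Split `path` into numbered `<path>.partNNN` files of at most
    `part_size` bytes each, copying in `chunk_size` pieces so memory
    use stays small even for very large archives. Returns part paths."""
    parts = []
    with open(path, "rb") as src:
        index = 0
        while True:
            written = 0
            part_path = "%s.part%03d" % (path, index)
            with open(part_path, "wb") as dst:
                while written < part_size:
                    chunk = src.read(min(chunk_size, part_size - written))
                    if not chunk:
                        break
                    dst.write(chunk)
                    written += len(chunk)
            if written == 0:
                os.remove(part_path)  # no data left; drop the empty part
                break
            parts.append(part_path)
            index += 1
    return parts

def join_parts(parts, out_path, chunk_size=1 << 20):
    """Reassemble the parts, in order, into `out_path`."""
    with open(out_path, "wb") as dst:
        for part_path in parts:
            with open(part_path, "rb") as src:
                while True:
                    chunk = src.read(chunk_size)
                    if not chunk:
                        break
                    dst.write(chunk)
```

The parts can then be uploaded one at a time over SFTP and reassembled on the receiving side with join_parts (or with cat file.part* > file on a Unix host).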
Hey folks,
First off, thank you for this resource connecting the synthetic dataset to OMOP, this is very helpful for me to evaluate how OMOP can benefit my work. This has saved me a ton of time.
Below is some unsolicited feedback after using the unm-improvements branch to generate sample patient data for my local OMOP CDM instance. I was referred to here from this discussion.
- The get_synpuf_files.py utility was confusing. The README was correct, so doing python output 4 20 worked, but the feedback from the tool was telling me otherwise: output was the INPUT_DIRECTORY, 4 was the OUTPUT_DIRECTORY, and 20 was the SAMPLE_RANGE.
- get_synpuf_files.py is written in python3, but CMS_SynPuf_ETL_CDM_v5.py is python2. It seems like you folks are thinking about which to use, but to me, consistency within a single repo is the most important trait.
- I couldn't figure out how to obtain omop_vocab_xref_0723.txt, so I ended up commenting out the section that builds the mapping xref.
I'm sure there's lots of internal discussion over on your end, but I would suggest possibly the following to make this really useful to the general public:
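That kind of usage-message mismatch is easy to prevent when the positional arguments are declared with the same names, in the same order, that the README documents, since argparse then generates the usage text from the declarations themselves. A hypothetical sketch (argument names are illustrative, not the repo's actual interface):

```python
import argparse

def build_parser():
    # Declaring positionals in the documented order means the generated
    # usage/help text can never drift out of sync with the README.
    parser = argparse.ArgumentParser(
        description="Fetch DE-SynPUF sample files (illustrative sketch).")
    parser.add_argument("output_directory",
                        help="directory the downloaded files are written to")
    parser.add_argument("sample_start", type=int,
                        help="first sample number to fetch")
    parser.add_argument("sample_end", type=int,
                        help="last sample number to fetch")
    return parser

# The invocation from this thread would then be labeled unambiguously:
args = build_parser().parse_args(["output", "4", "20"])
# args.output_directory == "output", args.sample_start == 4, args.sample_end == 20
```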
Thanks again! Super helpful.