OHDSI / ETL-CMS

Workproducts to ETL CMS datasets into OMOP Common Data Model
Apache License 2.0
94 stars 52 forks source link

CDM_SOURCE.VOCABULARY_VERSION #28

Open ericaVoss opened 7 years ago

ericaVoss commented 7 years ago

I think I see a copy of the SynPUF out on Postgres Public. If this is truly a build then the CDM_SOURCE.VOCABULARY_VERSION would be more informative if you pulled this information off the Vocab you ran off of:

SELECT VOCABULARY_VERSION 
FROM vocabulary 
WHERE VOCABULARY_ID = 'None';

This way you know exactly which Vocab was used in the build.

ChristopheLambert commented 7 years ago

Where do you think we should report this information? Here is the result of your query:

select VOCABULARY_VERSION from vocabulary where VOCABULARY_ID = 'None';

vocabulary_version

v5.0 08-JUN-16

ericaVoss commented 7 years ago

Yes, put that in the CDM_SOURCE.VOCABULARY_VERSION column instead of "Version 5". The answer is more informative that just "Version 5" and still contains the V5 information.

ChristopheLambert commented 7 years ago

I agree this is more informative. I'm confused what you are asking me to do. We do not ship vocabulary files as part of the ready-to-go data -- the VOCABULARY.csv file you get when you download the vocabulary contains the version information you are requesting. I don't believe any of the non-vocabulary tables contain a field for vocabulary version.

Perhaps I'm not understanding what Postgres Public is. I uploaded the CDM5 tables, excluding the vocabulary files to ftp://ftp.ohdsi.org/synpuf. Perhaps you are talking about a database somebody else loaded, and did not include a proper vocabulary table?

ericaVoss commented 7 years ago

I'm asking that the CDM Builder for SynPUF populates this CDM_SOURCE.VOCABULARY_VERSION. CDM_SOURCE.VOCABULARY_VERSION is part of the CDM not the Vocabulary. The SynPUF CDM Builder should be generating the information in this table. Instead of "Version 5" it should tell the user what Vocabulary the CDM was built with. This is important because if you build SynPUF on Vocabulary 20160311 and I build on 20160620 we'll have slightly different builds. Users of SynPUF should know what Vocabulary the CDM was built on.

I can see however on the FTP there is no CDM_SOURCE defined so I'm guessing the SynPUF builder doesn't build this table? However on the OHDSI Postgres box I think I see a copy of SynPUF with the CDM_SOURCE filled out. I already send an email to @leeevans to try to figure out what is there and how it go there. Let me get back to you when I know more.

ChristopheLambert commented 7 years ago

Ah, I see. No the SynPUF builder does not create the CDM_SOURCE table, and you are right that it should.

ericaVoss commented 7 years ago

@ChristopheLambert - learned the problem, the data I was looking at was not your SynPUF but the Lee's 1K sample SynPUF. HAHAHA - so I logged an issue for a problem that isn't technically yours. However, the CDM_SOURCE table is a good table to have. Check out our manuals if you want examples of how it should be populated: https://github.com/OHDSI/ETL-CDMBuilder/tree/master/man