schuemie closed this issue 6 years ago.
To address your second question, the version is in the VOCABULARY table:
```sql
SELECT *
FROM VOCABULARY
WHERE VOCABULARY_ID = 'none'
```
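Tools need to run this same lookup if they are to discover the vocabulary version from the data. Here is a minimal sketch. It assumes Python's built-in `sqlite3` module and an in-memory database standing in for the CDM. The two-column table and its sample rows are hypothetical. Check the exact ID of the version row, including its capitalization, against your own VOCABULARY table.

```python
import sqlite3
from typing import Optional

def get_vocabulary_version(conn: sqlite3.Connection) -> Optional[str]:
    """Return the release string stored on the 'none' VOCABULARY row, or None if it is missing."""
    row = conn.execute(
        "SELECT vocabulary_version FROM vocabulary WHERE vocabulary_id = ?",
        ("none",),
    ).fetchone()
    return row[0] if row else None

# Hypothetical stand-in for a CDM database, reduced to the two columns used here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vocabulary (vocabulary_id TEXT, vocabulary_version TEXT)")
conn.executemany(
    "INSERT INTO vocabulary VALUES (?, ?)",
    [("none", "v5.0 11-March-2016"), ("SNOMED", "hypothetical SNOMED release")],
)
print(get_vocabulary_version(conn))  # prints: v5.0 11-March-2016
```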
How? We already have version numbers: 4.5 and 5.0. Those versions are for merging the vocabularies with the right CDM.
But those are really CDM version numbers, not vocab versions. One could even argue v4.5 or v5.0 merely indicates the format of the vocab.
We could stick to the current de facto version identifiers, which are the release dates (e.g. '11-March-2016'). But it is odd that only the vocab does that, while all other OHDSI artifacts have regular version numbers. It is certainly not intuitive to most users.
One idea is to combine the CDM version with the vocab version into one identifier, e.g. 'v5.0.AA', 'v5.0.AB', etc.
👍 for this idea. Specifically in the space of automating operations on and between OHDSI datasets, it would be very useful to be able to programmatically inspect (discover from the data) and compare (which is greater, are they incompatible) specific (semantic) vocabulary versions.
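A combined identifier becomes machine-comparable once it has a fixed, parseable layout. The sketch below assumes the hypothetical layout `v<CDM major>.<CDM minor>.<release letters>` from the 'v5.0.AA' example. The names `parse_vocab_version`, `release_key` and `same_cdm_format` are illustrative only and are not an existing OHDSI API.

```python
import re
from typing import NamedTuple, Tuple

# Hypothetical layout: v<CDM major>.<CDM minor>.<release letters>, e.g. "v5.0.AB"
_VERSION_RE = re.compile(r"^[vV](\d+)\.(\d+)\.([A-Z]+)$")

class VocabVersion(NamedTuple):
    cdm: Tuple[int, int]  # CDM format version, e.g. (5, 0)
    release: str          # vocabulary release letters, e.g. "AB"

def parse_vocab_version(text: str) -> VocabVersion:
    m = _VERSION_RE.match(text.strip())
    if m is None:
        raise ValueError(f"not a combined vocabulary version: {text!r}")
    return VocabVersion((int(m.group(1)), int(m.group(2))), m.group(3))

def release_key(v: VocabVersion) -> tuple:
    # Sort by CDM version first, then by release letters. Comparing length
    # before the letters keeps "Z" < "AA" should the codes ever grow longer.
    return (v.cdm, len(v.release), v.release)

def same_cdm_format(a: VocabVersion, b: VocabVersion) -> bool:
    # Assumption: releases built for different CDM versions are not interchangeable.
    return a.cdm == b.cdm

versions = ["v5.0.AB", "V5.0.AA", "v4.5.ZZ"]
print(sorted(versions, key=lambda s: release_key(parse_vocab_version(s))))
# prints: ['v4.5.ZZ', 'V5.0.AA', 'v5.0.AB']
```

The CDM part and the release part are kept as separate fields. That lets a tool answer both questions raised above: which release is newer (`release_key`), and whether two releases share a CDM format at all (`same_cdm_format`).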
Ok. We'll cook something up.
Compatibility: that's an interesting question. Generally the versions are 98% identical and compatible, but if you need a certain piece of information that happens to be in the other 2%, you are screwed. Not sure how to get that right.
From my perspective, the most important question is whether all of the concept ids from a previous vocabulary version exist in a new version.
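That check reduces to a set difference over concept_ids. Below is a minimal Python sketch. In practice the two ID lists would come from something like `SELECT concept_id FROM concept` run against each release; the lists shown here are hypothetical. A concept that has been set to invalid still counts as present.

```python
from typing import Iterable, Set

def missing_concept_ids(old_ids: Iterable[int], new_ids: Iterable[int]) -> Set[int]:
    """Return concept_ids that exist in the old release but not in the new one.

    An empty set means every old concept_id survived (possibly marked invalid).
    """
    return set(old_ids) - set(new_ids)

old_release = [101, 102, 103]            # hypothetical concept_ids
new_release = [101, 102, 103, 104, 105]  # only additions
print(missing_concept_ids(old_release, new_release))  # prints: set()
```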
Based on that, the interpretation of the semantic versioning specification could be:
I agree with @schuemie that the CDM version numbers are more like format identifiers.
@aaron0browne:
The concept_ids are all preserved. Except in very rare cases (egregious errors or duplications) they never die. Such a case hasn't happened yet.
But to your three suggestions: in each release, concepts are added, removed (set to invalid, not really removed) and changed. So, unless you have a good idea for handling that, your scheme would not work. It's not software.
So then what is the 2% you referred to above? Concepts that are set to invalid?
Yes, those, plus added ones and changed ones. The individual concepts already have a version of sorts: the valid_start_date, valid_end_date and invalid_reason. But we are talking about the Vocabulary System as a whole.
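These three kinds of change can be told apart by comparing two releases concept by concept. The minimal Python sketch below assumes each release has been loaded into a dict keyed by concept_id and reduced to a hypothetical subset of CONCEPT columns. The classification rules are this sketch's own, not an official rule: a newly non-null invalid_reason counts as "invalidated", and any other field difference counts as "changed". The sample rows and the 'D' code are illustrative only.

```python
from typing import Dict, NamedTuple, Optional, Set

class ConceptRow(NamedTuple):
    # Hypothetical subset of CONCEPT columns used for the comparison.
    concept_name: str
    valid_start_date: str
    valid_end_date: str
    invalid_reason: Optional[str]

def diff_releases(old: Dict[int, ConceptRow], new: Dict[int, ConceptRow]) -> Dict[str, Set[int]]:
    """Classify concept_ids as added, invalidated, or otherwise changed."""
    shared = set(old) & set(new)
    added = set(new) - set(old)
    invalidated = {
        cid for cid in shared
        if old[cid].invalid_reason is None and new[cid].invalid_reason is not None
    }
    changed = {cid for cid in shared - invalidated if old[cid] != new[cid]}
    return {"added": added, "invalidated": invalidated, "changed": changed}

old = {
    1: ConceptRow("Concept A", "1970-01-01", "2099-12-31", None),
    2: ConceptRow("Concept B", "1970-01-01", "2099-12-31", None),
    3: ConceptRow("Concept C", "1970-01-01", "2099-12-31", None),
}
new = {
    1: old[1],                                                     # unchanged
    2: ConceptRow("Concept B", "1970-01-01", "2016-03-10", "D"),   # set to invalid
    3: ConceptRow("Concept C (renamed)", "1970-01-01", "2099-12-31", None),
    4: ConceptRow("Concept D", "2016-03-11", "2099-12-31", None),  # new concept
}
print(diff_releases(old, new))
# prints: {'added': {4}, 'invalidated': {2}, 'changed': {3}}
```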
For me the most important thing is that there is an explicit version ID. For example, I don't know which version my friends at ErasmusMC or TMU are using (even though they're both on CDM v5). Until Erica taught me the trick of looking up the vocabulary_version using vocabulary_id = 'none' in the vocabulary table, I had no way of finding out.
That being said, some updates are more profound than others. For example, from v4.4 to v4.5 (aka v5.0) the entire ICD10-to-standard (SNOMED) mapping was replaced. But that doesn't mean people shouldn't update their ETL even after a minor update, so I'm not sure semantic versioning is needed here.
Ok. So, I hear several complaints:
Not sure what V4.4 means. We have been on 4.5/5.0 for more than 2 years.
v4.4: this confusion proves my point ;-)
You got it. Will be done.
I have felt that the build version combined with a distinct date-based version ID (e.g. V5.0 - 20160311) has worked well.
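A date-based ID can be ordered without any extra convention. The minimal Python sketch below assumes the identifier always follows the exact layout of the example quoted above (`V<major>.<minor> - <YYYYMMDD>`). The function name is hypothetical.

```python
import re
from datetime import date, datetime
from typing import Tuple

# Assumed layout, taken from the example "V5.0 - 20160311".
_BUILD_RE = re.compile(r"^[vV](\d+)\.(\d+)\s*-\s*(\d{8})$")

def parse_build_version(text: str) -> Tuple[Tuple[int, int], date]:
    """Split 'V5.0 - 20160311' into the CDM version and the release date."""
    m = _BUILD_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognized build version: {text!r}")
    cdm = (int(m.group(1)), int(m.group(2)))
    released = datetime.strptime(m.group(3), "%Y%m%d").date()
    return cdm, released

cdm, released = parse_build_version("V5.0 - 20160311")
print(cdm, released)  # prints: (5, 0) 2016-03-11
```

To find the newer of two parsed IDs, compare their date parts. Their CDM parts still have to match before the two releases are comparable at all.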
But +1 on making the version more apparent in different places. Right now, when I pull from ATHENA, I only learn the version when I open the VOCABULARY text file, and it is often hidden in tools like ATLAS and ACHILLES.
When we build a CDM, we could put the vocab version in the cdm_source table and then expose all contents of this table on an opening page of Achilles... (email reply to ericaVoss's comment of May 25, 2016, 10:05 AM: https://github.com/OHDSI/OMOP-Standardized-Vocabularies/issues/5#issuecomment-221586272)
@pbr6cornell - but a CDM should have the vocabulary used to build it embedded inside it, so the information is already in the VOCABULARY table. Still, exposing the CDM_SOURCE information is a good addition.
Hmmm, the cdm_source table indeed has a vocabulary_version field, so this is redundant with the record in the vocabulary table. But I wouldn't trust people to fill in the cdm_source table ;-)
@schuemie - ha, actually you are right: when we did the build, we queried the VOCAB table to populate the column.
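A build can populate that column from the VOCABULARY table with a single UPDATE. Below is a minimal sketch using Python's `sqlite3` and a hypothetical two-table demo database. A real build would run the equivalent statement in its own SQL dialect against the full CDM tables. Check the 'none' ID against your own VOCABULARY table.

```python
import sqlite3

def fill_cdm_source_vocab_version(conn: sqlite3.Connection) -> None:
    """Copy the vocabulary_version of the 'none' VOCABULARY row into cdm_source."""
    # If no 'none' row exists, the subquery yields NULL and the column is set to NULL.
    conn.execute(
        """
        UPDATE cdm_source
        SET vocabulary_version = (
            SELECT vocabulary_version FROM vocabulary WHERE vocabulary_id = 'none'
        )
        """
    )
    conn.commit()

# Hypothetical demo database with only the columns this sketch touches.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vocabulary (vocabulary_id TEXT, vocabulary_version TEXT)")
conn.execute("CREATE TABLE cdm_source (cdm_source_name TEXT, vocabulary_version TEXT)")
conn.execute("INSERT INTO vocabulary VALUES ('none', 'v5.0 11-March-2016')")
conn.execute("INSERT INTO cdm_source VALUES ('demo CDM', NULL)")
fill_cdm_source_vocab_version(conn)
print(conn.execute("SELECT vocabulary_version FROM cdm_source").fetchone()[0])
# prints: v5.0 11-March-2016
```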
Let's enforce that. Achilles could check and whine if the cdm_source doesn't contain that information.
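A check of that kind might look like the following minimal Python sketch, run against a hypothetical demo database. It is not Achilles' actual implementation. The function name and warning texts are made up for illustration.

```python
import sqlite3
from typing import List

def check_cdm_source_vocab_version(conn: sqlite3.Connection) -> List[str]:
    """Return warnings if cdm_source.vocabulary_version is missing or disagrees with VOCABULARY."""
    row = conn.execute(
        "SELECT vocabulary_version FROM vocabulary WHERE vocabulary_id = 'none'"
    ).fetchone()
    expected = row[0] if row else None
    reported = [r[0] for r in conn.execute("SELECT vocabulary_version FROM cdm_source")]

    warnings = []
    if not reported:
        warnings.append("cdm_source has no rows")
    for value in reported:
        if not value:
            warnings.append("cdm_source.vocabulary_version is not filled in")
        elif expected is not None and value != expected:
            warnings.append(
                f"cdm_source.vocabulary_version {value!r} differs from VOCABULARY {expected!r}"
            )
    return warnings

# Hypothetical demo database where the build forgot to fill in the column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vocabulary (vocabulary_id TEXT, vocabulary_version TEXT)")
conn.execute("CREATE TABLE cdm_source (cdm_source_name TEXT, vocabulary_version TEXT)")
conn.execute("INSERT INTO vocabulary VALUES ('none', 'v5.0 11-March-2016')")
conn.execute("INSERT INTO cdm_source VALUES ('demo CDM', NULL)")
print(check_cdm_source_vocab_version(conn))
# prints: ['cdm_source.vocabulary_version is not filled in']
```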
So this issue can be closed now. Who are the admins?
Instead of relying on the release dates as identifiers, could we introduce vocab version numbers?
And maybe the version number can be stored somewhere in the release itself? (Maybe add a vocab_source table?)