OHDSI / Vocabulary-v5.0

Build process for the OHDSI Standardized Vocabularies. Currently not available as an independent release.

Latest Vocabulary Version #373

Open PRijnbeek opened 7 years ago

PRijnbeek commented 7 years ago

How should I interpret the fact that the latest release notes are from June 20th, while the latest vocabulary I get from Athena contains later vocabulary updates?

Is the vocabulary version now indeed just the date on which I download it (see the other issue I posted)? I find it hard to understand what the changes are, how they would affect my ETL, how they would affect the comparison of results in a network study, etc.
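(As an aside: the release string that ships with a download can be read from the VOCABULARY table itself, where the overall release is recorded under vocabulary_id = 'None'. A minimal check, assuming the standard v5 table layout:)

```sql
-- Overall vocabulary release of the currently loaded download
SELECT vocabulary_version
FROM vocabulary
WHERE vocabulary_id = 'None';
```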

I am sure we all struggle(d) with this. How do people currently manage vocabulary updates in their institutes?

@pbr6cornell @ericaVoss On what grounds do you decide to update the vocabulary? How is this quality control procedure done at JnJ?

ericaVoss commented 7 years ago

@cgreich - I agree ATHENA should tell you the release number of the Vocabulary without you having to download the whole thing to figure it out. And the release notes should match, titled YYYYMMDD so that they sort correctly.

@PRijnbeek the way we manage it here is that we try to review the Vocabulary only twice a year (January / July). When we review the Vocabulary, I have a program that compares it to the previous Vocab we were using, to characterize the differences. Additionally, I review GitHub for any open issues to see whether they still exist. Based on those two things I decide whether we want to move to the next Vocabulary or not.
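A minimal sketch of that kind of comparison, assuming the two releases are loaded side by side in placeholder schemas vocab_old and vocab_new (the actual program may work quite differently):

```sql
-- Concepts that are new in the new release
SELECT COUNT(*) AS added_concepts
FROM vocab_new.concept n
LEFT JOIN vocab_old.concept o ON o.concept_id = n.concept_id
WHERE o.concept_id IS NULL;

-- Existing concepts whose standard_concept flag or validity changed
SELECT COUNT(*) AS changed_concepts
FROM vocab_new.concept n
JOIN vocab_old.concept o ON o.concept_id = n.concept_id
WHERE COALESCE(n.standard_concept, '') <> COALESCE(o.standard_concept, '')
   OR COALESCE(n.invalid_reason, '')   <> COALESCE(o.invalid_reason, '');
```

The same pattern extends to concept_relationship and concept_ancestor.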

Sometimes we notice issues with the Vocabulary that cause us to adopt a Vocab out of cycle. For example, this issue caused us to adopt a new Vocabulary last month: https://github.com/OHDSI/Vocabulary-v5.0/issues/88.

PRijnbeek commented 7 years ago

You can already see on the ATHENA website that there are vocabs updated in October, but you do not know what was updated, since there are no recent release notes.

@ericaVoss Yes, the twice-a-year cycle is also what I would like us to do, plus hotfixes where relevant. I would like to hear a bit more about the comparison program; I was thinking about a tool like that as well. I see on the server that the vocabs are loaded into their own databases, which would allow you to run a tool against the two databases (= vocab versions) to spot the differences. The question is: what rules are applied in your tool?

@cgreich It would be great if for each vocab update a report were created that ticks off those quality rules or shows the results of the diff. I think it makes no sense for all CDM owners to develop their own quality tools or to do extensive testing themselves.

For each vocabulary source you would at least like to know the following, I think (maybe more); see the sketch after this list:

  1. How many concept_ids were added (and which ones)?
  2. How many relationships were changed (and which ones)?
  3. New vocabulary sources need a bit more background/metadata.
  4. etc.
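A sketch of items 1 and 2, again with the two releases assumed to sit in placeholder schemas vocab_old and vocab_new:

```sql
-- 1. Added concepts per vocabulary (select n.* instead to see which ones)
SELECT n.vocabulary_id, COUNT(*) AS added_concepts
FROM vocab_new.concept n
LEFT JOIN vocab_old.concept o ON o.concept_id = n.concept_id
WHERE o.concept_id IS NULL
GROUP BY n.vocabulary_id
ORDER BY added_concepts DESC;

-- 2. Relationships added or removed, broken down by relationship_id
SELECT COALESCE(n.relationship_id, o.relationship_id) AS relationship_id,
       SUM(CASE WHEN o.concept_id_1 IS NULL THEN 1 ELSE 0 END) AS added,
       SUM(CASE WHEN n.concept_id_1 IS NULL THEN 1 ELSE 0 END) AS removed
FROM vocab_new.concept_relationship n
FULL OUTER JOIN vocab_old.concept_relationship o
  ON  o.concept_id_1    = n.concept_id_1
  AND o.concept_id_2    = n.concept_id_2
  AND o.relationship_id = n.relationship_id
GROUP BY COALESCE(n.relationship_id, o.relationship_id)
ORDER BY 1;
```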

Personally, I find the current release notes a bit cryptic and hard to understand if you are not on the vocab team. I do think that all database custodians need to fully understand these vocab changes, because they can have a high impact on study results, for example when there are missing links in the hierarchy as in #88.
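As an illustration of the kind of rule such a report could tick off (just one example, not necessarily what the vocab team actually runs), again against placeholder schemas vocab_old and vocab_new:

```sql
-- Concepts that had descendants in the old release but have none at all in the
-- new one: a possible sign of broken hierarchy links after an update
SELECT c.vocabulary_id,
       COUNT(DISTINCT oa.ancestor_concept_id) AS concepts_losing_descendants
FROM vocab_old.concept_ancestor oa
JOIN vocab_new.concept c
  ON c.concept_id = oa.ancestor_concept_id
LEFT JOIN vocab_new.concept_ancestor na
  ON  na.ancestor_concept_id = oa.ancestor_concept_id
  AND na.ancestor_concept_id <> na.descendant_concept_id
WHERE oa.ancestor_concept_id <> oa.descendant_concept_id
  AND na.ancestor_concept_id IS NULL
GROUP BY c.vocabulary_id
ORDER BY concepts_losing_descendants DESC;
```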

cgreich commented 7 years ago

Peter:

Everything you are saying is correct. We just haven't had the chance and the resources to fix it. Herding vocabularies is a thankless job nobody wants to fund; if it works, everybody takes it for granted, and if it doesn't, it's annoying. I am not being defensive, just explaining the situation.

Let me bring it up with the team.

cgreich commented 7 years ago

Peter:

One more thing:

Many of the things you ask for already exist, but not in an automated, published form. For each vocab update there are QA scripts and scripts for comparison with the previous version, but you can't see their output.