OHDSI / Vocabulary-v5.0

Build process for the OHDSI Standardized Vocabularies. Currently not available as independent release.
The Unlicense
214 stars 75 forks source link

Concept Id 0 should be a standard concept #597

Open don-torok opened 2 years ago

don-torok commented 2 years ago

Concept ID 0 is valid for any column holding a concept id. As such it should be classified as a standard concept, otherwise all the CDM documentation that says standard concept only would have to change to include the exception of concept id 0.

cgreich commented 2 years ago

That's correct. I'm surprised nobody has noticed before. Maybe it changed at some point.

Can you fix, @mik-ohdsi?

Jake-Gillberg commented 2 years ago

Anyone know if this change would impact data quality dashboard metrics?

don-torok commented 2 years ago

No, this will not impact data quality dashboard metrics. Current test for standard concept explicitly excludes concept ID zero. Which explains why DQD does not complain currently about concept ID zero not being a standard concept. Test for standard concept

Jake-Gillberg commented 2 years ago

Great, thanks for pulling up the code!

mik-ohdsi commented 2 years ago

Okay, @cgreich , what is the plan? Just introduce a new concept to the OMOP extension vocabulary with concept_id = 0 and maybe concept_code = "OMOP0"?

Alexdavv commented 2 years ago

Because we already have one, let's just make it Standard and think whether it should be related to any vocabulary and concept class.

cgreich commented 2 years ago

Can we, please?

Alexdavv commented 2 years ago

@hardhouse Let’s have a look what service vocabularies and concept classes we use in general and how this specific concept fits in

mik-ohdsi commented 2 years ago

Hi @hardhouse and @Alexdavv - this one is still Non-Standard. Do we have a plan around it?

Alexdavv commented 1 year ago

I'm advocating for the following changes for concept_id = 0:

@cgreich @hardhouse Thoughts?

cgreich commented 1 year ago

Sure. What problem are we solving with this?

Alexdavv commented 1 year ago

Concept ID 0 is valid for any column holding a concept id. As such it should be classified as a standard concept, otherwise all the CDM documentation that says standard concept only would have to change to include the exception of concept id 0.

dimshitc commented 1 year ago

No, just leave all attributes as they are, except making standard_concept ='S', @Alexdavv vocabulary_id = 'CDM' describes the structure of CDM as you mentioned, and 0 is not a structure of CDM obviously.

Alexdavv commented 1 year ago

No, just leave all attributes as they are

And the explanation is…? Look, the users anyway should shop hardcoding this concept because it becomes a full real one. That’s why these ugly placeholders are not anymore a reliable solution for a Standard concept.

The concepts are not the structure in a straightforward way, but they’re a content that could also be considered a part of the structure because they’re standardized, and in CDM we record a concept_id, not a name.

Also, this specific one is unique. It’s anyway a metadata, and somewhat still a part of the model (because it will stay hardcoded in the tables for a while). If we introduce a “Concept” concept_class in the “CDM” vocabulary, the logical issue is solved.

Maybe we could find other better options but let’s clean it up…

cgreich commented 1 year ago

If we introduce a “Concept” concept_class in the “CDM” vocabulary, the logical issue is solved.

True, except we are lacking a use case. Relational database models don't refer to themselves. They can't even use their own artefacts as values. All references are at the model level, not at the data level. Folks are trying to build an information model out of them, and for them we introduced these funny concepts for tables and fields. And we are using it, awkwardly, to define foreign keys without pre-defining the foreign table. Ugly, but at least we have a reason.

Now you want to extend that even further. Do we need that?

Alexdavv commented 1 year ago

Now you want to extend that even further. Do we need that?

...Yes?

The metadata is not exactly the content, but one of the ways to represent the model. Can be easily filtered out if any.

While I think, it's more important to keep the content clean avoiding these ugly placeholders that are currently used for the only concept.

cgreich commented 1 year ago

That's an argument for improving the logical integrity of the model, which is theoretical, unless you have a use case. What is the analytical use case where you would ever use that vocabulary and concept class? Certainly not "give me all records where the vocabulary_id='CDM'. Everybody would query "Give me all records where *_concept_id=0', which they can do today.

ekorchmar commented 1 year ago

I agree with @cgreich completely on this one. We do not want to even risk breaking existing behavior in the name of purity. Consider USAGI as just one of many examples: it never builds a mapping to 0 automatically, because it is not a standard concept. It would need to be rewritten to explicitly check for concept_id ≠ 0. And every single tool that relies on STANDARD_CONCEPT field would need to be updated.

In short, keeping concept_id = 0 non-standard has the same justification as why most (or all?) programming languages default evaluation of null/NaN/None to False.

Alexdavv commented 1 year ago

Our discussion went in the wrong way. @ekorchmar Christian was never advocating against making it Standard. The further discussion was about the concept_class and vocabulary_id. But I see your point.

@cgreich @don-torok Do you want to reconsider?

don-torok commented 1 year ago

It should be a standard concept since zero is valid anywhere a standard concept is expected. However, if the change is going to break existing software, then leave as is and have CDM documentation say that zero is an accepted concept id for fields that otherwise should only hold standard concepts.

cgreich commented 1 year ago

Let's ask the Forum: "how do folks check for existence of standard concepts? Are you: