Open don-torok opened 2 years ago
That's correct. I'm surprised nobody has noticed before. Maybe it changed at some point.
Can you fix, @mik-ohdsi?
Anyone know if this change would impact data quality dashboard metrics?
No, this will not impact data quality dashboard metrics. Current test for standard concept explicitly excludes concept ID zero. Which explains why DQD does not complain currently about concept ID zero not being a standard concept. Test for standard concept
Great, thanks for pulling up the code!
Okay, @cgreich , what is the plan? Just introduce a new concept to the OMOP extension vocabulary with concept_id = 0 and maybe concept_code = "OMOP0"?
Because we already have one, let's just make it Standard and think whether it should be related to any vocabulary and concept class.
Can we, please?
@hardhouse Let’s have a look what service vocabularies and concept classes we use in general and how this specific concept fits in
Hi @hardhouse and @Alexdavv - this one is still Non-Standard. Do we have a plan around it?
I'm advocating for the following changes for concept_id = 0:
@cgreich @hardhouse Thoughts?
Sure. What problem are we solving with this?
Concept ID 0 is valid for any column holding a concept id. As such it should be classified as a standard concept, otherwise all the CDM documentation that says standard concept only would have to change to include the exception of concept id 0.
No, just leave all attributes as they are, except making standard_concept ='S', @Alexdavv vocabulary_id = 'CDM' describes the structure of CDM as you mentioned, and 0 is not a structure of CDM obviously.
No, just leave all attributes as they are
And the explanation is…? Look, the users anyway should shop hardcoding this concept because it becomes a full real one. That’s why these ugly placeholders are not anymore a reliable solution for a Standard concept.
The concepts are not the structure in a straightforward way, but they’re a content that could also be considered a part of the structure because they’re standardized, and in CDM we record a concept_id, not a name.
Also, this specific one is unique. It’s anyway a metadata, and somewhat still a part of the model (because it will stay hardcoded in the tables for a while). If we introduce a “Concept” concept_class in the “CDM” vocabulary, the logical issue is solved.
Maybe we could find other better options but let’s clean it up…
If we introduce a “Concept” concept_class in the “CDM” vocabulary, the logical issue is solved.
True, except we are lacking a use case. Relational database models don't refer to themselves. They can't even use their own artefacts as values. All references are at the model level, not at the data level. Folks are trying to build an information model out of them, and for them we introduced these funny concepts for tables and fields. And we are using it, awkwardly, to define foreign keys without pre-defining the foreign table. Ugly, but at least we have a reason.
Now you want to extend that even further. Do we need that?
Now you want to extend that even further. Do we need that?
...Yes?
The metadata is not exactly the content, but one of the ways to represent the model. Can be easily filtered out if any.
While I think, it's more important to keep the content clean avoiding these ugly placeholders that are currently used for the only concept.
That's an argument for improving the logical integrity of the model, which is theoretical, unless you have a use case. What is the analytical use case where you would ever use that vocabulary and concept class? Certainly not "give me all records where the vocabulary_id='CDM'. Everybody would query "Give me all records where *_concept_id=0', which they can do today.
I agree with @cgreich completely on this one. We do not want to even risk breaking existing behavior in the name of purity. Consider USAGI as just one of many examples: it never builds a mapping to 0 automatically, because it is not a standard concept. It would need to be rewritten to explicitly check for concept_id ≠ 0. And every single tool that relies on STANDARD_CONCEPT field would need to be updated.
In short, keeping concept_id = 0 non-standard has the same justification as why most (or all?) programming languages default evaluation of null/NaN/None to False.
Our discussion went in the wrong way. @ekorchmar Christian was never advocating against making it Standard. The further discussion was about the concept_class and vocabulary_id. But I see your point.
@cgreich @don-torok Do you want to reconsider?
It should be a standard concept since zero is valid anywhere a standard concept is expected. However, if the change is going to break existing software, then leave as is and have CDM documentation say that zero is an accepted concept id for fields that otherwise should only hold standard concepts.
Let's ask the Forum: "how do folks check for existence of standard concepts? Are you:
Concept ID 0 is valid for any column holding a concept id. As such it should be classified as a standard concept, otherwise all the CDM documentation that says standard concept only would have to change to include the exception of concept id 0.