EHDEN / CdmInspection

R Package to support quality control inspection of an OMOP-CDM instance
Apache License 2.0

Implementation of other checks #3

Closed by PRijnbeek 3 years ago

PRijnbeek commented 3 years ago

CPUs, memory, R version, and list of installed packages: done with the benchmarkme package.

PRijnbeek commented 3 years ago

Added a check that all HADES packages are installed, including a message if packages are missing. All results are returned in a list object.

PRijnbeek commented 3 years ago

Added a check that WebAPI is running.

MaximMoinat commented 3 years ago

List the topX unmapped source values per domain. This is a check for some 'low hanging fruit' to improve the mapping.
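A minimal sketch of what such a check could look like, written in Python with an in-memory SQLite table so it can be run without a CDM instance. It assumes unmapped records are those with a concept id of 0 and uses condition_occurrence as the example domain; the helper name `top_unmapped` and the toy rows are hypothetical and not part of the package.

```python
import sqlite3


def top_unmapped(conn, table, concept_col, source_col, top_x=10):
    """Return the top_x most frequent source values whose concept id is 0."""
    # Identifiers cannot be bound as parameters, so only pass trusted names here.
    sql = (
        f"SELECT {source_col} AS source_value, COUNT(*) AS n_records "
        f"FROM {table} "
        f"WHERE {concept_col} = 0 "
        f"GROUP BY {source_col} "
        f"ORDER BY n_records DESC, source_value "
        f"LIMIT ?"
    )
    return conn.execute(sql, (top_x,)).fetchall()


# Toy data standing in for a real CDM instance (hypothetical values).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE condition_occurrence ("
    "condition_concept_id INTEGER, condition_source_value TEXT)"
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?)",
    # Any non-zero concept id counts as mapped in this sketch.
    [(0, "X99"), (0, "X99"), (0, "Z01"), (12345, "E11")],
)

unmapped = top_unmapped(
    conn,
    "condition_occurrence",
    "condition_concept_id",
    "condition_source_value",
    top_x=5,
)
print(unmapped)
```

The same helper could be called once per domain table (drug_exposure, procedure_occurrence, and so on) with the matching concept and source value columns.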

MaximMoinat commented 3 years ago

Count of concepts per vocabulary by standard, classification and non-standard.

PRijnbeek commented 3 years ago

> List the topX unmapped source values per domain. This is a check for some 'low hanging fruit' to improve the mapping.

Yes, but if the source values are codes, this may be less informative. Maybe we can join to source_to_concept_map and add the description? Is that table used?

PRijnbeek commented 3 years ago

> Count of concepts per vocabulary by standard, classification and non-standard.

Not sure I follow. Do you mean a count of all vocabularies per domain?

Domain, vocabulary, nr patients, nr codes, standard

Standard should always be true.

MaximMoinat commented 3 years ago

> Yes, but if the source values are codes, this may be less informative. Maybe we can join to source_to_concept_map and add the description? Is that table used?

Well, we need to know the source_vocabulary_id for that, and that information is not present in the event tables. The same issue, by the way, applies to getting the frequency of the codes in the source_to_concept_map table: e.g. if source_code 1 appears in multiple source vocabularies, we cannot count them separately.
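To illustrate the ambiguity, here is a small sketch in Python with in-memory SQLite. The source vocabulary names LOCAL_DIAG and LOCAL_PROC, the descriptions, and the toy rows are hypothetical. Joining on source_code alone attributes every event to both vocabularies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE condition_occurrence (
        condition_concept_id INTEGER,
        condition_source_value TEXT
    );
    CREATE TABLE source_to_concept_map (
        source_code TEXT,
        source_vocabulary_id TEXT,
        source_code_description TEXT
    );
    -- Three events that all carry the source value '1'.
    INSERT INTO condition_occurrence VALUES (0, '1'), (0, '1'), (0, '1');
    -- The same code exists in two (hypothetical) source vocabularies.
    INSERT INTO source_to_concept_map VALUES
        ('1', 'LOCAL_DIAG', 'Hypothetical diagnosis code'),
        ('1', 'LOCAL_PROC', 'Hypothetical procedure code');
    """
)

# Without a source_vocabulary_id on the event table, the join can only
# match on the code itself, so each event matches both map rows.
rows = conn.execute(
    "SELECT m.source_vocabulary_id, COUNT(*) AS n_records "
    "FROM condition_occurrence c "
    "JOIN source_to_concept_map m "
    "  ON m.source_code = c.condition_source_value "
    "GROUP BY m.source_vocabulary_id "
    "ORDER BY m.source_vocabulary_id"
).fetchall()

n_events = conn.execute("SELECT COUNT(*) FROM condition_occurrence").fetchone()[0]
print(rows, n_events)
```

In this toy case the three events are counted once per vocabulary, so the joined total is twice the real number of records.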

> Not sure I follow. Do you mean a count of all vocabularies per domain? Domain, vocabulary, nr patients, nr codes, standard. Standard should always be true.

That might actually also be interesting: which target vocabularies were used per domain. However, my suggestion was simpler: just counting in the concept table (select count(*) from concept group by vocabulary_id, standard_concept). For instance, this would show whether the CPT4 vocabulary was actually loaded (this is an additional step in the vocabulary loading process). It could also show other anomalies in the vocabulary loading.
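Spelled out in full, the count could look roughly like the sketch below (Python with in-memory SQLite, toy rows only). It assumes standard_concept holds 'S' for standard, 'C' for classification, and NULL for non-standard concepts; the `standard_flag` alias and the 'non-standard' label are choices made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE concept ("
    "concept_id INTEGER, vocabulary_id TEXT, standard_concept TEXT)"
)
# Toy rows; a real concept table has many more columns and rows.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?)",
    [
        (1, "SNOMED", "S"),
        (2, "SNOMED", "S"),
        (3, "SNOMED", None),
        (4, "ATC", "C"),
        (5, "ICD10", None),
    ],
)

rows = conn.execute(
    "SELECT vocabulary_id, "
    "       COALESCE(standard_concept, 'non-standard') AS standard_flag, "
    "       COUNT(*) AS n_concepts "
    "FROM concept "
    "GROUP BY vocabulary_id, standard_concept "
    "ORDER BY vocabulary_id, standard_flag"
).fetchall()

# Index the result for easy lookups.
counts = {(vocab, flag): n for vocab, flag, n in rows}
loaded_vocabularies = {vocab for vocab, _, _ in rows}

print(counts)
# A vocabulary that was never loaded simply has no rows at all.
print("CPT4" in loaded_vocabularies)
```

A vocabulary missing from the result (CPT4 in this toy data) indicates that its loading step was skipped.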

PRijnbeek commented 3 years ago

> Well, we need to know the source_vocabulary_id for that, and that information is not present in the event tables. The same issue, by the way, applies to getting the frequency of the codes in the source_to_concept_map table: e.g. if source_code 1 appears in multiple source vocabularies, we cannot count them separately.

Can you give me the query you would like to see?

> However, my suggestion was simpler: just counting in the concept table (select count(*) from concept group by vocabulary_id, standard_concept). For instance, this would show whether the CPT4 vocabulary was actually loaded (this is an additional step in the vocabulary loading process). It could also show other anomalies in the vocabulary loading.

Yes, agreed, I will add this. I currently dump the vocabulary table, but this is better; I can join that anyway.

PRijnbeek commented 3 years ago

> Count of concepts per vocabulary by standard, classification and non-standard.

Done.
