Implementation of other checks

PRijnbeek commented 3 years ago

[x] Execution of short and longer running queries to test the performance of the system. This information is useful for the SME to provide further guidance on optimizing the infrastructure.
[x] Check the number of CPUs and the memory available in R.
[x] Extract the versions of all installed R packages and check whether the core HADES packages are installed.
[x] Check if ATLAS is installed and WebAPI is running
[x] Check if Achilles results are available in ATLAS.
[x] Extraction of CDM_Source table
[x] Schema validation -> removed, since this is already in DQD.
[x] Count of concepts per vocabulary by standard, classification and non-standard.
[x] Show top X=10 unmapped codes in each clinical domain
[x] Read Data Density Report
[x] Take counts from Achilles instead of CatalogExport
[x] Top 25 mapped codes per domain, with concept counts rounded up to the nearest 100.
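
Rounding counts up to the nearest 100, as in the last item, could look like the minimal sketch below. The function name `round_up_to_hundred` is hypothetical and not taken from the package; the package itself is written in R, so this Python version only illustrates the arithmetic.

```python
def round_up_to_hundred(count: int) -> int:
    """Round a non-negative count up to the next multiple of 100.

    Uses integer ceiling division, so exact multiples of 100 stay unchanged.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return -(-count // 100) * 100


# Small demonstration on a few record counts.
for n in (0, 1, 100, 101, 2549):
    print(n, "->", round_up_to_hundred(n))
```

Rounding up rather than to the nearest value means a small non-zero count is never reported as 0.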

PRijnbeek commented 3 years ago

CPUs, memory, R version, and the list of installed packages are done with the benchmarkme package.

PRijnbeek commented 3 years ago

Added the check that all HADES packages are installed, including a message if packages are missing. All results are returned in a list object.

PRijnbeek commented 3 years ago

Added the check that WebAPI is running.

MaximMoinat commented 3 years ago

List the topX unmapped source values per domain. This is a check for some 'low hanging fruit' to improve the mapping.
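
A minimal sketch of such a query, run here against an in-memory SQLite database with toy data. It assumes the usual OMOP CDM convention that unmapped records carry concept id 0; the `DOMAINS` mapping and the `top_unmapped` helper are hypothetical names for illustration, not the package's implementation.

```python
import sqlite3

# Hypothetical subset of domains: (table, source value column, concept id column).
DOMAINS = {
    "Condition": ("condition_occurrence", "condition_source_value", "condition_concept_id"),
    "Drug": ("drug_exposure", "drug_source_value", "drug_concept_id"),
}


def top_unmapped(conn, domain, top_x=10):
    """Return the top_x most frequent source values whose concept id is 0."""
    # Identifiers come from the fixed DOMAINS dict above, never from user input.
    table, source_col, concept_col = DOMAINS[domain]
    sql = f"""
        SELECT {source_col} AS source_value, COUNT(*) AS n_records
        FROM {table}
        WHERE {concept_col} = 0
        GROUP BY {source_col}
        ORDER BY n_records DESC, source_value
        LIMIT ?
    """
    return conn.execute(sql, (top_x,)).fetchall()


# Toy data: two unmapped source values and one mapped record.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE condition_occurrence "
    "(condition_source_value TEXT, condition_concept_id INTEGER)"
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?)",
    [("A01", 0), ("A01", 0), ("B02", 0), ("C03", 201826)],
)
print(top_unmapped(conn, "Condition"))
```

The most frequent unmapped values come first, so the head of the list is where a mapping fix pays off most.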

MaximMoinat commented 3 years ago

Count of concepts per vocabulary by standard, classification and non-standard.

PRijnbeek commented 3 years ago

> List the topX unmapped source values per domain. This is a check for some 'low hanging fruit' to improve the mapping.

Yes, but if the source values are codes, the list may be less informative. Maybe we can join to source_to_concept_map and add the description? Is that table used?

PRijnbeek commented 3 years ago

> Count of concepts per vocabulary by standard, classification and non-standard.

Not sure I follow. Do you mean a count of all vocabularies per domain?

Domain, vocabulary, nr patients, nr codes, standard

Standard should always be true.

MaximMoinat commented 3 years ago

> Yes, but if the source values are codes, the list may be less informative. Maybe we can join to source_to_concept_map and add the description? Is that table used?

Well, we need to know the source_vocabulary_id for that, and that information is not present in the event tables. Same issue btw to get the frequency for the codes in the source_to_concept_map table. e.g. if the source_code 1 is in multiple source vocabularies, then we cannot count them separately.
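
A small illustration of that ambiguity, using an in-memory SQLite database with toy data. The vocabulary names `VOCAB_A` and `VOCAB_B` are made up, and the tables contain only the columns needed for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE source_to_concept_map (
        source_code TEXT,
        source_vocabulary_id TEXT,
        source_code_description TEXT
    );
    CREATE TABLE condition_occurrence (condition_source_value TEXT);

    -- The same source_code '1' exists in two source vocabularies.
    INSERT INTO source_to_concept_map VALUES
        ('1', 'VOCAB_A', 'Description from vocabulary A'),
        ('1', 'VOCAB_B', 'Description from vocabulary B');

    -- The event table only stores the source value, not its vocabulary.
    INSERT INTO condition_occurrence VALUES ('1'), ('1'), ('1');
    """
)

# Joining on the code alone matches every event to both vocabularies.
counts_per_vocabulary = db.execute(
    """
    SELECT m.source_vocabulary_id, COUNT(*) AS n_records
    FROM condition_occurrence e
    JOIN source_to_concept_map m
      ON m.source_code = e.condition_source_value
    GROUP BY m.source_vocabulary_id
    ORDER BY m.source_vocabulary_id
    """
).fetchall()
print(counts_per_vocabulary)
```

Each vocabulary is credited with all three records, so the join cannot tell which vocabulary a given event actually came from.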

> Not sure I follow. Do you mean a count of all vocabularies per domain? Domain, vocabulary, nr patients, nr codes, standard. Standard should always be true.

That might actually be interesting too: which target vocabularies were used per domain. However, my suggestion was simpler; just counting in the concept table (select vocabulary_id, standard_concept, count(*) from concept group by vocabulary_id, standard_concept). For instance, this would show whether the CPT4 vocabulary was actually loaded (this is an additional step in the vocab loading process). But it could also show other anomalies with the vocabulary loading.
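
A minimal sketch of that count, run against an in-memory SQLite database with a toy concept table. It assumes the usual CDM encoding of standard_concept ('S' for standard, 'C' for classification, empty for non-standard); the label names and the CPT4 check at the end are illustrative only.

```python
import sqlite3

vocab_db = sqlite3.connect(":memory:")
vocab_db.execute(
    "CREATE TABLE concept "
    "(concept_id INTEGER, vocabulary_id TEXT, standard_concept TEXT)"
)
vocab_db.executemany(
    "INSERT INTO concept VALUES (?, ?, ?)",
    [
        (1, "SNOMED", "S"),
        (2, "SNOMED", "S"),
        (3, "SNOMED", None),
        (4, "ICD10", None),
        (5, "ATC", "C"),
    ],
)

# Count concepts per vocabulary, split into standard, classification
# and non-standard concepts.
concept_counts = vocab_db.execute(
    """
    SELECT vocabulary_id,
           CASE standard_concept
                WHEN 'S' THEN 'standard'
                WHEN 'C' THEN 'classification'
                ELSE 'non-standard'
           END AS concept_class,
           COUNT(*) AS n_concepts
    FROM concept
    GROUP BY vocabulary_id, concept_class
    ORDER BY vocabulary_id, concept_class
    """
).fetchall()

for row in concept_counts:
    print(row)

# A vocabulary that was never loaded simply does not appear in the result.
loaded_vocabularies = {vocabulary_id for vocabulary_id, _, _ in concept_counts}
print("CPT4 loaded:", "CPT4" in loaded_vocabularies)
```

In this toy table CPT4 is absent from the output, which is the kind of loading gap the count is meant to expose.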

PRijnbeek commented 3 years ago

> Well, we need to know the source_vocabulary_id for that, and that information is not present in the event tables. Same issue btw to get the frequency for the codes in the source_to_concept_map table. e.g. if the source_code 1 is in multiple source vocabularies, then we cannot count them separately.

Can you give me the query you would like to see?

> However, my suggestion was simpler; just counting in the concept table (select vocabulary_id, standard_concept, count(*) from concept group by vocabulary_id, standard_concept). For instance, this would show whether the CPT4 vocabulary was actually loaded (this is an additional step in the vocab loading process). But it could also show other anomalies with the vocabulary loading.

Yes, agreed, I will add this. I currently dump the vocabulary table, but this is better; I can join that anyway.

PRijnbeek commented 3 years ago

> Count of concepts per vocabulary by standard, classification and non-standard.

Done.

EHDEN / CdmInspection

Implementation of other checks #3