EHDEN / NetworkDashboards

Dashboards showing intrinsic metadata for the OMOP-CDM databases in the EHDEN data network

Add checks on CatalogueExport file #236

Open MaximMoinat opened 2 years ago

MaximMoinat commented 2 years ago

We are seeing some outliers in the DatabaseCatalogue due to data quality issues. For example, a negative Cumulative Observation Time (Observation Period). This makes the data visualisations hard to interpret.

One way to improve this would be to run checks on the file imported for the database dashboard. If, for example, negative times are found, the user should get a message back. Some checks would only trigger a warning, while others would result in the upload being rejected.
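A rough sketch of how such checks could separate warnings from rejections. Everything here is an assumption for illustration: the `count_value` column name, the `Severity`/`Finding` types, and the specific rules are hypothetical and not taken from the NetworkDashboards code base.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = "warning"  # upload is accepted, user is informed
    ERROR = "error"      # upload is rejected


@dataclass
class Finding:
    severity: Severity
    row_number: int
    message: str


def check_rows(rows):
    """Collect findings for an iterable of row dicts (hypothetical layout)."""
    findings = []
    for number, row in enumerate(rows, start=1):
        raw = row.get("count_value", "")
        try:
            value = float(raw)
        except (TypeError, ValueError):
            findings.append(
                Finding(Severity.ERROR, number, f"count_value is not numeric: {raw!r}")
            )
            continue
        if value < 0:
            # Negative values (e.g. negative observation time) break the charts.
            findings.append(
                Finding(Severity.ERROR, number, f"negative count_value: {value}")
            )
        elif value == 0:
            # Suspicious but plottable, so only warn.
            findings.append(Finding(Severity.WARNING, number, "count_value is zero"))
    return findings


def should_reject(findings):
    """An upload is rejected as soon as one finding is an error."""
    return any(f.severity is Severity.ERROR for f in findings)
```

The upload handler could call `check_rows` on the parsed file, show all findings to the user, and only store the data when `should_reject` returns `False`.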

Note: data quality checks are also being done by the DQD. We should not try to redesign that, but only focus on data issues that would give problems with the Dashboard visualisations.

aspedrosa commented 2 years ago

In this tool, we assume that all data coming from CatalogueExport is correct. If it is not, shouldn't this be corrected at the data generation level, i.e. in CatalogueExport?

MaximMoinat commented 2 years ago

We could indeed do a clean-up in the R script of the CatalogueExport. However, at that level we do not know which visualisations are made or which outliers would cause issues. So ideally, each (new) visualisation declares the ranges in which it expects its data. This would then have to be implemented on the NetworkDashboard side.
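To make the idea of per-visualisation expectations concrete, here is a minimal sketch. The chart names, the analysis ids (1 and 2 are placeholders, not real CatalogueExport ids), the bounds, and the row layout are all hypothetical.

```python
from typing import NamedTuple, Optional


class Expectation(NamedTuple):
    chart: str                       # visualisation that depends on the data
    analysis_id: int                 # placeholder id, not a real analysis id
    minimum: Optional[float] = None  # None means "no lower bound"
    maximum: Optional[float] = None  # None means "no upper bound"


# Each (new) visualisation registers the range it can display sensibly.
EXPECTATIONS = [
    Expectation("cumulative_observation_time", analysis_id=1, minimum=0),
    Expectation("age_at_first_observation", analysis_id=2, minimum=0, maximum=130),
]


def out_of_range(rows, expectations=EXPECTATIONS):
    """Return (chart, analysis_id, value) for every value outside its range."""
    by_id = {}
    for expectation in expectations:
        by_id.setdefault(expectation.analysis_id, []).append(expectation)

    problems = []
    for row in rows:
        value = row["count_value"]
        for e in by_id.get(row["analysis_id"], []):
            too_low = e.minimum is not None and value < e.minimum
            too_high = e.maximum is not None and value > e.maximum
            if too_low or too_high:
                problems.append((e.chart, row["analysis_id"], value))
    return problems
```

Keeping the expectations next to the visualisations means a new chart only has to add one entry, and rows that no chart uses are never flagged.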

MaximMoinat commented 2 years ago

Your solution as proposed in issue #232 would also work here.

Then we can move these data checks to the CatalogueExport and provide warnings there when the data is generated. Still, it would be really nice if the NetworkDashboard also gave a warning or error when unexpected data is uploaded.