-
As Underpass shifts its focus toward data quality, we need a document that sets out data quality goals. There are two types of data quality analysis. The first is what can be done in a minute time fr…
-
When creating a CSV file for upload to the register, different data processing tools use different strings to denote missing values. To make the upload more robust, it would be great if the following…
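A minimal sketch of how an upload step could treat several missing-value strings alike, using only Python's standard `csv` module. The token set `MISSING_TOKENS` and the helper names are hypothetical and not taken from the register's code; the actual list of accepted strings would be whatever the project agrees on.

```python
import csv
import io

# Hypothetical token set (an assumption); compared case-insensitively after stripping.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none", "-"}


def normalize_missing(value):
    """Return None when the cell holds a known missing-value token."""
    if value is None:  # DictReader fills short rows with None
        return None
    return None if value.strip().lower() in MISSING_TOKENS else value


def read_csv_with_missing(text):
    """Parse CSV text into dict rows, mapping missing-value tokens to None.

    Assumes well-formed rows (no more fields than header columns).
    """
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: normalize_missing(val) for key, val in row.items()}
        for row in reader
    ]


sample = "id,weight\n1,NA\n2,3.5\n3,\n4,NULL\n"
rows = read_csv_with_missing(sample)
```

Normalizing at read time keeps the rest of the upload code independent of which tool produced the file.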
-
- numbers per arm similar
- total people
- OS similar across arms
- other things
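The checks listed above could be sketched roughly as follows. This assumes each participant is a dict with an `arm` field and an `os` field, and that "os" means operating system; both field names and the helper functions are hypothetical.

```python
from collections import Counter, defaultdict


def arm_summary(rows, arm_key="arm", os_key="os"):
    """Count participants per arm and compute the OS share within each arm."""
    arm_counts = Counter(row[arm_key] for row in rows)
    os_counts = defaultdict(Counter)
    for row in rows:
        os_counts[row[arm_key]][row[os_key]] += 1
    os_shares = {
        arm: {name: n / arm_counts[arm] for name, n in counts.items()}
        for arm, counts in os_counts.items()
    }
    return {
        "total": len(rows),
        "arm_counts": dict(arm_counts),
        "os_shares": os_shares,
    }


def max_relative_gap(counts):
    """Largest arm size minus smallest, as a fraction of the largest."""
    values = list(counts.values())
    return (max(values) - min(values)) / max(values)


# Made-up example data, for illustration only.
rows = [
    {"arm": "control", "os": "ios"},
    {"arm": "control", "os": "android"},
    {"arm": "treatment", "os": "ios"},
    {"arm": "treatment", "os": "android"},
]
summary = arm_summary(rows)
gap = max_relative_gap(summary["arm_counts"])
```

A gap above some agreed threshold, or OS shares that differ noticeably between arms, would be the signal to investigate further.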
-
Depends on #262 and to a lesser extent #266
# Problem Statement
Without manual analysis, it's hard to know what changed week to week in our data marts.
# Criteria for Success
NDC Description mart
-…
-
Create a script that displays a summary of objects, with information such as:
- Table name
- Number of rows
- Size of table
- Average rows per day
Similar additional information that will help for data qual…
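One possible shape for such a script, sketched against SQLite from the Python standard library because the issue does not name a database. The function name `summarize_tables` and the `date_columns` mapping are hypothetical, and the size figure is only an approximation (bytes of each cell's text form), not the on-disk size a real warehouse would report.

```python
import sqlite3


def summarize_tables(conn, date_columns):
    """Return one summary dict per user table in a SQLite database.

    date_columns maps table name -> date/timestamp column (an assumption),
    used to compute the average number of rows per distinct day.
    """
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    summaries = []
    for table in tables:
        columns = [r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')]
        n_rows = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        # Approximate size: sum of the byte length of every cell.
        size_expr = " + ".join(
            f'COALESCE(LENGTH(CAST("{c}" AS BLOB)), 0)' for c in columns
        )
        approx_bytes = conn.execute(
            f'SELECT COALESCE(SUM({size_expr}), 0) FROM "{table}"'
        ).fetchone()[0]
        avg_daily = None
        date_col = date_columns.get(table)
        if date_col and n_rows:
            n_days = conn.execute(
                f'SELECT COUNT(DISTINCT date("{date_col}")) FROM "{table}"'
            ).fetchone()[0]
            avg_daily = n_rows / n_days if n_days else None
        summaries.append({
            "table": table,
            "rows": n_rows,
            "approx_bytes": approx_bytes,
            "avg_rows_per_day": avg_daily,
        })
    return summaries


# Made-up example table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, created_at TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-01"), (3, "2024-01-02"), (4, "2024-01-03")],
)
summary = summarize_tables(conn, {"events": "created_at"})
for row in summary:
    print(row)
```

On another database, the row count and size would more likely come from its own catalog or information-schema views than from scanning each table.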
-
Thanks a lot for your work and the datacontract-cli tool. We are evaluating it for use in our data platform built on GCP. I am reporting multiple findings / issues when working with BigQuery t…
-
The code to do this is very repetitive right now. It seems to me that this could be automated into a function like this:
- **Inputs**
- A data set that has already been wrangled (and ideally als…
-
OBIS is keen to see the GBIF validator (and hence the pipelines) flag records according to OBIS rules.
This issue serves as a placeholder to capture those requirements, so GBIF can explore what can b…
-
Overview of Amazon Deequ, Apache Griffin, etc.
Add a part on how to embed this in a data pipeline, mention that you can write your own tools; it should tie in with the DWH lecture.
-
It's very difficult to assess data quality in this case. However, for any usable implementation I will need some checks. Therefore the condition number, rank, and singular values of the core matrices are imp…
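For reference, the standard definitions of these quantities for a single matrix $A \in \mathbb{R}^{m \times n}$ with singular values $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_p \ge 0$, where $p = \min(m, n)$. How they would be applied to each core matrix here is an assumption.

$$
\operatorname{rank}(A) = \#\{\, i : \sigma_i > 0 \,\},
\qquad
\kappa_2(A) = \frac{\sigma_1}{\sigma_p}
$$

Here $\kappa_2(A)$ is taken as infinite when $\sigma_p = 0$. In floating-point practice the rank is usually counted against a small tolerance rather than exact zero, so a very large $\kappa_2$ and a drop in numerical rank are two views of the same near-singularity.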