auditdata is an R package for auditing the quality of a table.
You can install auditdata from GitHub using the devtools package:
devtools::install_github("MathieuMarauri/auditdata")
The main functions of this package are audit_report_html_global(), audit_report_html() and audit_report_excel(). They produce a document (HTML or Excel file) containing information about the table given as input. The number of rows, columns, unique values and missing values are given as global information. More details are given on individual columns, depending on their type (numeric, categorical or date).
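To make the global figures concrete, here is a minimal base-R sketch that computes the same kind of counts on a small data frame. It does not show how auditdata computes them internally: the helper name global_summary() and the choice to count unique values per column, ignoring missing values, are assumptions made for illustration.

```r
# Base-R sketch of the global figures described above.
# global_summary() is a hypothetical helper, not part of auditdata.
global_summary <- function(df) {
  list(
    n_rows = nrow(df),
    n_cols = ncol(df),
    # Total number of missing cells in the table
    n_missing = sum(is.na(df)),
    # Number of distinct non-missing values in each column (assumed definition)
    n_unique = vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1))
  )
}

example <- data.frame(
  fruit = c("apple", "pear", NA, "apple"),
  price = c(1.5, NA, 2.0, 1.5)
)
summary_info <- global_summary(example)
str(summary_info)
```

The reports generated by the package present this kind of information alongside the per-column details.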
library("auditdata")
# Generation of fake data
data <- data.frame(
  # Categorical columns
  cat1 = sample(month.name, 1000, replace = TRUE),
  cat2 = sample(letters, 1000, replace = TRUE),
  cat3 = sample(c("apple", "orange", "banana", "pear", "grapefruit", "cherry"), 1000, replace = TRUE),
  # Numeric columns
  num1 = runif(1000, 100, 150),
  num2 = rnorm(1000, 37, 8),
  num3 = rexp(1000, 2),
  # Logical column
  bool1 = sample(c(TRUE, FALSE), 1000, replace = TRUE),
  # Date columns
  date1 = seq(from = as.Date("2010-01-01"), to = as.Date("2017-01-01"), length.out = 1000),
  date2 = seq(from = as.Date("2010-01-01"), to = as.Date("2017-01-01"), length.out = 1000)
)
# Set randomly chosen cells to NA until 10% of all cells are missing
while (sum(is.na(data)) < (nrow(data) * ncol(data) * 10 / 100)) {
  data[sample(nrow(data), 1), sample(ncol(data), 1)] <- NA
}
# Audit of the data
audit_report_html_global(data)
audit_report_html(data)
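As a sanity check on the fake data, the share of missing values produced by the NA-injection loop can be verified with base R. The sketch below repeats the same loop on a smaller data frame; the name toy, its columns and the seed are assumptions added for illustration and reproducibility.

```r
set.seed(42)  # assumption: seed added only to make the sketch reproducible

# Small stand-in for the fake data: 200 rows x 3 columns = 600 cells
toy <- data.frame(
  x = runif(200),
  y = sample(letters, 200, replace = TRUE),
  z = seq(as.Date("2010-01-01"), by = "day", length.out = 200)
)

# Same loop as in the example: blank random cells until 10% are missing
target <- nrow(toy) * ncol(toy) * 10 / 100
while (sum(is.na(toy)) < target) {
  toy[sample(nrow(toy), 1), sample(ncol(toy), 1)] <- NA
}

na_share_total <- mean(is.na(toy))       # overall share of missing cells
na_share_by_col <- colMeans(is.na(toy))  # share of missing cells per column
print(na_share_total)
print(na_share_by_col)
```

Because each iteration adds at most one missing cell, the loop stops as soon as the target count is reached, so the overall share equals 10% while the per-column shares vary randomly.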