Closed jsta closed 7 years ago
Thanks for your feature request (nice one!). I'm a bit tied up this week and will look into it next week.
I did a first implementation of the summary method:
library(daff)
x <- iris
x[1,1] <- 10
dd <- diff_data(x, iris)
summary(dd)
dd_sum <- summary(dd)
unclass(dd_sum)
Any further suggestions?
Very nice! That's close to what I came up with using compareDF:
diff_csv <- function(original_csv, hand_edit_csv){
  # Read both versions as plain character/numeric columns
  orig_csv <- read.csv(original_csv, stringsAsFactors = FALSE)
  hedit_csv <- read.csv(hand_edit_csv, stringsAsFactors = FALSE)
  # Compare the two tables, matching rows on the "pagenum" column
  res <- compareDF::compare_df(hedit_csv, orig_csv, c("pagenum"))
  rows_changed <- res$change_summary[3]
  # Count cells flagged with "+" in the diff table (columns 3 onward)
  diff_cells <- unlist(res$comparison_table_diff[, 3:ncol(res$comparison_table_diff)])
  cell_changes <- length(grep("\\+", diff_cells))
  percent_diff <- round(cell_changes / length(diff_cells) * 100, 2)
  paste0(cell_changes, " cells changed; ",
         rows_changed, " rows changed; ",
         percent_diff, "% percent difference")
}
[1] "789 cells changed; 159 rows changed; 7.45% percent difference"
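For comparison, here is a rough base R sketch of the same kind of tally that does not depend on compareDF or daff. It assumes both data frames have identical dimensions and column order (no added or removed rows or columns), and tally_cell_diffs is a made-up name for illustration, not a function from either package.

```r
# Hypothetical helper (not part of daff or compareDF).
# Assumes a and b have the same dimensions and the same column order.
tally_cell_diffs <- function(a, b) {
  stopifnot(identical(dim(a), dim(b)))
  # Compare values as character so factor and numeric columns are handled alike
  to_chr <- function(d) {
    matrix(unlist(lapply(d, as.character)), nrow = nrow(d), ncol = ncol(d))
  }
  a_chr <- to_chr(a)
  b_chr <- to_chr(b)
  # Two NAs in the same position count as equal; NA vs. a value counts as changed
  same <- (a_chr == b_chr) | (is.na(a_chr) & is.na(b_chr))
  changed <- !same
  changed[is.na(changed)] <- TRUE
  list(
    cells_changed = sum(changed),
    rows_changed  = sum(rowSums(changed) > 0),
    cols_changed  = sum(colSums(changed) > 0),
    percent_diff  = round(100 * sum(changed) / length(changed), 2)
  )
}

x <- iris
x[1, 1] <- 10
res <- tally_cell_diffs(iris, x)
str(res)
```

Because it compares cell by cell in place, this only makes sense when rows line up one to one. Anything involving inserted or deleted rows needs a real diff such as diff_data.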
Pull #13 (accepted) modifies 'summary.data_diff' to calculate the number of changed/added/removed rows and columns:
> library(daff)
> iris2 <- cbind(iris, sl.sq=iris$Sepal.Length ^2 , prod.sl.sw=iris$Sepal.Length * iris$Sepal.Width)
> iris2$Petal.Length[14] = 10
> iris2$Petal.Width[22] = 25
> iris2 <- iris2[-10,]
> iris2 <- rbind(iris2, iris2[3:7,])
> dd <- diff_data(iris, iris2)
> summary(dd)
Data diff:
 Comparison: ‘iris’ vs. ‘iris2’
         #            Changed Removed Added
Rows     150 --> 154  2       1       5
Columns  5 --> 7      0       0       1
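As a quick sanity check on the "150 --> 154" and "5 --> 7" figures, the dimensions can be confirmed in base R alone, without daff. This is only a sketch that rebuilds iris2 with the same steps as above and then inspects its shape.

```r
# Rebuild iris2 with the same steps as in the example above
iris2 <- cbind(iris,
               sl.sq = iris$Sepal.Length^2,
               prod.sl.sw = iris$Sepal.Length * iris$Sepal.Width)
iris2$Petal.Length[14] <- 10
iris2$Petal.Width[22] <- 25
iris2 <- iris2[-10, ]               # drop one row
iris2 <- rbind(iris2, iris2[3:7, ]) # append five rows

dim(iris)   # 150 rows, 5 columns
dim(iris2)  # 154 rows, 7 columns
```

The row count follows from 150 - 1 + 5 = 154, which matches the Removed and Added counts in the Rows line.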
It would be amazing if there was a summary method for diff_data objects providing a tally of differences in terms of cells, columns, and rows (possibly % difference), etc.