capitalone / dataCompareR

dataCompareR is an R package that allows users to compare two datasets and view a report on the similarities and differences.
https://capitalone.github.io/dataCompareR/index.html

Summary fails if tables are pulled out of named lists. #105

Open ConorIA opened 4 years ago

ConorIA commented 4 years ago

I found a bit of an edge case: summary() fails if the original data frames are passed to rCompare() as elements of a named list (e.g. lis$table1). Maybe the table names need to be sanitized before returning the comp object? BTW, I have removed the dplyr deprecation notices from the reprex output for clarity.

library(tibble)
library(dataCompareR)

table1 <- tribble(~A, ~B, ~C,
                   1,  2,  3,
                   2,  6,  7)

table2 <- tribble(~A, ~D, ~C,
                   1,  2, 19,
                   2,  6,  7)

lis <- list(table1 = table1, table2 = table2)

comp1 <- rCompare(table1, table2, keys = "A")
#> Running rCompare...
#> Coercing input data to data.frame

summary(comp1)
#> dataCompareR is generating the summary...
#> 
#> Data Comparison
#> ===============
#> 
#> Date comparison run: 2020-11-13 13:00:13  
#> Comparison run on R version 4.0.3 (2020-10-10)  
#> With dataCompareR version 0.1.3  
#> 
#> 
#> Meta Summary
#> ============
#> 
#> 
#> |Dataset Name |Number of Rows |Number of Columns |
#> |:------------|:--------------|:-----------------|
#> |table1       |2              |3                 |
#> |table2       |2              |3                 |
#> 
#> 
#> Variable Summary
#> ================
#> 
#> Number of columns in common: 2  
#> Number of columns only in table1: 1  
#> Number of columns only in table2: 1  
#> Number of columns with a type mismatch: 0  
#> Match keys : 1   - A
#> 
#> 
#> Columns only in table1: B  
#> Columns only in table2: D  
#> Columns in both : A, C  
#> 
#> Row Summary
#> ===========
#> 
#> Total number of rows read from table1: 2  
#> Total number of rows read from table2: 2    
#> Number of rows in common: 2  
#> Number of rows dropped from table1: 0  
#> Number of rows dropped from  table2: 0  
#> 
#> 
#> Data Values Comparison Summary
#> ==============================
#> 
#> Number of columns compared with ALL rows equal: 0  
#> Number of columns compared with SOME rows unequal: 1  
#> Number of columns with missing value differences: 0  
#> 
#> 
#> 
#> Summary of columns with some rows unequal: 
#> 
#> 
#> 
#> |Column |Type (in table1) |Type (in table2) | # differences|Max difference | # NAs|
#> |:------|:----------------|:----------------|-------------:|:--------------|-----:|
#> |C      |double           |double           |             1|16             |     0|
#> 
#> 
#> 
#> Unequal column details
#> ======================
#> 
#> 
#> 
#> #### Column -  C
#> 
#> 
#> 
#> |  A| C (table1)| C (table2)|Type (table1) |Type (table2) | Difference|
#> |--:|----------:|----------:|:-------------|:-------------|----------:|
#> |  1|          3|         19|double        |double        |        -16|

comp2 <- rCompare(lis$table1, lis$table2, keys = "A")
#> Running rCompare...
#> Coercing input data to data.frame

summary(comp2)
#> dataCompareR is generating the summary...
#> Warning in matrix(c(object$meta$A$name, object$meta$A$rows,
#> object$meta$A$cols, : data length [10] is not a sub-multiple or multiple of the
#> number of columns [3]
#> Error in names(x) <- value: 'names' attribute [7] must be the same length as the vector [3]

Created on 2020-11-13 by the reprex package (v0.3.0)
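
The "data length [10]" in the warning hints at a possible cause. What follows is a base-R sketch of one mechanism that would produce it, not a reading of dataCompareR's source. It assumes the table name is captured from the argument expression and converted with as.character(); the helper names are hypothetical. Under that assumption, a call such as lis$table1 becomes a three-element character vector, so the meta vector that should hold 6 values holds 10.

```r
# Hypothetical helpers; NOT dataCompareR's actual code.
# Capture the argument expression and convert it with as.character().
name_via_as_character <- function(x) as.character(substitute(x))

# One possible sanitization: deparse the expression into a single string.
name_via_deparse <- function(x) paste(deparse(substitute(x)), collapse = "")

table1 <- data.frame(A = c(1, 2), B = c(2, 6), C = c(3, 7))
table2 <- data.frame(A = c(1, 2), D = c(2, 6), C = c(19, 7))
lis <- list(table1 = table1, table2 = table2)

# A bare symbol converts to a single string.
plain_name <- name_via_as_character(table1)

# A `$` call converts to its function name plus both arguments.
a_name <- name_via_as_character(lis$table1)
b_name <- name_via_as_character(lis$table2)

# Name, rows, cols for each table: 6 values expected, but the
# three-element names inflate the vector.
meta <- c(a_name, nrow(lis$table1), ncol(lis$table1),
          b_name, nrow(lis$table2), ncol(lis$table2))

print(plain_name)
print(a_name)
print(length(meta))

# Deparsing keeps the name as one string regardless of the expression.
safe_name <- name_via_deparse(lis$table1)
print(safe_name)
```

If this is indeed the cause, collapsing the captured expression to a single string before storing it would be one way to sanitize the names. In the meantime, binding the list elements to plain variables before calling rCompare(), as in the comp1 call above, avoids the failure.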

sajohnston commented 3 years ago

Hi @ConorIA - I'm very sorry for the delay in response.

Thank you for raising this issue. We have included it in our project roadmap to resolve by the end of this year.