Closed jzadra closed 4 years ago
@jzadra - What's your expected behaviour if you don't want to compare on any grouping column? Can you give a representative example?
@alexsanjoseph I was hoping to compare each cell's contents based on their position. I am looking at an equal dimension table with what should be identical cells in every respect.
However, I think I can see why, in most cases with tables of unequal dimensions, you'd need a key / group var to limit the scope of where comparisons are made. But this assumes that at least one variable has not changed between the two dfs. If all the vars changed, there would be no basis for comparison, because you wouldn't know which row was supposed to be compared to which other row.
My example:
require(tidyverse)
#> Loading required package: tidyverse
require(compareDF)
#> Loading required package: compareDF
mtcars1 <- mtcars %>% rownames_to_column("car")
mtcars2 <- mtcars1 %>% mutate(gear = ifelse(gear == 3, 2, gear))
compare_df(mtcars1, mtcars2)
#> Error in check_if_comparable(both_tables$df_new, both_tables$df_old, group_col, : argument "group_col" is missing, with no default
compare_df(mtcars1, mtcars2, group_col = "car")$comparison_df
#> Creating comparison table...
#> car chng_type mpg cyl disp hp drat wt qsec vs am gear
#> 1 AMC Javelin + 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3
#> 2 AMC Javelin - 15.2 8 304.0 150 3.15 3.435 17.30 0 0 2
#> 3 Cadillac Fleetwood + 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3
#> 4 Cadillac Fleetwood - 10.4 8 472.0 205 2.93 5.250 17.98 0 0 2
#> 5 Camaro Z28 + 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3
#> 6 Camaro Z28 - 13.3 8 350.0 245 3.73 3.840 15.41 0 0 2
#> 7 Chrysler Imperial + 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3
#> 8 Chrysler Imperial - 14.7 8 440.0 230 3.23 5.345 17.42 0 0 2
#> 9 Dodge Challenger + 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3
#> 10 Dodge Challenger - 15.5 8 318.0 150 2.76 3.520 16.87 0 0 2
#> 11 Duster 360 + 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3
#> 12 Duster 360 - 14.3 8 360.0 245 3.21 3.570 15.84 0 0 2
#> 13 Hornet 4 Drive + 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3
#> 14 Hornet 4 Drive - 21.4 6 258.0 110 3.08 3.215 19.44 1 0 2
#> 15 Hornet Sportabout + 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3
#> 16 Hornet Sportabout - 18.7 8 360.0 175 3.15 3.440 17.02 0 0 2
#> 17 Lincoln Continental + 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3
#> 18 Lincoln Continental - 10.4 8 460.0 215 3.00 5.424 17.82 0 0 2
#> 19 Merc 450SE + 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3
#> 20 Merc 450SE - 16.4 8 275.8 180 3.07 4.070 17.40 0 0 2
#> 21 Merc 450SL + 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3
#> 22 Merc 450SL - 17.3 8 275.8 180 3.07 3.730 17.60 0 0 2
#> 23 Merc 450SLC + 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3
#> 24 Merc 450SLC - 15.2 8 275.8 180 3.07 3.780 18.00 0 0 2
#> 25 Pontiac Firebird + 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3
#> 26 Pontiac Firebird - 19.2 8 400.0 175 3.08 3.845 17.05 0 0 2
#> 27 Toyota Corona + 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3
#> 28 Toyota Corona - 21.5 4 120.1 97 3.70 2.465 20.01 1 0 2
#> 29 Valiant + 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3
#> 30 Valiant - 18.1 6 225.0 105 2.76 3.460 20.22 1 0 2
#> carb
#> 1 2
#> 2 2
#> 3 4
#> 4 4
#> 5 4
#> 6 4
#> 7 4
#> 8 4
#> 9 2
#> 10 2
#> 11 4
#> 12 4
#> 13 1
#> 14 1
#> 15 2
#> 16 2
#> 17 4
#> 18 4
#> 19 3
#> 20 3
#> 21 3
#> 22 3
#> 23 3
#> 24 3
#> 25 2
#> 26 2
#> 27 1
#> 28 1
#> 29 1
#> 30 1
Would it make sense to have an option to compare by position, where the group is essentially row_number(), and perhaps to make it the default for data frames of the same dimensions and with the same col types?
mtcars1 <- mtcars1 %>% mutate(rownum = row_number())
mtcars2 <- mtcars2 %>% mutate(rownum = row_number())
compare_df(mtcars1, mtcars2, group_col = "rownum")$comparison_df
#> Creating comparison table...
#> rownum chng_type car mpg cyl disp hp drat wt qsec vs
#> 1 4 + Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1
#> 2 4 - Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1
#> 3 5 + Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0
#> 4 5 - Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0
#> 5 6 + Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1
#> 6 6 - Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1
#> 7 7 + Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0
#> 8 7 - Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0
#> 9 12 + Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0
#> 10 12 - Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0
#> 11 13 + Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0
#> 12 13 - Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0
#> 13 14 + Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0
#> 14 14 - Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0
#> 15 15 + Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0
#> 16 15 - Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0
#> 17 16 + Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0
#> 18 16 - Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0
#> 19 17 + Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0
#> 20 17 - Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0
#> 21 21 + Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1
#> 22 21 - Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1
#> 23 22 + Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0
#> 24 22 - Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0
#> 25 23 + AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0
#> 26 23 - AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0
#> 27 24 + Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0
#> 28 24 - Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0
#> 29 25 + Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0
#> 30 25 - Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0
#> am gear carb
#> 1 0 3 1
#> 2 0 2 1
#> 3 0 3 2
#> 4 0 2 2
#> 5 0 3 1
#> 6 0 2 1
#> 7 0 3 4
#> 8 0 2 4
#> 9 0 3 3
#> 10 0 2 3
#> 11 0 3 3
#> 12 0 2 3
#> 13 0 3 3
#> 14 0 2 3
#> 15 0 3 4
#> 16 0 2 4
#> 17 0 3 4
#> 18 0 2 4
#> 19 0 3 4
#> 20 0 2 4
#> 21 0 3 1
#> 22 0 2 1
#> 23 0 3 2
#> 24 0 2 2
#> 25 0 3 2
#> 26 0 2 2
#> 27 0 3 4
#> 28 0 2 4
#> 29 0 3 2
#> 30 0 2 2
Created on 2020-04-01 by the reprex package (v0.3.0)
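For anyone who wants the row-number workaround from the second chunk in reusable form, here is a minimal sketch. The names `with_row_id` and `compare_by_position` and the `id_col` argument are hypothetical helpers made up for illustration; they are not part of compareDF. The only compareDF call used is `compare_df(df_new, df_old, group_col = ...)`, exactly as in the reprex above.

```r
# Hypothetical helper (not part of compareDF): add a row-position id column.
with_row_id <- function(df, id_col = "rownum") {
  df[[id_col]] <- seq_len(nrow(df))
  df
}

# Hypothetical wrapper: compare two data frames row by row, by position.
compare_by_position <- function(df_new, df_old, id_col = "rownum") {
  # Positional comparison only makes sense when shapes and column names match
  # and the id column does not clash with an existing column.
  stopifnot(
    identical(dim(df_new), dim(df_old)),
    identical(names(df_new), names(df_old)),
    !id_col %in% names(df_new)
  )
  compareDF::compare_df(
    with_row_id(df_new, id_col),
    with_row_id(df_old, id_col),
    group_col = id_col
  )
}

# Usage, guarded so the script still runs when compareDF is not installed.
if (requireNamespace("compareDF", quietly = TRUE)) {
  mtcars1 <- cbind(car = rownames(mtcars), mtcars, stringsAsFactors = FALSE)
  rownames(mtcars1) <- NULL
  mtcars2 <- transform(mtcars1, gear = ifelse(gear == 3, 2, gear))
  res <- compare_by_position(mtcars1, mtcars2)
  print(head(res$comparison_df))
}
```

The dimension and name checks are there because, as discussed above, position is only a meaningful key when both tables have the same shape and row order.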
Have you solved the problem by adding the row number? That is the expected behavior, right?
I agree row number could be a smart default in case the dimensions for the data frames are the same. I will try to add this as a feature in an upcoming release :)
Yes, adding the row number and grouping by it works. See the second chunk above.
Cool - Adding this as a potential enhancement
Fixed in latest release
Thanks!
Getting this error when trying to compare two tibbles. I do not want to group by anything, but rather compare the entirety of both dfs.
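If the goal is only to compare the entirety of two same-shaped data frames cell by cell, with no grouping at all, a base R sketch along these lines may be enough. `cell_diffs` is a hypothetical helper written for this illustration, not a compareDF function, and it assumes both inputs have identical dimensions and column names and that each column can be compared with `!=`.

```r
# Hypothetical helper: list every cell that differs between two
# same-shaped data frames, matched purely by position.
cell_diffs <- function(df_new, df_old) {
  stopifnot(
    identical(dim(df_new), dim(df_old)),
    identical(names(df_new), names(df_old))
  )
  per_column <- lapply(names(df_new), function(col) {
    a <- df_new[[col]]
    b <- df_old[[col]]
    # Compare factors as text so differing level sets do not raise an error.
    if (is.factor(a)) a <- as.character(a)
    if (is.factor(b)) b <- as.character(b)
    # A cell differs if exactly one side is NA, or both are present and unequal.
    differs <- xor(is.na(a), is.na(b)) | (!is.na(a) & !is.na(b) & a != b)
    rows <- which(differs)
    data.frame(
      row = rows,
      column = rep(col, length(rows)),
      new = as.character(a[rows]),
      old = as.character(b[rows]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, per_column)
}

# Same data as in the reprex, built with base R only.
mtcars1 <- cbind(car = rownames(mtcars), mtcars, stringsAsFactors = FALSE)
mtcars2 <- transform(mtcars1, gear = ifelse(gear == 3, 2, gear))
diffs <- cell_diffs(mtcars1, mtcars2)
print(diffs)
```

This reports one line per changed cell rather than the paired +/- rows that `compare_df` produces, so it is a quick check, not a replacement for the package output.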