alexsanjoseph / compareDF

R Tool to compare two data.frames

Error in compare_df(): argument "group_col" is missing, with no default #31

Closed jzadra closed 4 years ago

jzadra commented 4 years ago

Getting this error when trying to compare two tibbles. I do not want to group by anything, but rather compare the entirety of both dfs.

alexsanjoseph commented 4 years ago

@jzadra - What's your expected behaviour if you don't want to compare on any grouping column? Can you give a representative example?

jzadra commented 4 years ago

@alexsanjoseph I was hoping to compare each cell's contents based on its position. I am looking at two tables of equal dimensions whose cells should be identical in every respect.

However, I think I can see why, in most cases with tables of unequal dimensions, you'd need a key / group var to limit the scope of where comparisons are made. That assumes there is at least one variable that has not changed between the two dfs. In fact, if all the vars changed, there would be no basis for comparison, because you wouldn't know which row was supposed to be compared to which other row.

My example:

require(tidyverse)
#> Loading required package: tidyverse
require(compareDF)
#> Loading required package: compareDF

mtcars1 <- mtcars %>% rownames_to_column("car")

mtcars2 <- mtcars1 %>% mutate(gear = ifelse(gear == 3, 2, gear))

compare_df(mtcars1, mtcars2)
#> Error in check_if_comparable(both_tables$df_new, both_tables$df_old, group_col, : argument "group_col" is missing, with no default

compare_df(mtcars1, mtcars2, group_col = "car")$comparison_df
#> Creating comparison table...
#>                    car chng_type  mpg cyl  disp  hp drat    wt  qsec vs am gear
#> 1          AMC Javelin         + 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3
#> 2          AMC Javelin         - 15.2   8 304.0 150 3.15 3.435 17.30  0  0    2
#> 3   Cadillac Fleetwood         + 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3
#> 4   Cadillac Fleetwood         - 10.4   8 472.0 205 2.93 5.250 17.98  0  0    2
#> 5           Camaro Z28         + 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3
#> 6           Camaro Z28         - 13.3   8 350.0 245 3.73 3.840 15.41  0  0    2
#> 7    Chrysler Imperial         + 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3
#> 8    Chrysler Imperial         - 14.7   8 440.0 230 3.23 5.345 17.42  0  0    2
#> 9     Dodge Challenger         + 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3
#> 10    Dodge Challenger         - 15.5   8 318.0 150 2.76 3.520 16.87  0  0    2
#> 11          Duster 360         + 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3
#> 12          Duster 360         - 14.3   8 360.0 245 3.21 3.570 15.84  0  0    2
#> 13      Hornet 4 Drive         + 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3
#> 14      Hornet 4 Drive         - 21.4   6 258.0 110 3.08 3.215 19.44  1  0    2
#> 15   Hornet Sportabout         + 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3
#> 16   Hornet Sportabout         - 18.7   8 360.0 175 3.15 3.440 17.02  0  0    2
#> 17 Lincoln Continental         + 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3
#> 18 Lincoln Continental         - 10.4   8 460.0 215 3.00 5.424 17.82  0  0    2
#> 19          Merc 450SE         + 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3
#> 20          Merc 450SE         - 16.4   8 275.8 180 3.07 4.070 17.40  0  0    2
#> 21          Merc 450SL         + 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3
#> 22          Merc 450SL         - 17.3   8 275.8 180 3.07 3.730 17.60  0  0    2
#> 23         Merc 450SLC         + 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3
#> 24         Merc 450SLC         - 15.2   8 275.8 180 3.07 3.780 18.00  0  0    2
#> 25    Pontiac Firebird         + 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3
#> 26    Pontiac Firebird         - 19.2   8 400.0 175 3.08 3.845 17.05  0  0    2
#> 27       Toyota Corona         + 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3
#> 28       Toyota Corona         - 21.5   4 120.1  97 3.70 2.465 20.01  1  0    2
#> 29             Valiant         + 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3
#> 30             Valiant         - 18.1   6 225.0 105 2.76 3.460 20.22  1  0    2
#>    carb
#> 1     2
#> 2     2
#> 3     4
#> 4     4
#> 5     4
#> 6     4
#> 7     4
#> 8     4
#> 9     2
#> 10    2
#> 11    4
#> 12    4
#> 13    1
#> 14    1
#> 15    2
#> 16    2
#> 17    4
#> 18    4
#> 19    3
#> 20    3
#> 21    3
#> 22    3
#> 23    3
#> 24    3
#> 25    2
#> 26    2
#> 27    1
#> 28    1
#> 29    1
#> 30    1

Would it make sense to have an option to compare by position, where essentially the group is row_number() and it would perhaps be the default for data frames of the same dimension and with the same col types?

mtcars1 <- mtcars1 %>% mutate(rownum = row_number())
mtcars2 <- mtcars2 %>% mutate(rownum = row_number())

compare_df(mtcars1, mtcars2, group_col = "rownum")$comparison_df
#> Creating comparison table...
#>    rownum chng_type                 car  mpg cyl  disp  hp drat    wt  qsec vs
#> 1       4         +      Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1
#> 2       4         -      Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1
#> 3       5         +   Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0
#> 4       5         -   Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0
#> 5       6         +             Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1
#> 6       6         -             Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1
#> 7       7         +          Duster 360 14.3   8 360.0 245 3.21 3.570 15.84  0
#> 8       7         -          Duster 360 14.3   8 360.0 245 3.21 3.570 15.84  0
#> 9      12         +          Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0
#> 10     12         -          Merc 450SE 16.4   8 275.8 180 3.07 4.070 17.40  0
#> 11     13         +          Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0
#> 12     13         -          Merc 450SL 17.3   8 275.8 180 3.07 3.730 17.60  0
#> 13     14         +         Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0
#> 14     14         -         Merc 450SLC 15.2   8 275.8 180 3.07 3.780 18.00  0
#> 15     15         +  Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0
#> 16     15         -  Cadillac Fleetwood 10.4   8 472.0 205 2.93 5.250 17.98  0
#> 17     16         + Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0
#> 18     16         - Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0
#> 19     17         +   Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0
#> 20     17         -   Chrysler Imperial 14.7   8 440.0 230 3.23 5.345 17.42  0
#> 21     21         +       Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1
#> 22     21         -       Toyota Corona 21.5   4 120.1  97 3.70 2.465 20.01  1
#> 23     22         +    Dodge Challenger 15.5   8 318.0 150 2.76 3.520 16.87  0
#> 24     22         -    Dodge Challenger 15.5   8 318.0 150 2.76 3.520 16.87  0
#> 25     23         +         AMC Javelin 15.2   8 304.0 150 3.15 3.435 17.30  0
#> 26     23         -         AMC Javelin 15.2   8 304.0 150 3.15 3.435 17.30  0
#> 27     24         +          Camaro Z28 13.3   8 350.0 245 3.73 3.840 15.41  0
#> 28     24         -          Camaro Z28 13.3   8 350.0 245 3.73 3.840 15.41  0
#> 29     25         +    Pontiac Firebird 19.2   8 400.0 175 3.08 3.845 17.05  0
#> 30     25         -    Pontiac Firebird 19.2   8 400.0 175 3.08 3.845 17.05  0
#>    am gear carb
#> 1   0    3    1
#> 2   0    2    1
#> 3   0    3    2
#> 4   0    2    2
#> 5   0    3    1
#> 6   0    2    1
#> 7   0    3    4
#> 8   0    2    4
#> 9   0    3    3
#> 10  0    2    3
#> 11  0    3    3
#> 12  0    2    3
#> 13  0    3    3
#> 14  0    2    3
#> 15  0    3    4
#> 16  0    2    4
#> 17  0    3    4
#> 18  0    2    4
#> 19  0    3    4
#> 20  0    2    4
#> 21  0    3    1
#> 22  0    2    1
#> 23  0    3    2
#> 24  0    2    2
#> 25  0    3    2
#> 26  0    2    2
#> 27  0    3    4
#> 28  0    2    4
#> 29  0    3    2
#> 30  0    2    2

Created on 2020-04-01 by the reprex package (v0.3.0)
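
For reference, the cell-by-cell comparison by position described above can be sketched in base R without any grouping column. This is only an illustration: `diff_by_position` is a hypothetical helper, not part of compareDF, and it assumes both data frames have the same dimensions, the same column names in the same order, and atomic, non-factor columns.

```r
# Hypothetical helper (not part of compareDF): list the cells that differ
# between two data frames when rows are matched purely by position.
diff_by_position <- function(df_new, df_old) {
  # Positional comparison only makes sense for identically shaped tables.
  stopifnot(
    identical(dim(df_new), dim(df_old)),
    identical(names(df_new), names(df_old))
  )
  changes <- lapply(names(df_new), function(col) {
    new <- df_new[[col]]
    old <- df_old[[col]]
    # A cell differs when exactly one side is NA,
    # or when both sides are present and unequal.
    differs <- xor(is.na(new), is.na(old)) |
      (!is.na(new) & !is.na(old) & new != old)
    rows <- which(differs)
    data.frame(
      row = rows,
      column = rep(col, length(rows)),
      new = as.character(new[rows]),
      old = as.character(old[rows]),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, changes)
}

# Same modification as in the example above, written in base R.
mtcars1 <- mtcars
mtcars2 <- transform(mtcars1, gear = ifelse(gear == 3, 2, gear))

changes <- diff_by_position(mtcars1, mtcars2)
print(changes)
```

The result has one row per changed cell, which is a different shape from the `+` / `-` row pairs in `comparison_df`. It also inherits the limitation discussed above: if rows were reordered, inserted, or dropped, matching by position would report spurious differences.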

alexsanjoseph commented 4 years ago

Have you solved the problem by adding the row number? That is the expected behavior, right?

I agree row number could be a smart default in case the dimensions for the data frames are the same. I will try to add this as a feature in an upcoming release :)
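
A rough sketch of what such a row-number default could look like from the caller's side, assuming nothing about how the package actually implements it: `add_position_key` and the `rownum` key name are hypothetical, and the only compareDF call used is `compare_df()` with `group_col`, as in the example above.

```r
# Hypothetical helper (not part of compareDF): add a shared row-number key
# to both data frames, but only when their dimensions match.
add_position_key <- function(df_new, df_old, key = "rownum") {
  if (!identical(dim(df_new), dim(df_old))) {
    stop("Data frames differ in dimension; supply a real grouping column instead.")
  }
  if (key %in% c(names(df_new), names(df_old))) {
    stop("Column '", key, "' already exists; choose another key name.")
  }
  df_new[[key]] <- seq_len(nrow(df_new))
  df_old[[key]] <- seq_len(nrow(df_old))
  list(df_new = df_new, df_old = df_old, key = key)
}

# Same modification as in the example above, written in base R.
mtcars1 <- mtcars
mtcars2 <- transform(mtcars1, gear = ifelse(gear == 3, 2, gear))

keyed <- add_position_key(mtcars1, mtcars2)

# Run the comparison only when compareDF is installed.
if (requireNamespace("compareDF", quietly = TRUE)) {
  result <- compareDF::compare_df(keyed$df_new, keyed$df_old,
                                  group_col = keyed$key)
  print(result$comparison_df)
}
```

Refusing to add the key when the dimensions differ keeps the fallback from silently pairing unrelated rows, which is the concern raised earlier in the thread.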

jzadra commented 4 years ago

Yes, adding a row number and grouping by that works. See the second chunk above.

alexsanjoseph commented 4 years ago

Cool - Adding this as a potential enhancement

alexsanjoseph commented 4 years ago

Fixed in latest release

jzadra commented 4 years ago

Thanks!