alexsanjoseph / compareDF

R Tool to compare two data.frames

Only retain columns with changes #22

Closed: msberends closed this issue 5 years ago

msberends commented 5 years ago

For a data set with more than ~50 variables, it's inconvenient and unnecessary to print all of these columns when only a few of them actually changed.

A great option would be a keep_unchanged_columns parameter for the compare_df() function (defaulting to TRUE).

Currently I can achieve this by running the comparison twice. After the first run, I check ctable$comparison_table_diff for columns that contain only "=" and collect their names for exclusion:

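library(compareDF)

my_group_col <- "some_id"

# First pass: compare once to get the diff markers ("+", "-", "=")
ctable <- compare_df(df_old = file_original,
                     df_new = file_corrected,
                     group_col = my_group_col,
                     limit_html = 1)

# A column whose markers are all "=" has no changes
# (isTRUE() treats a column containing NA as changed instead of erroring)
unchanged <- vapply(ctable$comparison_table_diff,
                    function(column) isTRUE(all(column == "=")),
                    logical(1))
columns_to_exclude <- names(ctable$comparison_table_diff)[unchanged]
# Never exclude the grouping column itself
columns_to_exclude <- setdiff(columns_to_exclude, my_group_col)

# Second pass: rerun the comparison, now with only the changed columns
ctable <- compare_df(df_old = file_original,
                     df_new = file_corrected,
                     group_col = my_group_col,
                     exclude = columns_to_exclude)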
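The column-detection step of the two-pass workaround above can be pulled out into a small reusable helper. This is a minimal base-R sketch that assumes a marker table shaped like ctable$comparison_table_diff (one column per variable, entries "+", "-" or "="); the function name unchanged_columns() and its keep argument are hypothetical and not part of compareDF.

```r
# Hypothetical helper (not part of compareDF): return the names of columns
# in a diff-marker table whose entries are all "=", i.e. columns with no changes.
# Columns listed in `keep` (such as the grouping column) are never returned.
unchanged_columns <- function(diff_table, keep = character(0)) {
  is_unchanged <- vapply(
    diff_table,
    function(column) isTRUE(all(column == "=")),  # NA or any change -> FALSE
    logical(1)
  )
  setdiff(names(diff_table)[is_unchanged], keep)
}

# Toy marker table standing in for ctable$comparison_table_diff
diff_table <- data.frame(
  some_id = c("=", "="),
  height  = c("=", "+"),
  weight  = c("=", "="),
  stringsAsFactors = FALSE
)

unchanged_columns(diff_table, keep = "some_id")
# [1] "weight"
```

With real compare_df() output, columns_to_exclude <- unchanged_columns(ctable$comparison_table_diff, keep = my_group_col) would stand in for the two exclusion lines between the passes. Using isTRUE() means a column that contains NA is conservatively kept rather than dropped.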

It would be great if this could be done in a single call:

ctable <- compare_df(df_old = file_original,
                     df_new = file_corrected,
                     group_col = "some_id",
                     keep_unchanged_columns = FALSE)

alexsanjoseph commented 5 years ago

Good ideas! I'll take a look at this soon!

alexsanjoseph commented 5 years ago

I've added this feature. If you can install from GitHub, you should be able to test it out @msberends :)

alexsanjoseph commented 5 years ago

The feature is on CRAN now; closing this issue.

msberends commented 5 years ago

You’re great!!! 👍🏼