alexsanjoseph / compareDF

R Tool to compare two data.frames

NeedHelp #23

Closed DilipPathare closed 5 years ago

DilipPathare commented 5 years ago

Hi Alexsan,

I would like to thank you for this wonderful package. I am using it in one of my projects.

I have a small concern: currently the HTML output only shows 100 rows. Is there any way to expand the HTML output to cover the entire file?

Also, when I write this output to Excel, the comparison_df sheet does not show the color coding for changes that the HTML output has. Is there any way to add color coding when writing the data frame to Excel?

Thanks in advance :) Dilip

alexsanjoseph commented 5 years ago

@DilipPathare

DilipPathare commented 5 years ago

@alexsanjoseph

> you can set the limit variable to any number. The default value is 100 and so you're getting 100 rows.

Where do I change the limit: in the existing function, or in the script I'm writing in RStudio?

Below is my code:

```r
library(compareDF)
getwd()
x <- read.csv("L18Q3022.csv", header = TRUE, sep = ",")
y <- read.csv("L2018143.csv", header = TRUE, sep = ",")
df1 <- as.data.frame(x)
df2 <- as.data.frame(y)

options(max.print = 999999)

dfnew1 = compare_df(df1, df2, c("name.child"))
dfnew1
print(dfnew1)
write.csv(dfnew1, "rajan.csv")
view_html(dfnew1, max.html = 99999)
??write.csv
View(dfnew1)
options(max.html = 99999)
install.packages("htmlTable")
library(htmlTable)
??htmlTable
print(dfnew1$html_output)
```

Where should I make the change in this code?

And sure, I would love to take a look at this.

Thanks!!

alexsanjoseph commented 5 years ago

Check out the `limit_html` argument.

(Email reply to DilipPathare's comment of Tue 30 Jul 2019, 20:05.)

@alexsanjoseph https://github.com/alexsanjoseph

you can set the limit variable to any number. The default value is 100 and so you're getting 100 rows. - Where to change the limits? in the existing function or while wiring the script in the R studio

below is my code - library(compareDF) getwd() x<-read.csv("L18Q3022.csv",header=TRUE,sep=",") y<-read.csv("L2018143.csv",header=TRUE,sep=",") df1<-as.data.frame(x) df2<-as.data.frame(y)

options(max.print=999999)

dfnew1=compare_df(df1,df2,c("name.child")) dfnew1 print(dfnew1) write.csv(dfnew1,"rajan.csv") view_html(dfnew1,max.html=99999) ??write.csv View(dfnew1) options(max.html=99999) install.packages("htmlTable") library(htmlTable) ??htmlTable print(dfnew1$html_output)

where will I make the changes in the existing code?

And surely, I would love to take a look at this

Thanks !!

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/alexsanjoseph/compareDF/issues/23?email_source=notifications&email_token=AAHLW5AQVWNFPSL5HR7I2MLQCBGRTA5CNFSM4IHZMXL2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3EFV2A#issuecomment-516446952, or mute the thread https://github.com/notifications/unsubscribe-auth/AAHLW5ESHMB3HJCE5VIJSB3QCBGRTANCNFSM4IHZMXLQ .

DilipPathare commented 5 years ago

I found that argument in the function. I just wanted to know: do I need to rewrite the entire function with `limit_html = 1000000`?

alexsanjoseph commented 5 years ago

Did you try this out? It should work.

(Email reply to DilipPathare's comment of Tue 30 Jul 2019, 20:22.)

I got that in the function. Just wanted to know do I need to write the entire function again changing the LIMIT_HTML=1000000?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/alexsanjoseph/compareDF/issues/23?email_source=notifications&email_token=AAHLW5BNKG7BEHM2ON7IZ53QCBISLA5CNFSM4IHZMXL2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3EHR4I#issuecomment-516454641, or mute the thread https://github.com/notifications/unsubscribe-auth/AAHLW5EAITGWVM56Q37BGT3QCBISLANCNFSM4IHZMXLQ .

DilipPathare commented 5 years ago

Hey Alex,

I tried that, but it didn't actually work:

```r
rajan_df <- function(df_new, df_old, group_col, exclude = NULL, limit_html = 1000000,
                     tolerance = 0, tolerance_type = 'ratio',
                     stop_on_error = TRUE, keep_unchanged = FALSE,
                     stop_on_error = TRUE, keep_unchanged_rows = FALSE, keep_unchanged_cols = TRUE,
                     color_scheme = c("addition" = "green", "removal" = "red",
                                      "unchanged_cell" = "gray", "unchanged_row" = "deepskyblue"),
                     html_headers = NULL, html_change_col_name = "chng_type",
                     html_group_col_name = "grp", round_output_to = 3){

  comparison_table_ts2char = comparison_table_ts2char %>% eliminate_tolerant_rows(comparison_table_diff)
  comparison_table_diff = eliminate_tolerant_rows(comparison_table_diff, comparison_table_diff)

  if(keep_unchanged) {
    if(keep_unchanged_rows) {
      comparison_table = comparison_table %>% keep_unchanged_rows(both_tables, group_col, "val_table")
      comparison_table_ts2char = comparison_table_ts2char %>% keep_unchanged_rows(both_tables, group_col, "val_table")
      comparison_table_diff = comparison_table_diff %>% keep_unchanged_rows(both_tables, group_col, "color_table")
      comparison_table = comparison_table %>% keep_unchanged_rows_fn(both_tables, group_col, "val_table")
      comparison_table_ts2char = comparison_table_ts2char %>% keep_unchanged_rows_fn(both_tables, group_col, "val_table")
      comparison_table_diff = comparison_table_diff %>% keep_unchanged_rows_fn(both_tables, group_col, "color_table")

      comparison_table_diff = comparison_table_diff[order(comparison_table[[group_col]]),]
      comparison_table_ts2char = comparison_table_ts2char[order(comparison_table[[group_col]]),]
      comparison_table = comparison_table[order(comparison_table[[group_col]]),]
    }
    if(!keep_unchanged_cols){
      all_unchanged = apply(comparison_table_diff %>% select(-!!group_col), 2, function(x) all(x <= 0))
      unchanged_cols = names(Filter(identity, all_unchanged))
      comparison_table = comparison_table %>% select(-one_of(unchanged_cols))
      comparison_table_ts2char = comparison_table_ts2char %>% select(-one_of(unchanged_cols))
      comparison_table_diff = comparison_table_diff %>% select(-one_of(unchanged_cols))
    }

    if(nrow(comparison_table) == 0) stop_or_warn("The two data frames are the same after accounting for tolerance!", stop_on_error)
    if(nrow(comparison_table_diff) == 0) stop_or_warn("The two data frames are the same after accounting for tolerance!", stop_on_error)

  }

keep_unchanged_rows <- function(comparison_table, both_tables, group_col, type){
keep_unchanged_rows_fn <- function(comparison_table, both_tables, group_col, type){
  unchanged_rows = lapply(both_tables, function(x) x[!(x[[group_col]] %in% comparison_table[[group_col]]), ] ) %>%
    Reduce(rbind, .) %>%
    dplyr::mutate(chng_type = '0)
```

```r
x <- read.csv("L18Q3022.csv", header = TRUE, sep = ",")
y <- read.csv("L2018143.csv", header = TRUE, sep = ",")
df1 <- as.data.frame(x)
df2 <- as.data.frame(y)

options(max.print = 999999)

dfnew1 = rajan_df(df1, df2, c("name.child"))
dfnew1
print(dfnew1)
write.csv(dfnew1, "rajan.csv")
view_html(dfnew1, max.html = 99999)
??write.csv
View(dfnew1)
options(max.html = 99999)
install.packages("htmlTable")
library(htmlTable)
??htmlTable
print(dfnew1$html_output)
```

alexsanjoseph commented 5 years ago

Please don't rewrite the function; just call the original function with `limit_html = 100000`:

```r
dfnew1 = compare_df(df1, df2, c("name.child"), limit_html = 100000)
```
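A minimal base-R sketch of why passing the argument in the call is enough. `render_rows()` is a hypothetical stand-in that only mimics a row cap like `limit_html`; it is not compareDF's code. The sketch also shows one likely reason the rewritten `rajan_df` above failed: R refuses to parse a function that declares the same formal argument twice, and `stop_on_error` appears twice in that copy.

```r
# render_rows() is a HYPOTHETICAL stand-in that mimics a display cap such as
# compare_df()'s limit_html argument; it is not compareDF's actual code.
render_rows <- function(df, limit_html = 100) {
  # Keep only the first `limit_html` rows for display
  head(df, limit_html)
}

big <- data.frame(id = 1:250)

nrow(render_rows(big))                       # default cap: 100
nrow(render_rows(big, limit_html = 100000))  # overridden in the call: 250

# A copied function that declares the same formal argument twice (as
# stop_on_error is in rajan_df) is rejected by R's parser, so it never runs.
dup <- tryCatch(
  parse(text = "f <- function(x, stop_on_error = TRUE, stop_on_error = TRUE) x"),
  error = function(e) e
)
inherits(dup, "error")  # TRUE
```

A default value is only used when the caller omits the argument, so the same pattern applies to any default in a package function: override it at the call site instead of copying the function body.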

DilipPathare commented 5 years ago

@alexsanjoseph

Thank you so much :)