alexsanjoseph / compareDF

R Tool to compare two data.frames

compare_df - incorrect number of dimensions error #48

Closed UTexas80 closed 2 years ago

UTexas80 commented 2 years ago

Hello,

I am running the example code:

Reproducible Code

ctable_student = compare_df(results_2011, results_2010, c("Student"))

Describe the bug

I receive the error message "Error in both_tables$df_new[, .SD, .SDcols = names(both_tables$df_old)] : incorrect number of dimensions".

Expected behavior

I ran the code several weeks ago and it worked correctly: I was able to output an Excel worksheet comparing the differences between the datasets.


alexsanjoseph commented 2 years ago

I'm unable to reproduce the problem on my local machine. Any idea what changed recently on your end?

UTexas80 commented 2 years ago

Thank you for the prompt response; much appreciated. I love the package and find it invaluable. I added the 'tidytables' library and thought that might have caused the issue, but after removing it I still get the same errors. Interestingly, when I compare two data.tables I get an "Error in setkeyv(value, key) : some columns are not in the data.table: STATION_ID" error. Does the group column need to be formatted as character? I have compareDF version 2.3.3 from CRAN installed. Thank you for your time and consideration.

alexsanjoseph commented 2 years ago

The error suggests that the STATION_ID column is missing from one of the two tables. Are you sure there is no typo in the column names?

Does the group column need to be formatted as character?

It shouldn't need to be, but if it does, that's a bug that has to be fixed :)
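Both questions raised above (a possibly misnamed group column, and whether the group column's class matters) can be checked before calling compare_df. The following is a minimal base-R sketch; check_group_cols and factors_to_character are hypothetical helpers written for illustration, not part of compareDF, and the two toy tables are made up. The first helper reports group columns that are absent from either table, which is the condition data.table's setkeyv reports as "some columns are not in the data.table"; the second converts factor columns to character so the column class can be ruled out as a cause.

```r
# Hypothetical helper (not part of compareDF): list group columns that are
# absent from either input table. Column names are compared case-sensitively.
check_group_cols <- function(df_new, df_old, group_col) {
  list(
    missing_in_new = setdiff(group_col, names(df_new)),
    missing_in_old = setdiff(group_col, names(df_old))
  )
}

# Hypothetical workaround: convert every factor column to character so the
# group column's class can be ruled out. Non-factor columns are left untouched.
factors_to_character <- function(df) {
  is_fac <- vapply(df, is.factor, logical(1))
  for (col in names(df)[is_fac]) {
    df[[col]] <- as.character(df[[col]])
  }
  df
}

# Toy tables for illustration only: the key column names differ by case.
old_tbl <- data.frame(STATION_ID = factor(c("a", "b")), value = c(1, 2))
new_tbl <- data.frame(Station_ID = factor(c("a", "b")), value = c(1, 3))

check_group_cols(new_tbl, old_tbl, "STATION_ID")
# $missing_in_new is "STATION_ID", $missing_in_old is empty
```

If both lists come back empty and the error persists after converting factors to character, the column names and classes are probably not the cause.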

UTexas80 commented 2 years ago

Using your example results datasets:

results_2010 <- data.frame(
  Maths = c(90L, 85L, 93L, 95L, 99L, 99L),
  Physics = c(84L, 92L, 93L, 92L, 92L, 81L),
  Chem = c(91L, 91L, 92L, 71L, 82L, 91L),
  Art = c(34L, 36L, 21L, 37L, 78L, 24L),
  Division = as.factor(c("A", "A", "A", "A", "A", "A")),
  Student = as.factor(c("Isaac", "Akshay", "Vishwas", "Rohit", "Venu", "Ananth")),
  Discipline = as.factor(c("B", "B", "A", "C", "A", "B")),
  PE = as.factor(c("B", "B", "B", "B", "E", "A"))
)

results_2011 <- data.frame(
  Maths = c(90L, 85L, 82L, 94L, 100L, 78L),
  Physics = c(84L, 92L, 93L, 92L, 92L, 81L),
  Chem = c(91L, 91L, 92L, 71L, 82L, 91L),
  Art = c(34L, 36L, 21L, 37L, 78L, 24L),
  Division = as.factor(c("A", "A", "A", "A", "A", "A")),
  Student = as.factor(c("Isaac", "Akshay", "Vishwas", "Rohit", "Venu", "Ananth")),
  Discipline = as.factor(c("A", "A", "B", "D", "A", "B")),
  PE = as.factor(c("B", "B", "B", "B", "E", "A"))
)

sessioninfo::session_info()

- Session info ---------------------------------------------------------------
 setting  value
 version  R version 4.1.2 (2021-11-01)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RTerm
 language (EN)
 collate  English_United States.1252
 ctype    English_United States.1252
 tz       America/New_York
 date     2022-09-21
 pandoc   2.18 @ C:/Program Files/RStudio/bin/quarto/bin/tools/ (via rmarkdown)

- Packages -------------------------------------------------------------------
 ! package     * version     date (UTC) lib source
   cli           3.4.0.9000  2022-09-16 [1] Github (r-lib/cli@c39f795)
 P digest        0.6.29      2021-12-01 [?] CRAN (R 4.1.2)
 P evaluate      0.16        2022-08-09 [?] CRAN (R 4.1.3)
 P fansi         1.0.3       2022-03-24 [?] CRAN (R 4.1.3)
 P fastmap       1.1.0       2021-01-25 [?] CRAN (R 4.1.2)
 P fs            1.5.2       2021-12-08 [?] CRAN (R 4.1.2)
 P glue          1.6.2       2022-02-24 [?] CRAN (R 4.1.3)
 P highr         0.9         2021-04-16 [?] CRAN (R 4.1.2)
 P htmltools     0.5.3       2022-07-18 [?] CRAN (R 4.1.3)
 P knitr         1.40        2022-08-24 [?] CRAN (R 4.1.3)
   lifecycle     1.0.2.9000  2022-09-16 [1] Github (r-lib/lifecycle@a2666fc)
 P magrittr      2.0.3       2022-03-30 [?] CRAN (R 4.1.3)
 P pillar        1.8.1       2022-08-19 [?] CRAN (R 4.1.3)
 P pkgconfig     2.0.3       2019-09-22 [?] CRAN (R 4.1.2)
 P purrr         0.3.4       2020-04-17 [?] CRAN (R 4.1.2)
   R.cache       0.16.0      2022-07-21 [1] CRAN (R 4.1.3)
   R.methodsS3   1.8.2       2022-06-13 [1] CRAN (R 4.1.3)
   R.oo          1.25.0      2022-06-12 [1] CRAN (R 4.1.3)
   R.utils       2.12.0-9000 2022-09-16 [1] Github (HenrikBengtsson/R.utils@84c6c69)
   reprex        2.0.2       2022-08-17 [1] CRAN (R 4.1.3)
 P rlang         1.0.5       2022-08-31 [?] CRAN (R 4.1.3)
 P rmarkdown     2.16        2022-08-24 [?] CRAN (R 4.1.3)
   rstudioapi    0.14        2022-08-22 [1] CRAN (R 4.1.3)
 P sessioninfo   1.2.2       2021-12-06 [?] CRAN (R 4.1.3)
   stringi       1.7.8       2022-07-11 [1] CRAN (R 4.1.2)
 P stringr       1.4.1       2022-08-20 [?] CRAN (R 4.1.3)
   styler        1.7.0       2022-03-13 [1] CRAN (R 4.1.3)
 P tibble        3.1.8       2022-07-22 [?] CRAN (R 4.1.3)
 P utf8          1.2.2       2021-07-24 [?] CRAN (R 4.1.2)
 P vctrs         0.4.1       2022-04-13 [?] CRAN (R 4.1.3)
 P withr         2.5.0       2022-03-03 [?] CRAN (R 4.1.3)
 P xfun          0.33        2022-09-12 [?] CRAN (R 4.1.2)
 P yaml          2.3.5       2022-02-21 [?] CRAN (R 4.1.2)

 P -- Loaded and on-disk path mismatch.

------------------------------------------------------------------------------

compareDF::compare_df(results_2010, results_2011, "Student")

I get the error "Error in both_tables$df_new[, .SD, .SDcols = names(both_tables$df_old)] : incorrect number of dimensions".

I am at a loss...

alexsanjoseph commented 2 years ago

Can you create a full reproducible example from your data that I can use to test?

alexsanjoseph commented 2 years ago

Closing due to lack of activity