Failure with dev tidyr

hadley commented 5 years ago

When I check cutpointr with the dev version of tidyr, I see:

checking examples ... ERROR

...

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

> library(cutpointr)
> cutpointr(suicide, dsi, suicide, gender) %>%
+   add_metric(list(ppv, npv)) %>%
+   select(optimal_cutpoint, subgroup, AUC, sum_sens_spec, ppv, npv)
Assuming the positive class is yes
Assuming the positive class has higher x values
Error in check_roc_curve(optcut) : 
  roc_curve as returned by the method function is not an object of the class roc_cutpointr
Calls: %>% ... cutpointr_internal -> <Anonymous> -> .f -> check_roc_curve
Execution halted

checking tests ... ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  ══ testthat results  ═════════════════════════════════════════════════════════════════
  OK: 87 SKIPPED: 0 FAILED: 40
  1. Error: Cutpointr returns a cutpointr without NAs and a certain Nr of rows (@test-cutpointr.R#3) 
  2. Error: Cutpointr works with different data types (@test-cutpointr.R#19) 
  3. Error: Bootstrap does not return duplicate colnames (@test-cutpointr.R#78) 
  4. Error: Plotting with bootstrapping is silent (@test-cutpointr.R#94) 
  5. Error: AUC calculation is correct and works with Inf and -Inf (@test-cutpointr.R#134) 
  6. Error: Correct midpoints are found (@test-cutpointr.R#149) 
  7. Error: find_metric_name finds metric (@test-cutpointr.R#160) 
  8. Error: no duplicate column names are returned (@test-cutpointr.R#182) 
  9. Error: Correct cutpoints with example data (@test-cutpointr.R#212) 
  1. ...

  Error: testthat unit tests failed
  Execution halted

checking re-building of vignette outputs ... WARNING

Error in re-building vignettes:
  ...
Quitting from lines 46-52 (cutpointr.Rmd) 
Error: processing vignette 'cutpointr.Rmd' failed with diagnostics:
roc_curve as returned by the method function is not an object of the class roc_cutpointr
Execution halted

Would you mind looking into this for me? It's possible that I've accidentally changed the tidyr API in some way, but the changes are small and cutpointr is the only CRAN package that shows problems.

Thie1e commented 5 years ago

Hi, thanks for letting me know!

I could reproduce the error. It occurs because tidyr::nest_() now drops additional class values when it nests a data.frame. My roc() function returns a data.frame with the additional class roc_cutpointr, which is lost after nesting; hence the error message.

The dev version of nest() still preserves additional classes when nesting a tibble; please see the example below. It doesn't seem to make a difference whether I use nest() or nest_().

I don't depend on the behavior of tidyr <= 0.8.2 (probably my roc() should return a tibble anyway), but the new behavior seems unexpected to me. Thanks again for the detailed report.

#
# With tidyr 0.8.2 from CRAN
#
library(tidyr)

# Nesting with a data.frame that has an additional class
my_object <- iris
class(my_object) <- c(class(my_object), "my_class")
nested_object <- nest_(my_object, key_col = "foo")
purrr::map(nested_object$foo, class)
# [1] "data.frame" "my_class"  

# Nesting with a tibble that has an additional class
my_object <- tibble::as_tibble(iris)
class(my_object) <- c(class(my_object), "my_class")
nested_object <- nest_(my_object, key_col = "foo")
purrr::map(nested_object$foo, class)
# [1] "tbl_df"     "tbl"        "data.frame" "my_class"  

#
# With tidyr 0.8.2.9000 from Github
#
library(tidyr)

# Nesting with a data.frame that has an additional class
my_object <- iris
class(my_object) <- c(class(my_object), "my_class")
nested_object <- nest_(my_object, key_col = "foo")
purrr::map(nested_object$foo, class)
# [1] "tbl_df"     "tbl"        "data.frame"
# So now it's a tibble and has lost my_class

# Nesting with a tibble that has an additional class
my_object <- tibble::as_tibble(iris)
class(my_object) <- c(class(my_object), "my_class")
nested_object <- nest_(my_object, key_col = "foo")
purrr::map(nested_object$foo, class)
# [1] "tbl_df"     "tbl"        "data.frame" "my_class"  
# Still a tibble and has preserved my_class
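
To make the failure mode concrete, here is a minimal base-R sketch. `check_roc_curve_sketch()` is a hypothetical stand-in for cutpointr's internal check (the real `check_roc_curve()` may be implemented differently), the columns are illustrative, and reassigning the class vector only simulates what a data.frame-to-tibble conversion does; the sketch does not call tidyr or tibble.

```r
# Hypothetical stand-in for cutpointr's internal class check.
check_roc_curve_sketch <- function(roc_curve) {
  if (!inherits(roc_curve, "roc_cutpointr")) {
    stop("roc_curve as returned by the method function is not an object ",
         "of the class roc_cutpointr")
  }
  invisible(TRUE)
}

# A roc curve stored as a plain data.frame with an extra class appended
# (illustrative columns only).
roc_df <- data.frame(x.sorted = c(Inf, 3, 1), tp = c(0, 1, 2), fp = c(0, 1, 2))
class(roc_df) <- c(class(roc_df), "roc_cutpointr")

# Simulate a conversion that rebuilds the class vector from scratch,
# as happens when a plain data.frame is turned into a tibble.
rebuilt <- roc_df
class(rebuilt) <- c("tbl_df", "tbl", "data.frame")

# Return FALSE instead of erroring so both outcomes can be compared.
ok_before <- tryCatch(check_roc_curve_sketch(roc_df), error = function(e) FALSE)
ok_after  <- tryCatch(check_roc_curve_sketch(rebuilt), error = function(e) FALSE)
c(before = ok_before, after = ok_after)  # before = TRUE, after = FALSE
```

Because the check relies on `inherits()`, any step that rebuilds the class vector without carrying over `roc_cutpointr` makes it fail, even though the columns themselves are intact.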

hadley commented 5 years ago

Ah, the problem is that nest() now always returns tibbles because the print method for data frames containing list columns is not very good. I'll change it to apply the transformation only when the input is a data frame, not a subclass, which should resolve this problem.
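
The rule described above can be sketched in base R as follows. `maybe_upgrade()` is a hypothetical helper, not tidyr's code; it only sets the tibble classes on the object instead of calling `tibble::as_tibble()`, so the sketch runs without tibble installed.

```r
# Hypothetical sketch: upgrade only objects whose class is exactly
# "data.frame"; any subclass is returned untouched.
maybe_upgrade <- function(x) {
  if (identical(class(x), "data.frame")) {
    class(x) <- c("tbl_df", "tbl", "data.frame")
  }
  x
}

plain <- data.frame(a = 1:2)
sub   <- plain
class(sub) <- c(class(sub), "roc_cutpointr")

class(maybe_upgrade(plain))  # c("tbl_df", "tbl", "data.frame")
class(maybe_upgrade(sub))    # c("data.frame", "roc_cutpointr")
```

Testing `identical(class(x), "data.frame")` rather than `inherits(x, "data.frame")` is what lets subclasses pass through, since `inherits()` is also TRUE for tibbles and for data frames that carry extra classes.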

hadley commented 5 years ago

Ok, that fixed the failure. I still see a different failure, but that seems less likely to be related to tidyr:

checking tests ... ERROR
Running the tests in ‘tests/testthat.R’ failed.
Last 13 lines of output:
  `print\(scp\)` does not match "accuracy_oob 0.8201".
  Actual value: "Method: oc_youden_normal \\nPredictor: dsi \\nOutcome: suicide \\nDirection: >= \\nSubgroups: female, male \\nNr\. of bootstraps: 10 \\n\\nSubgroup: female \\n-------------------------------------------------------------------------------- \\n optimal_cutpoint accuracy    acc sensitivity specificity    AUC n_pos n_neg\\n           2\.4778   0\.8954 0\.8954      0\.8148      0\.9014 0\.9446    27   365\\n\\nCutpoint 2\.47775393352595:\\n          observation\\nprediction yes  no\\n       yes  22  36\\n       no    5 329\\n\\n\\nPredictor summary: \\n Min\. 5% 1st Qu\. Median   Mean 3rd Qu\. 95% Max\.     SD\\n    0  0       0      0 0\.8393       1   5   10 1\.7452\\n\\nPredictor summary per class: \\n    Min\.  5% 1st Qu\. Median   Mean 3rd Qu\. 95% Max     SD\\nno     0 0\.0       0      0 0\.5479       0   4  10 1\.3181\\nyes    0 1\.3       4      5 4\.7778       6   7   9 2\.0444\\n\\nBootstrap summary: \\n# A tibble: 13 x 10\\n   Variable       Min\.  `5%` `1st Qu\.` Median  Mean `3rd Qu\.` `95%`  Max\.     
SD\\n   <chr>         <dbl> <dbl>     <dbl>  <dbl> <dbl>     <dbl> <dbl> <dbl>  <dbl>\\n 1 optimal_cutp… 2\.18  2\.23      2\.33   2\.43  2\.47      2\.51  2\.83  2\.94  0\.218 \\n 2 AUC_b         0\.941 0\.943     0\.950  0\.964 0\.960     0\.967 0\.974 0\.976 0\.0119\\n 3 AUC_oob       0\.894 0\.894     0\.912  0\.924 0\.925     0\.939 0\.955 0\.956 0\.0222\\n 4 accuracy_b    0\.860 0\.871     0\.888  0\.908 0\.904     0\.923 0\.927 0\.929 0\.0226\\n 5 accuracy_oob  0\.820 0\.838     0\.873  0\.876 0\.880     0\.901 0\.912 0\.914 0\.0278\\n 6 acc_b         0\.860 0\.871     0\.888  0\.908 0\.904     0\.923 0\.927 0\.929 0\.0226\\n 7 acc_oob       0\.820 0\.838     0\.873  0\.876 0\.880     0\.901 0\.912 0\.914 0\.0278\\n 8 sensitivity_b 0\.708 0\.737     0\.779  0\.823 0\.826     0\.851 0\.940 0\.954 0\.0728\\n 9 sensitivity_… 0\.625 0\.644     0\.762  0\.809 0\.800     0\.872 0\.913 0\.923 0\.0971\\n10 specificity_b 0\.870 0\.875     0\.894  0\.915 0\.909     0\.927 0\.931 0\.932 0\.0223\\n11 specificity_… 0\.835 0\.845     0\.876  0\.880 0\.886     0\.912 0\.921 0\.922 0\.0283\\n12 kappa_b       0\.321 0\.329     0\.423  0\.509 0\.485     0\.562 0\.590 0\.610 0\.0995\\n13 kappa_oob     0\.305 0\.324     0\.368  0\.420 0\.444     0\.511 0\.608 0\.631 0\.106 \\n\\nSubgroup: male \\n-------------------------------------------------------------------------------- \\n optimal_cutpoint accuracy    acc sensitivity specificity    AUC n_pos n_neg\\n           3\.1723   0\.8643 0\.8643      0\.6667      0\.8779 0\.8617     9   131\\n\\nCutpoint 3\.17225507835137:\\n          observation\\nprediction yes  no\\n       yes   6  16\\n       no    3 115\\n\\n\\nPredictor summary: \\n Min\. 5% 1st Qu\. Median Mean 3rd Qu\. 95% Max\.     SD\\n    0  0       0      0 1\.15       1   6   11 2\.1151\\n\\nPredictor summary per class: \\n    Min\.  5% 1st Qu\. Median   Mean 3rd Qu\.  
95% Max     SD\\nno     0 0\.0       0      0 0\.8702       1  5\.0   6 1\.6286\\nyes    0 0\.4       3      4 5\.2222       8 10\.6  11 3\.8333\\n\\nBootstrap summary: \\n# A tibble: 13 x 10\\n   Variable       Min\.  `5%` `1st Qu\.` Median  Mean `3rd Qu\.` `95%`  Max\.     SD\\n   <chr>         <dbl> <dbl>     <dbl>  <dbl> <dbl>     <dbl> <dbl> <dbl>  <dbl>\\n 1 optimal_cutp… 2\.82  2\.84      2\.92   3\.27  3\.26      3\.55  3\.82  3\.90  0\.387 \\n 2 AUC_b         0\.758 0\.787     0\.825  0\.879 0\.871     0\.904 0\.959 0\.968 0\.0641\\n 3 AUC_oob       0\.631 0\.691     0\.792  0\.885 0\.859     0\.943 0\.972 0\.977 0\.109 \\n 4 accuracy_b    0\.807 0\.814     0\.834  0\.864 0\.852     0\.871 0\.871 0\.871 0\.0243\\n 5 accuracy_oob  0\.822 0\.823     0\.839  0\.871 0\.866     0\.896 0\.905 0\.906 0\.0327\\n 6 acc_b         0\.807 0\.814     0\.834  0\.864 0\.852     0\.871 0\.871 0\.871 0\.0243\\n 7 acc_oob       0\.822 0\.823     0\.839  0\.871 0\.866     0\.896 0\.905 0\.906 0\.0327\\n 8 sensitivity_b 0\.556 0\.582     0\.667  0\.703 0\.735     0\.794 0\.936 1     0\.129 \\n 9 sensitivity_… 0\.333 0\.363     0\.5    0\.667 0\.707     1     1     1     0\.272 \\n10 specificity_b 0\.817 0\.825     0\.846  0\.867 0\.862     0\.875 0\.892 0\.898 0\.0246\\n11 specificity_… 0\.818 0\.826     0\.853  0\.887 0\.877     0\.898 0\.917 0\.918 0\.0342\\n12 kappa_b       0\.210 0\.220     0\.243  0\.338 0\.319     0\.380 0\.407 0\.411 0\.0757\\n13 kappa_oob     0\.118 0\.145     0\.208  0\.306 0\.310     0\.398 0\.497 0\.570 0\.139 "

  ── 3. Failure: summary is printed correctly (@test-cutpointr.R#1211) ───────────────────────────────────────────
  `print\(scp\)` does not match "accuracy_oob 0.8163".
  Actual value: "Method: oc_youden_normal \\nPredictor: x \\nOutcome: class \\nDirection: >= \\nSubgroups: female, male \\nNr\. of bootstraps: 10 \\n\\nSubgroup: female \\n-------------------------------------------------------------------------------- \\n optimal_cutpoint accuracy    acc sensitivity specificity    AUC n_pos n_neg\\n           2\.4778   0\.8954 0\.8954      0\.8148      0\.9014 0\.9446    27   365\\n\\nCutpoint 2\.47775393352595:\\n          observation\\nprediction yes  no\\n       yes  22  36\\n       no    5 329\\n\\n\\nPredictor summary: \\n Min\. 5% 1st Qu\. Median   Mean 3rd Qu\. 95% Max\.     SD\\n    0  0       0      0 0\.8393       1   5   10 1\.7452\\n\\nPredictor summary per class: \\n    Min\.  5% 1st Qu\. Median   Mean 3rd Qu\. 95% Max     SD\\nno     0 0\.0       0      0 0\.5479       0   4  10 1\.3181\\nyes    0 1\.3       4      5 4\.7778       6   7   9 2\.0444\\n\\nBootstrap summary: \\n# A tibble: 13 x 10\\n   Variable       Min\.  `5%` `1st Qu\.` Median  Mean `3rd Qu\.` `95%`  Max\.     
SD\\n   <chr>         <dbl> <dbl>     <dbl>  <dbl> <dbl>     <dbl> <dbl> <dbl>  <dbl>\\n 1 optimal_cutp… 2\.02  2\.12      2\.32   2\.40  2\.40      2\.54  2\.62  2\.66  0\.185 \\n 2 AUC_b         0\.907 0\.910     0\.92   0\.950 0\.940     0\.958 0\.965 0\.966 0\.0227\\n 3 AUC_oob       0\.898 0\.905     0\.931  0\.953 0\.947     0\.968 0\.978 0\.980 0\.0274\\n 4 accuracy_b    0\.878 0\.878     0\.895  0\.902 0\.900     0\.908 0\.916 0\.921 0\.0138\\n 5 accuracy_oob  0\.865 0\.868     0\.879  0\.888 0\.891     0\.906 0\.914 0\.917 0\.0176\\n 6 acc_b         0\.878 0\.878     0\.895  0\.902 0\.900     0\.908 0\.916 0\.921 0\.0138\\n 7 acc_oob       0\.865 0\.868     0\.879  0\.888 0\.891     0\.906 0\.914 0\.917 0\.0176\\n 8 sensitivity_b 0\.66  0\.689     0\.759  0\.786 0\.796     0\.849 0\.896 0\.917 0\.076 \\n 9 sensitivity_… 0\.7   0\.712     0\.8    0\.847 0\.861     0\.972 1     1     0\.112 \\n10 specificity_b 0\.878 0\.881     0\.901  0\.913 0\.910     0\.922 0\.934 0\.939 0\.019 \\n11 specificity_… 0\.864 0\.867     0\.882  0\.892 0\.895     0\.909 0\.925 0\.926 0\.0216\\n12 kappa_b       0\.362 0\.410     0\.475  0\.528 0\.514     0\.566 0\.582 0\.585 0\.0692\\n13 kappa_oob     0\.160 0\.214     0\.391  0\.420 0\.404     0\.475 0\.524 0\.539 0\.112 \\n\\nSubgroup: male \\n-------------------------------------------------------------------------------- \\n optimal_cutpoint accuracy    acc sensitivity specificity    AUC n_pos n_neg\\n           3\.1723   0\.8643 0\.8643      0\.6667      0\.8779 0\.8617     9   131\\n\\nCutpoint 3\.17225507835137:\\n          observation\\nprediction yes  no\\n       yes   6  16\\n       no    3 115\\n\\n\\nPredictor summary: \\n Min\. 5% 1st Qu\. Median Mean 3rd Qu\. 95% Max\.     SD\\n    0  0       0      0 1\.15       1   6   11 2\.1151\\n\\nPredictor summary per class: \\n    Min\.  5% 1st Qu\. Median   Mean 3rd Qu\.  
95% Max     SD\\nno     0 0\.0       0      0 0\.8702       1  5\.0   6 1\.6286\\nyes    0 0\.4       3      4 5\.2222       8 10\.6  11 3\.8333\\n\\nBootstrap summary: \\n# A tibble: 13 x 10\\n   Variable       Min\.  `5%` `1st Qu\.` Median  Mean `3rd Qu\.` `95%`  Max\.     SD\\n   <chr>         <dbl> <dbl>     <dbl>  <dbl> <dbl>     <dbl> <dbl> <dbl>  <dbl>\\n 1 optimal_cutp… 2\.14  2\.26      2\.93   3\.05  2\.97      3\.28  3\.35  3\.36  0\.403 \\n 2 AUC_b         0\.738 0\.760     0\.823  0\.848 0\.852     0\.904 0\.925 0\.929 0\.0611\\n 3 AUC_oob       0\.806 0\.815     0\.838  0\.901 0\.899     0\.958 0\.990 1     0\.0688\\n 4 accuracy_b    0\.8   0\.8       0\.848  0\.868 0\.854     0\.871 0\.875 0\.879 0\.0298\\n 5 accuracy_oob  0\.816 0\.820     0\.835  0\.87  0\.862     0\.877 0\.899 0\.917 0\.031 \\n 6 acc_b         0\.8   0\.8       0\.848  0\.868 0\.854     0\.871 0\.875 0\.879 0\.0298\\n 7 acc_oob       0\.816 0\.820     0\.835  0\.87  0\.862     0\.877 0\.899 0\.917 0\.031 \\n 8 sensitivity_b 0\.333 0\.376     0\.542  0\.690 0\.656     0\.744 0\.9   1     0\.192 \\n 9 sensitivity_… 0\.5   0\.545     0\.617  0\.8   0\.777     0\.95  1     1     0\.183 \\n10 specificity_b 0\.806 0\.807     0\.865  0\.876 0\.864     0\.879 0\.894 0\.903 0\.0316\\n11 specificity_… 0\.808 0\.823     0\.852  0\.874 0\.870     0\.886 0\.909 0\.909 0\.031 \\n12 kappa_b       0\.133 0\.135     0\.154  0\.264 0\.264     0\.364 0\.416 0\.436 0\.116 \\n13 kappa_oob     0\.140 0\.192     0\.318  0\.448 0\.405     0\.493 0\.575 0\.625 0\.143 "

  ══ testthat results  ════════════════════════════════════════════════════════════════════════════════════════════
  OK: 369 SKIPPED: 0 FAILED: 3
  1. Failure: summary is printed correctly (@test-cutpointr.R#1179) 
  2. Failure: summary is printed correctly (@test-cutpointr.R#1195) 
  3. Failure: summary is printed correctly (@test-cutpointr.R#1211) 

  Error: testthat unit tests failed
  Execution halted

Thie1e commented 5 years ago

I see. I could also reproduce that. This failure is also caused by the tidyr update:

I have data frames with bootstrap results that I nest, and the summary function simply prints them. Since these are plain data frames, they are now converted to tibbles, whose print method rounds some of the numbers, so the output no longer matches the expected values in the test.

So this is not really an error but rather a design decision, I guess. If data frames are now converted to tibbles, I could, for example, call print.data.frame() to get the previous output without tibble's rounding (I always rounded to 4 digits), or rewrite the test.
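
A minimal sketch of the print.data.frame() route, assuming the bootstrap summary is stored as a data frame. The column names and numbers are toy values, not cutpointr output, and setting the tibble classes by hand only simulates the conversion. Calling the data.frame method directly skips tibble's print method, and rounding to 4 digits beforehand reproduces the old formatting.

```r
# Toy bootstrap summary (made-up numbers, not cutpointr output).
boot_summary <- data.frame(
  Variable = c("accuracy_b", "accuracy_oob"),
  Min      = c(0.860123, 0.820111)
)
# Simulate the conversion to a tibble by setting the class by hand.
class(boot_summary) <- c("tbl_df", "tbl", "data.frame")

# Round to 4 digits, then call the data.frame print method explicitly,
# so tibble's print method is not used even when tibble is loaded.
rounded <- boot_summary
rounded$Min <- round(rounded$Min, 4)
out <- capture.output(print.data.frame(rounded, row.names = FALSE))
cat(out, sep = "\n")
```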

hadley commented 5 years ago

Yeah, I'd say that's something you should fix with the test — testing printed output tends to be quite fragile, and ideally you would be testing the behaviour of your code, not tidyr's code.
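
Following that advice, a test can compare the stored numbers with a tolerance instead of matching printed text. The sketch below uses base R's `all.equal()` on a toy data frame (hypothetical values, not cutpointr output); inside a testthat suite the equivalent would be `expect_equal(oob_min, 0.8201, tolerance = 1e-4)`.

```r
# Toy bootstrap summary (hypothetical values).
boot_summary <- data.frame(
  Variable = c("accuracy_b", "accuracy_oob"),
  Min      = c(0.860123, 0.820111)
)

# Pull out the number under test instead of matching printed output.
oob_min <- boot_summary$Min[boot_summary$Variable == "accuracy_oob"]

# all.equal() compares on a relative scale, so formatting changes in
# print methods cannot break this check.
stopifnot(isTRUE(all.equal(oob_min, 0.8201, tolerance = 1e-4)))
```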

Thie1e commented 5 years ago

OK, I'll probably just rewrite the test (and maybe some of the functions) and push updates to GitHub and CRAN very soon. Thanks for the quick fix!

hadley commented 5 years ago

Thanks! I'll probably push tidyr to CRAN today (because it's causing problems with the latest dplyr), but I'll let them know that you're working on a fix.

Thie1e commented 5 years ago

cutpointr 0.7.6 is on CRAN and now passes the CRAN checks.

Thie1e / cutpointr

Failure with dev tidyr #14