Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr
85 stars 13 forks source link

Failure with dev tidyr #14

Closed hadley closed 5 years ago

hadley commented 5 years ago

When I check cutpointr with the dev version of tidyr, I see:

Would you mind looking into this for me? It's possible that I've accidentally changed the API tidyr in someway but the changes are small and cutpointr is the only CRAN package that shows problems.

Thie1e commented 5 years ago

Hi, thanks for letting me know!

I could reproduce the error. It occurs because tidyr::nest_ now drops additional class values when it nests a data.frame. My roc function returns a data.frame with the additional class roc_cutpointr that got lost after nesting, thus the respective error message.

The dev version of nest still preserves additional classes when nesting a tibble, please see the example below. It doesn't seem to make a difference whether I use nest() or nest_().

I don't depend on the behavior of tidyr <= 0.8.2 (probably my roc() should return a tibble anyway), but the new behavior seems unexpected to me. Thanks again for the detailed report.

#
# With tidyr 0.8.2 from CRAN
#
library(tidyr)

# Nesting with a data.frame that has an additional class
my_object <- iris
class(my_object) <- c(class(my_object), "my_class")
nested_object <- nest_(my_object, key_col = "foo")
purrr::map(nested_object$foo, class)
# [1] "data.frame" "my_class"  

# Nesting with a tibble that has an additional class
my_object <- tibble::as_tibble(iris)
class(my_object) <- c(class(my_object), "my_class")
nested_object <- nest_(my_object, key_col = "foo")
purrr::map(nested_object$foo, class)
# [1] "tbl_df"     "tbl"        "data.frame" "my_class"  

#
# With tidyr 0.8.2.9000 from Github
#
library(tidyr)

# Nesting with a data.frame that has an additional class
my_object <- iris
class(my_object) <- c(class(my_object), "my_class")
nested_object <- nest_(my_object, key_col = "foo")
purrr::map(nested_object$foo, class)
# [1] "tbl_df"     "tbl"        "data.frame"
# So now it's a tibble and has lost my_class

# Nesting with a tibble that has an additional class
my_object <- tibble::as_tibble(iris)
class(my_object) <- c(class(my_object), "my_class")
nested_object <- nest_(my_object, key_col = "foo")
purrr::map(nested_object$foo, class)
# [1] "tbl_df"     "tbl"        "data.frame" "my_class"  
# Still a tibble and has preserved my_class
hadley commented 5 years ago

Ah, the problem is that nest() now always returns tibbles because the print method for data frames containing list columns is not very good. I'll change it to apply the transformation only when the input is a data frame, not a subclass, which should resolve this problem.

hadley commented 5 years ago

Ok, that fixed the failure. I still see a different failure, but that seems less unlikely to be related to tidyr:

Thie1e commented 5 years ago

I see. I could also reproduce that. This error is still related to the update of tidyr:

I have data frames with bootstrap results that I nest. The summary function simply prints them. Since these are regular data frames they get converted to tibbles now and then some of the numbers get rounded, so they don't match the expected values from the test.

So this is not really an error, but rather a design decision, I guess. If data frames get converted to tibbles now, I could for example do a print.data.frame to get the previous output without tibble's rounding (I always rounded to 4 digits) or rewrite the test.

hadley commented 5 years ago

Yeah, I'd say that's something you should fix with the test — testing printed output tends to be quite fragile, and ideally you would be testing the behaviour of your code, not tidyr's code.

Thie1e commented 5 years ago

OK, I'll probably just rewrite the test (and maybe some of the functions) and push updates to Github and CRAN very soon. Thanks for the quick fix!

hadley commented 5 years ago

Thanks! I'll probably push tidyr to CRAN today (because it's causing problems with the latest dplyr), but I'll let them know that you're working on a fix.

Thie1e commented 5 years ago

cutpointr 0.7.6 is on CRAN and now passes the CRAN checks.