Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr
84 stars 13 forks source link

Prepare for tidyr v1.0.0 #22

Closed jennybc closed 5 years ago

jennybc commented 5 years ago

tidyr is preparing for the release of what will become v1.0.0.

Here's what we see for cutpointr in our initial revdep check:

https://github.com/tidyverse/tidyr/blob/master/revdep/problems.md#cutpointr

I took a quick look at your dev version and it seems the tidyr::nest_() calls are gone, which is good. I'm seeing other errors ("Tibble columns must have consistent lengths"), though, which keep from verifying if all is well with respect to tidyr.

It looks like you are perhaps building towards a release anyway. So I suggest that, while you are at it, you also make sure you're testing against the devel version of tidyr. Let me know if you need any pointers.

Here is advice on how to test your package on travis against the devel version of tidyr during this period:

https://github.com/tidyverse/tidyr/issues/692#issuecomment-518443247

Thie1e commented 5 years ago

Hi, thanks for letting me know and for actually checking the code. I was indeed about to release the package as it is on Github now (with all the deprecated SE-functions removed) to CRAN very soon.

Among a few errors / warnings that seem to be easy to fix I'm now also getting the x and by must have same size error which rather surprisingly seems to be caused by an additional class of my tibble (I'm nesting a tibble with ROC curve data that has the additional class roc_cutpointr). Here's an example:

library(tibble)
library(tidyr)
mydat <- as_tibble(iris)
class(mydat) <- c(class(mydat), "myclass")
nest_mydat <- nest(mydat)
# Error: `x` and `by` must have same size

In the previous version tidyr would nest that tibble without any problems and I could also do class(nest_mydat$data[[1]]) and get myclass back:

[1] "tbl_df"     "tbl"        "data.frame" "myclass"

So, since the error message doesn't seem to reflect what's going on, I'm not sure if this new behaviour is intended. I'd be happy if you could give me some information on whether this is going to stay. Thanks again.

jennybc commented 5 years ago

Just from reading the above, not running any code, I would say that this call is incomplete: nest(mydat). Is the above example perhaps too simple? This would be best to discuss with an example where there are variables that stay outside the new nested list-column and variables that go inside.

To skip ahead, the new syntax will look something like nest(mydat, data = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)), i.e. tidyr is getting more explicit about the name of the new list-col and the variables that go in it.

jennybc commented 5 years ago

It looks like there will be an official email or GitHub issue blast tomorrow (or soon, in any case), with comprehensive migration advice and a date / deadline for tidyr's release.

Thie1e commented 5 years ago

Thanks, I have received the email.

To clarify, I just wanted to nest the whole data.frame. With tidyr 0.8.99.9000 this fails, if my data.frame has an additional class:

library(tibble)
library(tidyr)
mydat <- as_tibble(iris)
class(mydat) <- c(class(mydat), "myclass")
nest_mydat <- nest(mydat)
#> `x` and `by` must have same size

It worked with tidyr 0.8.3:

library(tibble)
library(tidyr)
mydat <- as_tibble(iris)
class(mydat) <- c(class(mydat), "myclass")
nest_mydat <- nest(mydat)
nest_mydat
#> # A tibble: 1 x 1
#>   data              
#>   <list>            
#> 1 <tibble [150 x 5]>
class(nest_mydat$data[[1]])
#> [1] "tbl_df"     "tbl"        "data.frame" "myclass"

It just seemed weird that this fails now, if I add a class value to mydat. If necessary, I guess I could do the same simply using list (tidyr 0.8.99.9000). Then the column just doesn't have the new <list<df[,5]>> description:

mydat <- as_tibble(iris)
class(mydat) <- c(class(mydat), "myclass")
result <- tibble(a = 42, b = 43)
result$data <- list(mydat)
result
#> # A tibble: 1 x 3
#>       a     b data              
#>   <dbl> <dbl> <list>            
#> 1    42    43 <tibble [150 x 5]>
jennybc commented 5 years ago

Interesting. Some of this suggests tidyr probably should be behaving, messaging, and warning differently. Thanks for the examples! I'll get back to you.

I do have two observations already:

library(tibble)
library(tidyr)

packageVersion("tidyr")
#> [1] '0.8.99.9000'

mydat <- as_tibble(iris)
# I think this should probably give an error or warning ...
# thanks for bringing it to my attention!
nest(mydat)
#> # A tibble: 1 x 1
#>             data
#>   <list<df[,5]>>
#> 1      [150 × 5]

# you should really add your class to the *front*
class(mydat) <- c("myclass", class(mydat))
# then you at least get same result before and after adding your class
nest(mydat)
#> # A tibble: 1 x 1
#>             data
#>   <list<df[,5]>>
#> 1      [150 × 5]

mydat <- as_tibble(iris)
class(mydat) <- c(class(mydat), "myclass")
# this is a weird error to get -- not very informative
# thanks for bringing it to my attention!
nest(mydat)
#> `x` and `by` must have same size

Created on 2019-08-13 by the reprex package (v0.3.0.9000)

jennybc commented 5 years ago

We'll probably add a new warning, but my two points above are really the "answer" and should guide your modifications.

So go ahead and add your class to the front, following convention, unless there's a reason not to. Why this matters is still very puzzling to me 🤔, but we don't have to solve that mystery for you to move forward.

And you definitely want to switch the nest() call to something like nest(df, data = everything()).