Thie1e / cutpointr

Optimal cutpoints in R: determining and validating optimal cutpoints in binary classification
https://cran.r-project.org/package=cutpointr

`use_midpoints = TRUE` per default? #3

Open Thie1e opened 6 years ago

Thie1e commented 6 years ago

The current default is `use_midpoints = FALSE`. However, as the README and the vignette illustrate, this biases the returned cutpoints when a method other than the normal or kernel method is used. On the other hand, many users might prefer the current setting, because the returned cutpoints can be found in the data. Additionally, changing the default might surprise current users. Should it be changed or not?
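For readers unfamiliar with the trade-off, here is a minimal base-R sketch of the idea. It is not cutpointr's implementation: the toy data and the names `sens_spec`, `cut_observed`, and `cut_midpoint` are hypothetical, and it assumes that observations with `x >= cutpoint` are classified as positive. Under that assumption, the best cutpoint found among the observed values sits at the upper edge of a whole interval of thresholds that classify the sample identically, and the midpoint moves it to the middle of that interval.

```r
# Illustrative only: a base-R sketch, not cutpointr's implementation.
# Assumption: observations with x >= cutpoint are classified as positive.

# Hypothetical toy data: predictor values and true classes
x     <- c(1, 2, 3, 5, 8, 9)
truth <- c(0, 0, 0, 1, 1, 1)

# Sum of sensitivity and specificity for a given cutpoint
sens_spec <- function(cutpoint) {
  pred <- as.integer(x >= cutpoint)
  sens <- sum(pred == 1 & truth == 1) / sum(truth == 1)
  spec <- sum(pred == 0 & truth == 0) / sum(truth == 0)
  sens + spec
}

# Candidate cutpoints are the unique observed values
candidates <- sort(unique(x))
scores     <- vapply(candidates, sens_spec, numeric(1))

# Best cutpoint among the observed values
# (analogous in spirit to use_midpoints = FALSE)
cut_observed <- candidates[which.max(scores)]

# Any threshold between the next lower observation and cut_observed
# classifies this sample identically, so take the middle of that gap.
# (This toy example assumes cut_observed is not the smallest value.)
next_lower   <- max(candidates[candidates < cut_observed])
cut_midpoint <- mean(c(next_lower, cut_observed))
```

Both cutpoints give the same metric value on this sample, but only `cut_observed` is an actual data value, which is the trade-off described above. In cutpointr itself the option is passed as an argument, for example `cutpointr(suicide, dsi, suicide, use_midpoints = TRUE)` with the package's bundled `suicide` data.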

AngelCampos commented 3 years ago

I would vote for setting it to TRUE.

The earlier the change is made, the fewer users will be surprised by it, and more users will benefit from `use_midpoints = TRUE` as the default, since most of us, myself included, forget to set it. I only figured out that I hadn't set it to TRUE after some test results showed poor performance.

Edit: I didn't see the posting date; maybe it is too late to change it now, though. :(

Thie1e commented 3 years ago

That's OK, thanks for your comment! I'd say that changing defaults is always a possibility.

An additional advantage of `use_midpoints = TRUE` that I should add to the list is that all columns are then type-stable. Otherwise, some columns become list columns if multiple optimal cutpoints are found. All built-in functions can deal with that correctly, but some users were surprised by the type instability.