Closed by bobular 2 years ago
I see that `requiredDigits` is also passed to `cut()` as the `dig.lab` argument.
This isn't very well documented, but I have tested it with -1 and it seems to behave as desired:
> x <- seq(0,1000000,10000) + 1
> cut(x, c(0,250000,500000,750000,1000000), dig.lab=-1)
Levels: (0,250000] (250000,500000] (500000,750000] (750000,1e+06]
> cut(x, c(0,250000,500000,750000,1000000), dig.lab=0)
Levels: (0,2e+05] (2e+05,5e+05] (5e+05,8e+05] (8e+05,1e+06]
Fixes #179
It turns out that all-integer variables were causing the `formatC` function to be called with a request for `0` significant digits. This seemed to default to 1 significant digit, which caused non-unique bins (e.g. 0, 500, 1000, 1000, 2000, 2000). The solution seems to be to use a negative value (e.g. `-1`).
For non-integer variables, although there was no bug here, I thought `ceiling(median(number of digits))` was a more conservative/stable approach than `floor(mean(number of digits))`, but I'm open to alternatives! I realise that median is more computationally expensive than mean (or I assume so, anyway).
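To make the two points above concrete, here is a minimal sketch rather than the actual code from this PR. The `edges` and `n_digits` vectors are made-up values chosen for illustration, and the `formatC` call only mirrors how I understand `cut()` to build its labels.

```r
# Hypothetical bin edges, chosen only to illustrate the label collision.
edges <- c(0, 500, 1000, 1400, 2000, 2400)

# digits = 0 appears to behave like 1 significant digit,
# so neighbouring edges can collapse onto the same label.
labels_zero <- formatC(edges, digits = 0, width = 1L)

# A negative value falls back to the default precision,
# which keeps these edges distinct.
labels_neg <- formatC(edges, digits = -1, width = 1L)

print(labels_zero)
print(labels_neg)
print(anyDuplicated(labels_zero) > 0)  # TRUE if any labels collide
print(anyDuplicated(labels_neg) > 0)

# Hypothetical per-value digit counts with one outlier (assumed data).
n_digits <- c(2, 2, 3, 3, 12)
floor(mean(n_digits))      # 4: pulled upwards by the outlier
ceiling(median(n_digits))  # 3: unaffected by the outlier
```

With these made-up inputs the median-based choice ignores the single outlier, which is the sort of stability I was after. Other inputs may well favour the mean.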
I haven't added a test, because I don't have time today to figure out the testing container. I'm also not sure whether this will break any existing tests. Do the GitHub Actions run the tests?
It is tested in the EDA though. Seems to solve the issue :+1: