In statistics, constructions like this are quite common:
if (length(unique(x)) > 27) {
  # some binning
}
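If x is long and continuous, calling unique() seems inefficient (even though it uses hashing), because it scans all of x and builds the full vector of distinct values just to count them.
It would therefore be fantastic to have a function nunique(x, nmax=length(x)) that returns the number of distinct values in x, but stops early and returns nmax as soon as nmax distinct values have been found.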
Example: If x is continuous with 1e10 distinct values, nunique(x, 27) would return 27 after inspecting only the first 27 elements, so the cost would be O(nmax) rather than O(length(x)).
Note: unique() has an argument nmax, but it seems to be intended for sizing the hash table, and is probably not a safe way to achieve such a task.
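To make the requested semantics concrete, here is a rough pure-R sketch. It is only an illustration, not a proposed implementation: a real version would presumably live in C next to the existing hashing code. The name nunique comes from the request above; the chunk argument (block size) is an assumption of this sketch. Because the early exit is checked once per block, it may read up to one block more than strictly necessary.

```r
# Hypothetical helper: count the distinct values of x, stopping once nmax are found.
# `chunk` is an assumption of this sketch and not part of the proposed signature.
nunique <- function(x, nmax = length(x), chunk = 1024) {
  n <- length(x)
  if (n == 0 || nmax <= 0) return(0L)
  seen <- x[0L]                        # empty vector of the same type as x
  start <- 1
  while (start <= n) {
    end <- min(start + chunk - 1, n)
    # merge the distinct values seen so far with those in the next block
    seen <- unique(c(seen, x[start:end]))
    # early exit: the cap has been reached, no need to scan the rest of x
    if (length(seen) >= nmax) return(as.integer(nmax))
    start <- end + 1
  }
  length(seen)
}

nunique(c(1, 2, 2, 3, 3, 3))    # 3
nunique(seq_len(1e6), 27)       # 27, after reading only the first block
```

With such a helper, the check above could be written as nunique(x, 28) > 27; the cap has to be one more than the threshold so that "more than 27" can be told apart from "exactly 27".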