Analogously to NAMED(x), add an internal SEXP flag that indicates whether x has missing values, has none, or whether this is unknown. The flag can be queried as HASNA(x), with possible values:
HASNA(x) = 0: x has no missing values
HASNA(x) = 1: x has one or more missing values
HASNA(x) = 2: it is unknown whether x has missing values or not
This SEXP flag can be set by any function that has scanned x for missing values, e.g. anyNA(x), sum(x), etc.
This would allow functions to skip testing for missing values whenever HASNA(x) == 0. For real x, the internal ISNAN(x) and ISNA(x) tests are quite expensive and slow down processing significantly. For instance, with HASNA(x) == 0, a call to sum(x, na.rm=TRUE) can fall back to sum(x, na.rm=FALSE). Currently, it is up to the user/developer to keep track of this and use na.rm=FALSE.
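The fallback described above can be sketched in plain C. This is only an illustration under stated assumptions: `dvec`, `dvec_sum` and the `HASNA_*` constants are hypothetical names, the struct is a stand-in rather than R's actual SEXP layout, and "missing" is simplified to anything `isnan()` reports (R itself distinguishes NA from NaN).

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical tri-state flag mirroring the proposed HASNA(x) values. */
enum { HASNA_NO = 0, HASNA_YES = 1, HASNA_UNKNOWN = 2 };

/* Hypothetical stand-in for a real vector; not R's actual SEXP header. */
typedef struct {
    double *data;
    size_t  n;
    int     hasna;
} dvec;

/* Sum with na.rm-like semantics (assumption: missing == isnan()). */
double dvec_sum(dvec *x, int na_rm)
{
    double s = 0.0;

    if (!na_rm || x->hasna == HASNA_NO) {
        /* Fast path: nothing to remove, so no per-element test. */
        for (size_t i = 0; i < x->n; i++)
            s += x->data[i];
        return s;
    }

    /* Slow path: test every element and remember what the scan learned. */
    int seen = 0;
    for (size_t i = 0; i < x->n; i++) {
        if (isnan(x->data[i])) {
            seen = 1;
            continue;
        }
        s += x->data[i];
    }
    x->hasna = seen ? HASNA_YES : HASNA_NO;
    return s;
}
```

After one scanning call on a vector without missing values, the flag is 0 and later calls with na_rm set take the fast path automatically.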
Similarly, functions such as anyNA(x) can return TRUE or FALSE instantaneously, in O(1), if HASNA(x) != 2. Also, sum(x, na.rm=FALSE) and many similar functions can directly return a missing value if HASNA(x) == 1.
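A minimal C sketch of these two shortcuts follows. The names `dvec`, `dvec_any_na` and `dvec_sum_keep_na` are hypothetical, the struct is not R's real SEXP, and missing values are modelled simply as NaN, so the NA versus NaN distinction that R makes is ignored here.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical tri-state flag mirroring the proposed HASNA(x) values. */
enum { HASNA_NO = 0, HASNA_YES = 1, HASNA_UNKNOWN = 2 };

/* Hypothetical stand-in for a real vector; not R's actual SEXP header. */
typedef struct {
    double *data;
    size_t  n;
    int     hasna;
} dvec;

/* anyNA-like query: O(1) when the flag is known, otherwise scan once
 * and cache the answer in the flag. */
int dvec_any_na(dvec *x)
{
    if (x->hasna != HASNA_UNKNOWN)
        return x->hasna == HASNA_YES;

    x->hasna = HASNA_NO;
    for (size_t i = 0; i < x->n; i++) {
        if (isnan(x->data[i])) {
            x->hasna = HASNA_YES;
            break;
        }
    }
    return x->hasna == HASNA_YES;
}

/* sum(x, na.rm=FALSE)-like: return a missing value immediately when the
 * flag already says one is present (assumption: NaN stands in for NA). */
double dvec_sum_keep_na(const dvec *x)
{
    if (x->hasna == HASNA_YES)
        return NAN;

    double s = 0.0;
    for (size_t i = 0; i < x->n; i++)
        s += x->data[i];
    return s;
}
```

The flag is only trustworthy if every function that modifies the data also resets it to 2 (unknown) or updates it; that bookkeeping is not shown here.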
Luke [Tierney] is changing the SEXP header for reference counting. Thanks to the need for alignment, we will get some extra bits. We have already decided to use one of those for this purpose. Another bit will track whether a vector is sorted.
HB: That's good news. Will there be two bits for sorted, to specify increasing versus decreasing ordering?
GB: This would be an extremely cheap check (binary search for an element different than the first in the worst case, assuming NAs-at-end or NAs-at-beginning). Not sure it's worth a valuable header bit.
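One way to read GB's remark, sketched in C under stated assumptions: the vector is already known to be sorted, missing values are modelled as NaN and sit together at one end, and `sorted_direction` is a hypothetical helper, not an existing R function. Elements equal to the first form a prefix, so the first differing element can be found by binary search.

```c
#include <math.h>
#include <stddef.h>

/* Equality that treats two missing values (NaN here) as the same. */
static int same(double a, double b)
{
    return (isnan(a) && isnan(b)) || a == b;
}

/* Returns +1 for increasing, -1 for decreasing, 0 if the non-missing
 * values are all equal. Assumes x is sorted with NaNs grouped at one end. */
int sorted_direction(const double *x, size_t n)
{
    if (n < 2)
        return 0;

    /* Binary search for the first index whose value differs from x[0]. */
    size_t lo = 1, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (same(x[mid], x[0]))
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == n)
        return 0;                 /* every element equals the first */

    double a, b;
    if (isnan(x[0])) {            /* NaNs first: compare first and last non-NaN */
        a = x[lo];
        b = x[n - 1];
    } else {                      /* x[lo] is the next value, or a trailing NaN */
        a = x[0];
        b = x[lo];
        if (isnan(b))
            return 0;
    }
    return (a < b) - (a > b);
}
```

The search costs O(log n) comparisons, which is the sense in which the direction can be recovered cheaply from a single "sorted" bit.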
HB: What about character vectors; will they ever be flagged as sorted? For instance, how will you know in which locale such a vector was sorted, e.g. if you first sorted/collated it lexicographically using the C locale but then work in the en_US.UTF-8 locale?
ML: Good point. Will need to invalidate the flag after a locale change.
Adopted from existing Wiki entry:
Wish / Suggestion
Status
ML wrote (2015-11-14):