HenrikBengtsson / Wishlist-for-R

Features and tweaks to R that I and others would love to see - feel free to add yours!
https://github.com/HenrikBengtsson/Wishlist-for-R/issues
GNU Lesser General Public License v3.0

HASNA(x): SEXP flag indicating whether `x` has missing values or not (or unknown) #12

Open HenrikBengtsson opened 8 years ago

HenrikBengtsson commented 8 years ago

Adopted from existing Wiki entry:

Wish / Suggestion

Analogously to NAMED(x), an internal SEXP flag that indicates whether x has missing values or not (or whether this is unknown), and that can be queried as HASNA(x) with possible values:

  • 0: x has no missing values
  • 1: x has at least one missing value
  • 2: unknown

This SEXP flag can be set by any function that has scanned x for missing values, e.g. anyNA(x), sum(x) etc.

This would allow functions to skip expensive testing for missing values whenever HASNA(x) == 0, because for real x the internal ISNAN(x) and ISNA(x) checks are quite expensive and slow down processing significantly. For instance, with HASNA(x) == 0 a call to sum(x, na.rm=TRUE) can fall back to sum(x, na.rm=FALSE). Currently, it is up to the user/developer to keep track of this and use na.rm=FALSE.
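
As a rough illustration of that fallback, here is a minimal C sketch. It is not R's actual implementation: toy_vec, sum_unchecked and sum_na_rm are hypothetical names, the struct is only a stand-in for a real vector's SEXP, and the encoding 0 = no missing values, 1 = has missing values, 2 = unknown is assumed from how HASNA(x) is used in this text. For simplicity it treats every NaN as missing, which is what ISNAN() does.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Assumed encoding of the proposed flag (hypothetical names). */
enum { HASNA_NO = 0, HASNA_YES = 1, HASNA_UNKNOWN = 2 };

/* Hypothetical stand-in for a real vector; NOT R's SEXP layout. */
typedef struct {
    const double *data;
    size_t n;
    int hasna;   /* one of the HASNA_* values above */
} toy_vec;

/* Plain sum without any per-element NaN test (the na.rm=FALSE path). */
static double sum_unchecked(const toy_vec *x) {
    double s = 0.0;
    for (size_t i = 0; i < x->n; i++) s += x->data[i];
    return s;
}

/* Sum that skips missing values (the na.rm=TRUE path). */
static double sum_na_rm(toy_vec *x) {
    assert(x != NULL);

    /* Flag says "no missing values": take the cheap path. */
    if (x->hasna == HASNA_NO) return sum_unchecked(x);

    double s = 0.0;
    int seen_na = 0;
    for (size_t i = 0; i < x->n; i++) {
        double v = x->data[i];
        if (isnan(v)) { seen_na = 1; continue; }  /* drop missing value */
        s += v;
    }

    /* The full scan settles the question, so record the answer. */
    x->hasna = seen_na ? HASNA_YES : HASNA_NO;
    return s;
}
```

In this sketch the first call on a vector with an unknown flag pays for the full scan, and later calls on the same unmodified vector can take the unchecked path. Any write to the data would presumably have to reset the flag to unknown.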

Similarly, functions such as anyNA(x) can return (TRUE or FALSE) instantaneously - O(1) - if HASNA(x) != 2. Also, sum(x, na.rm=FALSE) and many similar functions can directly return a missing value if HASNA(x) == 1.
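
The two shortcuts in this paragraph could look roughly as follows. This is again only a sketch under assumptions: toy_vec, any_na and sum_keep_na are hypothetical names rather than R API, the flag encoding (0 = no, 1 = yes, 2 = unknown) is inferred from the text, and NaN is used as the stand-in for a missing value.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed encoding of the proposed flag (hypothetical names). */
enum { HASNA_NO = 0, HASNA_YES = 1, HASNA_UNKNOWN = 2 };

/* Hypothetical stand-in for a real vector; NOT R's SEXP layout. */
typedef struct {
    const double *data;
    size_t n;
    int hasna;
} toy_vec;

/* anyNA()-like query: O(1) when the flag is known, O(n) otherwise. */
static bool any_na(toy_vec *x) {
    assert(x != NULL);
    if (x->hasna != HASNA_UNKNOWN) return x->hasna == HASNA_YES;

    x->hasna = HASNA_NO;
    for (size_t i = 0; i < x->n; i++) {
        if (isnan(x->data[i])) { x->hasna = HASNA_YES; break; }
    }
    return x->hasna == HASNA_YES;
}

/* sum(x, na.rm=FALSE)-like: a known missing value decides the result. */
static double sum_keep_na(toy_vec *x) {
    assert(x != NULL);
    if (x->hasna == HASNA_YES) return NAN;  /* no need to touch the data */

    double s = 0.0;
    for (size_t i = 0; i < x->n; i++) s += x->data[i];

    /* A non-NaN sum proves there was no NaN input. The converse does not
       hold (Inf + -Inf is NaN), so a NaN sum leaves the flag unchanged. */
    if (!isnan(s)) x->hasna = HASNA_NO;
    return s;
}
```

Note that the sum can only ever prove the absence of missing values, so in this sketch it never sets the flag to 1; only a function that inspects individual elements, such as any_na(), can do that.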

Status

ML wrote (2015-11-14):

Luke [Tierney] is changing the SEXP header for reference counting. Thanks to the need for alignment, we will get some extra bits. We have already decided to use one of those for this purpose. Another bit will track whether a vector is sorted.

  • HB: That's good news. Will there be two bits for sorted, to specify increasing versus decreasing ordering?
  • GB: This would be an extremely cheap check (binary search for an element different than the first in the worst case, assuming NAs-at-end or NAs-at-beginning). Not sure it's worth a valuable header bit.
  • HB: What about character vectors; will they ever be flagged as sorted? For instance, how will you know in what locale such a vector was sorted, e.g. you first sorted/collated it lexicographically using the C locale but then work in the en_US.UTF-8 locale?
  • ML: Good point. Will need to invalidate the flag after a locale change.

HenrikBengtsson commented 8 years ago

@lawremi, do you have any updates on this from Luke?

lawremi commented 8 years ago

Luke has not yet made any changes to the SEXP header. Will let you know if there are any updates. I'm very excited about the potential.