Tazinho / Advanced-R-Solutions

Set of solutions for the Advanced R programming book
https://advanced-r-solutions.rbind.io/

Chapter 20: Outputs from C++ and R versions of functions are not the same #286

Open IndrajeetPatil opened 2 years ago

IndrajeetPatil commented 2 years ago

For example, in section 20.3 Q6, the following code creates a version of union() in C++:

#include <Rcpp.h>
#include <unordered_set>
#include <algorithm>
using namespace Rcpp;

// [[Rcpp::plugins(cpp11)]]
// [[Rcpp::export]]
IntegerVector unionC(IntegerVector x, IntegerVector y) {
  int nx = x.size();
  int ny = y.size();

  IntegerVector tmp(nx + ny);

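  // std::set_union requires sorted input ranges. Note that this sorts in
  // place, so an integer vector passed from R is modified for the caller too.
  std::sort(x.begin(), x.end());
  std::sort(y.begin(), y.end());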

  IntegerVector::iterator out_end = std::set_union(
    x.begin(), x.end(), y.begin(), y.end(), tmp.begin()
  );

  int prev_value = 0;
  IntegerVector out;
  for (IntegerVector::iterator it = tmp.begin();
       it != out_end; ++it) {
    if ((it != tmp.begin())  && (prev_value == *it)) continue;

    out.push_back(*it);

    prev_value = *it;
  }

  return out;
}

But it doesn't produce the same output as its R equivalent:


# input vectors include duplicates
x <- c(1, 4, 5, 5, 5, 6, 2)
y <- c(4, 1, 6, 8)

union(x, y)
#> [1] 1 4 5 6 2 8

unionC(x, y)
#> [1] 1 2 4 5 6 8
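
The difference comes from the sort: std::set_union() needs sorted inputs, so unionC() returns values in ascending order, while R's union() keeps each value at the position of its first appearance. A minimal sketch of an order-preserving variant, written against std::vector so it compiles without Rcpp; the name union_first_seen is hypothetical and not taken from the solutions:

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from the solutions): union of x and y that keeps
// each value at the position of its first appearance, x before y.
std::vector<int> union_first_seen(const std::vector<int>& x,
                                  const std::vector<int>& y) {
  std::unordered_set<int> seen;  // used only for membership tests
  std::vector<int> out;
  out.reserve(x.size() + y.size());

  // Append every value of v that has not been emitted yet.
  auto append_new = [&seen, &out](const std::vector<int>& v) {
    for (int value : v) {
      // insert() returns {iterator, bool}; the bool is true for new values.
      if (seen.insert(value).second) out.push_back(value);
    }
  };

  append_new(x);
  append_new(y);
  return out;
}
```

With the inputs above, union_first_seen({1, 4, 5, 5, 5, 6, 2}, {4, 1, 6, 8}) gives 1 4 5 6 2 8, the same order as union(x, y). The inputs are taken by const reference and never sorted, so the caller's vectors stay untouched. An Rcpp version would presumably follow the same loop with IntegerVector in place of std::vector<int>.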

IndrajeetPatil commented 2 years ago

Here is another example, from Q3 of the same section:

// As a one-liner
// [[Rcpp::export]]
std::unordered_set<double> uniqueCC(NumericVector x) {
  return std::unordered_set<double>(x.begin(), x.end());
}

The outputs are different:

v1 <- c(1, 3, 3, 6, 7, 8, 9)

unique(v1)
#> [1] 1 3 6 7 8 9

uniqueCC(v1)
#> [1] 9 8 1 7 3 6
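
The order of uniqueCC() is whatever order the hash table happens to iterate in, since std::unordered_set makes no ordering guarantee. A sketch of one way to keep first-appearance order, assuming the set is used only for lookups and the result is collected separately; unique_first_seen is a hypothetical name, and std::vector stands in for NumericVector so the block compiles without Rcpp:

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical helper (not from the solutions): distinct values of x in
// order of first appearance.
// Caveat: NaN never compares equal to itself, so repeated NaN values would
// all be kept here, unlike R's unique().
std::vector<double> unique_first_seen(const std::vector<double>& x) {
  std::unordered_set<double> seen;  // membership only; its order is unused
  std::vector<double> out;
  for (double value : x) {
    // insert().second is true only the first time a value is seen.
    if (seen.insert(value).second) out.push_back(value);
  }
  return out;
}
```

For v1 above, unique_first_seen({1, 3, 3, 6, 7, 8, 9}) gives 1 3 6 7 8 9, matching unique(v1).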

Tazinho commented 2 years ago

Do you have an example with a genuinely different result, e.g., a different set of values? It seems that only the order differs.

(That said, I can see that this could make a huge difference, as I rely on the order of unique(), i.e., order of first appearance, quite often in real-world code.)
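
The distinction between "same set" and "same order" can be checked directly. A small sketch, assuming both results are already deduplicated; same_set is a hypothetical helper, not part of the solutions:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: true if a and b hold the same values, ignoring order.
// Both arguments are taken by value, so sorting does not affect the caller.
// Assumes a and b are already deduplicated.
bool same_set(std::vector<double> a, std::vector<double> b) {
  std::sort(a.begin(), a.end());
  std::sort(b.begin(), b.end());
  return a == b;
}
```

For the two outputs from the Q3 example, same_set({1, 3, 6, 7, 8, 9}, {9, 8, 1, 7, 3, 6}) is true, while a plain element-by-element comparison of the two vectors is false.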

IndrajeetPatil commented 2 years ago

No, but I thought the exercises expect these results to be the same (they are indeed the same for all other Rcpp chapter exercises, except the ones with STL).

Otherwise, the performance of the R and C++ functions can't be meaningfully compared, since they produce different outputs.

Tazinho commented 2 years ago

I agree. Thanks for the additional thoughts. It makes sense to fix this at some point, or at least to leave a note in the answer. I've also edited my comment above.