DeclareDesign / estimatr

estimatr: Fast Estimators for Design-Based Inference
https://declaredesign.org/r/estimatr
Other
131 stars 20 forks source link

Fix segfault issue in demeanMat #268

Closed lukesonnet closed 5 years ago

lukesonnet commented 6 years ago

This error comes up during a couple calls to demeanMat on Solaris (https://www.r-project.org/nosvn/R.check/r-patched-solaris-x86/estimatr-00check.html).

Seems to mostly be triggered by iv_robust().

graemeblair commented 6 years ago

Also triggered in DesignLibrary tests on Solaris https://cran.r-project.org/web/checks/check_results_DesignLibrary.html

lukesonnet commented 6 years ago

Thanks, has CRAN issued a deadline on this?

graemeblair commented 6 years ago

I don't think it's a problem at the moment - only when we resubmit any of these

lukesonnet commented 5 years ago

I can replicate this on rhub's solaris box. Trying to fix now.

The problem, I believe, has been isolated to this line [https://github.com/DeclareDesign/estimatr/blob/7f27e53aad9bb024918d2417f39b989bbdc42e6b/src/lm_robust_helper.cpp#L20].

However, I can't figure out what about this is incorrect according to C++11 standards, and it doesn't fail on any other system. My guess is there's something weird about unordered_map and whatever C++ compilation is going on in solaris. I've tried other ways of updating the values in the unordered_map, including using map.at() and not incrementing but using temporary arrays. None have resolved this problem.

I created a toy package to test incrementing in an unordered_map and it also seemed to pass solaris rhub tests, although I'm not using Eigen::Array2d objects. That's the next step, but only if CRAN complains about this.

nfultz commented 5 years ago

Solaris likely has a different C++ std library, compiler and compiler flags - so I would generally not trust it to do the same as gcc.

This is the block in question, that your referenced:

for (int i=0; i<fe.size(); i++) {
    std::string fei = Rcpp::as<std::string>(fe(i));
    Eigen::Array2d dat;
    dat(0) = weights(i) * x(i);
    dat(1) = weights(i);
    if (sums.find(fei) != sums.end()) {
      sums[fei] += dat;
    } else {
      sums[fei] = dat;
    }
  }

Here, dat is declared on the stack, and can be deallocated as soon as the compiler thinks it has gone out of scope. This could be as early as the end of the block for the loop (I am reading between the lines here https://www.oreilly.com/library/view/practical-c-programming/0596004192/ch09.html) - then at the next loop, sums[fei] is in the map, but points to an object that no longer exists.

But I'm speculating, haven't worked with this stuff in a while.

lukesonnet commented 5 years ago

Thank you for your assistance. I'll use this when dig back in.

nfultz commented 5 years ago

This would also be a good one to ask about on the Rcpp mailing list.

lukesonnet commented 5 years ago

I guess it depends on how it's passing dat to sums[fei]. Exploring this now.