RevolutionAnalytics / RHadoop

https://github.com/RevolutionAnalytics/RHadoop/wiki

rhbase segfaults if hb.list.tables() is called before hb.init() #152

Open jbarber opened 12 years ago

jbarber commented 12 years ago

(Warning: I've only just started using Hadoop + RHadoop, so it's entirely possible I'm doing something wrong.)

I get a segfault in rhbase (from the rmr-2.0.0 tag) if I don't call hb.init() before hb.list.tables():

> library(rhbase)
> hb.list.tables()

 *** caught segfault ***
address 0x10000b8, cause 'memory not mapped'

Traceback:
 1: .Call("hb_get_tables", hbc, PACKAGE = "rhbase")
 2: hb.list.tables()

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 

gdb + the core file show the problem being at line 61 of tools.cc:

(gdb) bt
#0  0x00007fca4005720c in hb_get_tables (r=<value optimized out>) at tools.cc:61
#1  0x0000003ed12a921c in ?? () from /usr/lib64/R/lib/libR.so
#2  0x0000003ed12dd6fb in Rf_eval () from /usr/lib64/R/lib/libR.so
#3  0x0000003ed12e44b0 in ?? () from /usr/lib64/R/lib/libR.so
#4  0x0000003ed12dd51b in Rf_eval () from /usr/lib64/R/lib/libR.so
#5  0x0000003ed12def50 in ?? () from /usr/lib64/R/lib/libR.so
#6  0x0000003ed12dd51b in Rf_eval () from /usr/lib64/R/lib/libR.so
#7  0x0000003ed12df831 in Rf_applyClosure () from /usr/lib64/R/lib/libR.so
#8  0x0000003ed12dd3f8 in Rf_eval () from /usr/lib64/R/lib/libR.so
#9  0x0000003ed1314a98 in Rf_ReplIteration () from /usr/lib64/R/lib/libR.so
#10 0x0000003ed1314d29 in ?? () from /usr/lib64/R/lib/libR.so
#11 0x0000003ed1315260 in run_Rmainloop () from /usr/lib64/R/lib/libR.so
#12 0x000000000040084b in main ()
(gdb) f 0
#0  0x00007fca4005720c in hb_get_tables (r=<value optimized out>) at tools.cc:61
61        client->getTableNames(tables);
(gdb) list
56    SEXP hb_get_tables(SEXP r){
57      HbaseClient *client  = static_cast<HbaseClient*>(R_ExternalPtrAddr(r));
58      std::vector<std::string> tables;
59      SEXP result = R_NilValue;
60      try{
61        client->getTableNames(tables);
62        if(tables.size()>0){
63      PROTECT(result = Rf_allocVector(STRSXP,tables.size()));
64      for(unsigned int i=0;i < tables.size(); i++){
65        SET_STRING_ELT(result,i,Rf_mkChar(static_cast<const char*>(tables[i].c_str())));
(gdb) p tables
$4 = std::vector of length 0, capacity 0
(gdb) p client
$5 = (apache::hadoop::hbase::thrift::HbaseClient *) 0x10d5238

I guess either "tables" or "client" isn't initialized properly, but my C++ and R skills are too weak to diagnose it further.

RevolutionAnalytics commented 12 years ago

You have to call hb.init() before invoking any other function in the rhbase package. Please look at the examples in the documentation and the unit tests in the package.

jbarber commented 12 years ago

Thank you for pointing out the documentation. However, on reviewing it, I don't think it says either that 1) the call to hb.init() is required, or 2) your entire R environment will blow up if you don't call hb.init().

In addition, I wouldn't normally expect an R function call to crash the entire environment.

Can I suggest improving the user-friendliness of the package by detecting that hb.init() hasn't been called and then doing some combination of 1) emitting a warning and 2) calling hb.init() with the defaults, rather than causing a segfault?
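One way such a check could look, as a rough sketch and not rhbase's actual code: the connection is held in a small wrapper that knows whether it has been initialized, and every call goes through an accessor that throws instead of dereferencing a missing client. `FakeClient`, `Connection`, and `list_tables` are hypothetical names invented for this illustration; only `getTableNames` mirrors the call seen in the backtrace.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the Thrift HbaseClient; not the real class.
struct FakeClient {
    void getTableNames(std::vector<std::string>& out) const {
        out.push_back("example_table");
    }
};

// Hypothetical connection holder: holds no client until init() is called.
class Connection {
public:
    void init() { client_.reset(new FakeClient()); }
    bool initialized() const { return client_ != nullptr; }

    // Returns the client, or throws rather than dereferencing a missing one.
    FakeClient& client() const {
        if (!client_) {
            throw std::runtime_error(
                "connection not initialized: call init() first");
        }
        return *client_;
    }

private:
    std::unique_ptr<FakeClient> client_;
};

// Guarded analogue of hb_get_tables: an uninitialized connection produces
// an exception the caller can report, not a crash.
std::vector<std::string> list_tables(const Connection& conn) {
    std::vector<std::string> tables;
    conn.client().getTableNames(tables);
    return tables;
}
```

With this shape, calling `list_tables` on a fresh `Connection` raises a `std::runtime_error`, and the same call succeeds after `init()`. In an R extension the caught exception could presumably be surfaced as an ordinary R error message, which is an assumption about how the package would wire it up.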

Regards