RevolutionAnalytics / RHadoop

https://github.com/RevolutionAnalytics/RHadoop/wiki

Equijoin - input works, but specifying the same input in both left.input and right.input does not #178

Closed SudipSinha closed 11 years ago

SudipSinha commented 11 years ago

My input dataset is located in t1l. I want to join the table with itself, using the first column as the join key. The following code runs without errors and gives the desired output.

out = equijoin(
    input=t1l,
    map.left =function(k,v) keyval(v[,1], v[,2:3, drop=FALSE]),
    map.right=function(k,v) keyval(v[,1], v[,2:3, drop=FALSE]))

My real goal is not a self join but a join across two tables. As a first step in that direction, I changed the call to pass t1l as both left.input and right.input:

out = equijoin(
    left.input=t1l, right.input=t1l,
    map.left =function(k,v) keyval(v[,1], v[,2:3, drop=FALSE]),
    map.right=function(k,v) keyval(v[,1], v[,2:3, drop=FALSE]))

But strangely, I get the following error:


Error in split.default(x, ind, drop = FALSE) : 
  first argument must be a vector

My data set is:


   custId       x1       x2
1       1 99.90541 970.4980
2       2 96.48613 920.8165
3       3       NA 960.6413
4       4       NA 970.2129
5       5       NA 930.0822
6       6       NA 948.4217
7       7       NA 979.3058
8       8 98.96846 997.6403
9       9 94.83980 990.5345
10     10 99.77647 922.8591
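
For reference, the result that joining this table with itself on custId should produce can be sketched locally in base R with merge(). This is only a rough stand-in under stated assumptions: it does not use rmr2 or Hadoop, the data frame name t1 and the suffixes .l and .r are hypothetical, only the first four rows of the data set above are reproduced, and equijoin itself returns key-value pairs rather than a single data frame.

```r
# Local, base-R stand-in for the intended join; no rmr2 or Hadoop involved.
# t1 is a hypothetical name; only the first four posted rows are copied here.
t1 <- data.frame(
  custId = 1:4,
  x1 = c(99.90541, 96.48613, NA, NA),
  x2 = c(970.4980, 920.8165, 960.6413, 970.2129)
)

# Join the table with itself on the first column (custId).
# The suffixes distinguish the left and right copies of x1 and x2.
joined <- merge(t1, t1, by = "custId", suffixes = c(".l", ".r"))

print(joined)
```

Each custId appears once in the input, so the self join should yield one row per customer, with the x1 and x2 columns from the left and right sides placed side by side. The NA values in x1 do not affect the match because the join key is custId.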

piccolbo commented 11 years ago

Hi, we are trying to transition to the new repos, one per package. I will reopen your issue on https://github.com/RevolutionAnalytics/rmr2. Thanks for your understanding.