RevolutionAnalytics / RHadoop

RHadoop
https://github.com/RevolutionAnalytics/RHadoop/wiki

Is it possible to distributedly send a sparse Matrix? #219

Closed: hetong007 closed this issue 9 years ago

hetong007 commented 9 years ago

I am testing sparse matrix operations with RHadoop (rmr2), but they do not seem to be supported.

The following is a piece of reproducible code:

require(rhdfs)
require(rmr2)
tmp = rmr.options(backend='local')

PageRank.mr = function(input, num.iter, dims) {
    V = rep(1/dims,dims)
    pr.map = function(., M) {
        keyval(1, M %*% V)
    }
    pr.reduce = function(k, Z) {
        vec = as.vector(Z)
        keyval(k, vec)
    }
    for(i in 1:num.iter) {
        result = mapreduce(input, map = pr.map, reduce = pr.reduce)
        V = values(from.dfs(result))
        V = V/sum(V)
    }
    return(V)
}

# Testing dense matrix
M = matrix(c(0,1/3,1/3,1/3,
             1/2,0,1/2,0,
             0,0,0,1,
             1/2,1/2,0,0),4,4)
Dist.M = to.dfs(M)
# The result
PageRank.mr(Dist.M,25,4)
# [1] 0.2647051 0.2352933 0.2058834 0.2941182

# Testing sparse Matrix
require(Matrix)
edgeList = cbind(c(1,1,1,2,2,3,4,4),
                 c(2,3,4,1,3,4,1,2))
spMat = spMatrix(nrow = 4, ncol = 4,
                 i = edgeList[,2], j = edgeList[,1], x = rep(1,nrow(edgeList)))
spMat = as(spMat,'dgCMatrix')
colS = colSums(spMat)
spMat = spMat %*% Diagonal(x = 1/colS)
Dist.spM = to.dfs(spMat)
# Not running
PageRank.mr(Dist.spM,25,4)
# Error in M %*% V : non-conformable arguments

This program computes PageRank. It works well with a dense matrix, but the function to.dfs seems to fail when splitting the sparse matrix. I get the "non-conformable arguments" error because the matrix sent to each node is converted to a vector rather than kept as a matrix.
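As a cross-check that does not involve rmr2 at all, here is a minimal sketch of the same power iteration run directly on the sparse matrix in plain R. The helper name pagerank_local is hypothetical and only used for illustration. If this runs cleanly, the sparse matrix itself is fine and the failure comes from the round trip through to.dfs and mapreduce.

```r
library(Matrix)

# Same 4-node graph as in the report, built directly as a sparse matrix.
edgeList <- cbind(c(1, 1, 1, 2, 2, 3, 4, 4),
                  c(2, 3, 4, 1, 3, 4, 1, 2))
spMat <- sparseMatrix(i = edgeList[, 2], j = edgeList[, 1],
                      x = rep(1, nrow(edgeList)), dims = c(4, 4))

# Scale each column to sum to 1 (column-stochastic transition matrix).
spMat <- spMat %*% Diagonal(x = 1 / colSums(spMat))

# pagerank_local is a hypothetical helper: plain power iteration, no MapReduce.
pagerank_local <- function(M, num.iter) {
  V <- rep(1 / nrow(M), nrow(M))
  for (i in seq_len(num.iter)) {
    V <- as.vector(M %*% V)   # sparse matrix times dense vector
    V <- V / sum(V)           # renormalize, as in the original loop
  }
  V
}

V_sparse <- pagerank_local(spMat, 25)
print(V_sparse)
```

The printed vector should agree with the dense result reported above to several decimal places; the exact stationary distribution for this graph is c(9, 8, 7, 10) / 34.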

piccolbo commented 9 years ago

Yes, as you found out, rmr2 doesn't support sparse matrices. I am not sure why you thought it did. You could represent a sparse matrix as a data frame with columns i, j, value and write a converter from that to the class you want to use, then pass only the data frame when making rmr2 API calls.
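The data frame representation suggested here might look like the following sketch. The helpers sparse_to_triplets and triplets_to_sparse are hypothetical names, not part of rmr2 or Matrix. The map and reduce steps are emulated in plain R so the example runs without Hadoop; it only illustrates that a matrix-vector product can be computed from (i, j, value) rows alone.

```r
library(Matrix)

# Hypothetical converter: sparse matrix -> data frame with columns i, j, value.
sparse_to_triplets <- function(m) {
  tm <- as(m, "TsparseMatrix")   # triplet form; slots i and j are 0-based
  data.frame(i = tm@i + 1L, j = tm@j + 1L, value = tm@x)
}

# Hypothetical converter: data frame with columns i, j, value -> sparse matrix.
triplets_to_sparse <- function(df, dims) {
  sparseMatrix(i = df$i, j = df$j, x = df$value, dims = dims)
}

# Column-stochastic matrix for the 4-node graph from the report.
spMat <- sparseMatrix(i = c(2, 3, 4, 1, 3, 4, 1, 2),
                      j = c(1, 1, 1, 2, 2, 3, 4, 4),
                      x = c(1/3, 1/3, 1/3, 1/2, 1/2, 1, 1/2, 1/2),
                      dims = c(4, 4))
trip <- sparse_to_triplets(spMat)
V <- rep(1 / 4, 4)

# Emulated map step: each row (i, j, value) contributes value * V[j] to key i.
contrib <- trip$value * V[trip$j]

# Emulated reduce step: sum the contributions that share the same key i.
sums <- tapply(contrib, trip$i, sum)
product <- numeric(nrow(spMat))
product[as.integer(names(sums))] <- as.vector(sums)

print(product)
```

With rmr2, the same idea would presumably be to call to.dfs on the data frame, emit keyval(T$i, T$value * V[T$j]) in the map function, and sum the values per key in the reduce function; that part is an untested assumption here.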

(Reply sent by email to Tong He's post of Fri, Dec 26, 2014.)

hetong007 commented 9 years ago

Good point! Thanks.