hetong007 closed this issue 9 years ago
Yes, as you found out, rmr2 doesn't support sparse matrices. I am not sure why you thought it did. You could represent a sparse matrix as a data frame with columns i, j, value and write a converter from that to the class you want to use, then pass only the data frame to the rmr2 API calls.
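To make the data frame suggestion concrete, here is a minimal sketch in plain R using the Matrix package. The helper names `sparse_to_df`, `df_to_sparse` and `triplet_matvec` are made up for illustration and are not part of rmr2 or Matrix. The sketch assumes a general numeric sparse matrix such as the dgCMatrix from the report, and it makes no rmr2 calls: it only shows the conversion in both directions and that the matrix-vector product needed by PageRank can be computed from the triplets alone.

```r
library(Matrix)

# Hypothetical helper: sparse matrix -> data frame with one row per
# non-zero entry and columns i, j, value (1-based indices).
sparse_to_df <- function(m) {
  s <- summary(as(m, "CsparseMatrix"))
  data.frame(i = s$i, j = s$j, value = s$x)
}

# Hypothetical helper: triplet data frame -> sparse matrix of the given
# dimensions (sparseMatrix() returns a dgCMatrix for numeric x).
df_to_sparse <- function(df, dims) {
  sparseMatrix(i = df$i, j = df$j, x = df$value, dims = dims)
}

# Matrix-vector product from the triplets alone: each non-zero entry
# contributes value * V[j] to row i, and contributions are summed per row.
triplet_matvec <- function(df, V, nrow) {
  out <- numeric(nrow)
  sums <- tapply(df$value * V[df$j], df$i, sum)
  out[as.integer(names(sums))] <- sums
  out
}

# The column-normalised link matrix from the report.
edgeList <- cbind(c(1,1,1,2,2,3,4,4), c(2,3,4,1,3,4,1,2))
spMat <- sparseMatrix(i = edgeList[,2], j = edgeList[,1],
                      x = rep(1, nrow(edgeList)), dims = c(4, 4))
spMat <- spMat %*% Diagonal(x = 1 / colSums(spMat))

df <- sparse_to_df(spMat)        # this is what would be written to the DFS
V <- rep(1/4, 4)
step <- triplet_matvec(df, V, 4) # one multiplication by the link matrix
```

In a map/reduce setting the same product would presumably be split so that the map step emits `value * V[j]` keyed by `i` and the reduce step sums per key, but that wiring is not shown or tested here.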
On Fri, Dec 26, 2014 at 11:59 PM, Tong He notifications@github.com wrote:
I am testing sparse matrix operations on RHadoop, but it does not seem to be possible.
The following is a piece of reproducible code:
    require(rhdfs)
    require(rmr2)
    tmp = rmr.options(backend='local')

    PageRank.mr = function(input, num.iter, dims) {
      V = rep(1/dims,dims)
      pr.map = function(., M) {
        keyval(1, M %*% V)
      }
      pr.reduce = function(k, Z) {
        vec = as.vector(Z)
        keyval(k, vec)
      }
      for(i in 1:num.iter) {
        result = mapreduce(input, map = pr.map, reduce = pr.reduce)
        V = values(from.dfs(result))
        V = V/sum(V)
      }
      return(V)
    }

    # Testing dense matrix
    M = matrix(c(0,1/3,1/3,1/3,
                 1/2,0,1/2,0,
                 0,0,0,1,
                 1/2,1/2,0,0),4,4)
    Dist.M = to.dfs(M)
    # The result
    PageRank.mr(Dist.M,25,4)
    # [1] 0.2647051 0.2352933 0.2058834 0.2941182

    # Testing sparse Matrix
    require(Matrix)
    edgeList = cbind(c(1,1,1,2,2,3,4,4), c(2,3,4,1,3,4,1,2))
    spMat = spMatrix(nrow = 4, ncol = 4, i = edgeList[,2], j = edgeList[,1], x = rep(1,nrow(edgeList)))
    spMat = as(spMat,'dgCMatrix')
    colS = colSums(spMat)
    spMat = spMat %*% Diagonal(x = 1/colS)
    Dist.spM = to.dfs(spMat)
    # Not running
    PageRank.mr(Dist.spM,25,4)
    # Error in M %*% V : non-conformable arguments
This program calculates PageRank. It works well with a dense matrix, but the function to.dfs seems to fail when splitting the sparse matrix. I get the non-conformable arguments error because the matrix sent to each node is converted to a vector rather than kept as a matrix.
— Reply to this email directly or view it on GitHub https://github.com/RevolutionAnalytics/RHadoop/issues/219.
Good point! Thanks.