GuyAllard / markov_clustering

markov clustering in python
MIT License
ValueError: shape mismatch in assignment. #19

Open bqrkhn opened 5 years ago

bqrkhn commented 5 years ago

I just tried to run the example code from the GitHub README.

import markov_clustering as mc

import networkx as nx
import random

# number of nodes to use
numnodes = 200

# generate random positions as a dictionary where the key is the node id and the value
# is a tuple containing 2D coordinates
positions = {i:(random.random() * 2 - 1, random.random() * 2 - 1) for i in range(numnodes)}

# use networkx to generate the graph
network = nx.random_geometric_graph(numnodes, 0.3, pos=positions)

# then get the adjacency matrix (in sparse form)
matrix = nx.to_scipy_sparse_matrix(network)
result = mc.run_mcl(matrix)

Gives the following error:

Traceback (most recent call last):
  File "/home/baqir/code/email-sentiment-analysis/algorithms/markov_clustering_visual.py", line 22, in <module>
    result = mc.run_mcl(matrix)
  File "/home/baqir/code/email-sentiment-analysis/env/lib/python3.6/site-packages/markov_clustering/mcl.py", line 233, in run_mcl
    matrix = prune(matrix, pruning_threshold)
  File "/home/baqir/code/email-sentiment-analysis/env/lib/python3.6/site-packages/markov_clustering/mcl.py", line 93, in prune
    pruned[matrix >= threshold] = matrix[matrix >= threshold]
  File "/home/baqir/code/email-sentiment-analysis/env/lib/python3.6/site-packages/scipy/sparse/_index.py", line 109, in __setitem__
    raise ValueError("shape mismatch in assignment")
ValueError: shape mismatch in assignment

I don't understand whether this is a scipy error or an error in how markov_clustering passes its arguments.

RodolfoAllendes commented 5 years ago

Getting the same error here... From inspecting the code, it seems to be related to scipy. At the time of pruning, both matrices involved (matrix and pruned) are correctly identified as scipy sparse matrices (isspmatrix returns True). But once you reach the assignment line and step inside scipy, I guess the matrix on the right-hand side (called x in the __setitem__ method) returns False when checked for being a sparse matrix. From there on it gets treated as a numpy array, and the shape mismatch happens.

That is what I understand at least.

I only started seeing this error after updating scipy to version 1.3 (1.2 was the version available at the time of the latest markov-clustering release), so downgrading to 1.2 seems to have done the trick for me (conda install scipy=1.2).

I haven't tested it much yet, but after a couple of runs the error hasn't come up again.

Hope it helps.

achatrian commented 4 years ago

Creating a parallel DOK matrix fixes the issue, but makes the algorithm slower. In mcl.py, replace prune with:

```python
def prune(matrix, threshold):
    """
    Prune the matrix so that very small edges are removed.
    The maximum value in each column is never pruned.

    :param matrix: The matrix to be pruned
    :param threshold: The value below which edges will be removed
    :returns: The pruned matrix
    """
    dok_m = matrix.todok(copy=False)  # INTRODUCED BY ME TO FIX BUG WITH SCIPY>=1.3 -- DOK ALLOWS ASSIGNMENT
    if isspmatrix(matrix):
        pruned = dok_matrix(matrix.shape)
        pruned[matrix >= threshold] = dok_m[dok_m >= threshold]  # DOK ALLOWS ASSIGNMENT
        pruned = pruned.tocsc()
    else:
        pruned = matrix.copy()
        pruned[pruned < threshold] = 0

    # keep max value in each column. same behaviour for dense/sparse
    num_cols = matrix.shape[1]
    row_indices = matrix.argmax(axis=0).reshape((num_cols,))  # NEED CSC OR CSR FOR ARGMAX
    col_indices = np.arange(num_cols)
    pruned[row_indices, col_indices] = dok_m[row_indices, col_indices]  # DOK ALLOWS ASSIGNMENT

    return pruned
```

Unfortunately argmax() isn't implemented for dok matrices, so both copies need to be kept.
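
For comparison, here is a minimal sketch of a pruning step that avoids boolean-mask assignment altogether by working on the CSC data array directly. It is not the library's code: the name prune_sparse is hypothetical, it handles sparse input only, and it assumes the matrix has non-negative entries (as a column-stochastic MCL matrix would), so the largest stored value in a column is also that column's maximum.

```python
import numpy as np
from scipy.sparse import csc_matrix


def prune_sparse(matrix, threshold):
    """Hypothetical sparse-only pruning sketch (not part of markov_clustering).

    Entries below `threshold` are dropped, except that the largest stored
    entry of every column is always kept. Assumes non-negative entries.
    """
    # Work on a CSC copy so the caller's matrix is left untouched.
    csc = csc_matrix(matrix, copy=True)
    csc.sum_duplicates()

    # Mark every stored value that survives the threshold.
    keep = csc.data >= threshold

    # Also mark the largest stored value in each non-empty column.
    for col in range(csc.shape[1]):
        start, end = csc.indptr[col], csc.indptr[col + 1]
        if end > start:
            keep[start + np.argmax(csc.data[start:end])] = True

    # Zero out everything else and drop the explicit zeros.
    csc.data[~keep] = 0
    csc.eliminate_zeros()
    return csc
```

Because it only touches the data and indptr arrays, this sketch needs no second DOK copy of the matrix. Whether it is faster in practice would have to be measured.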

daniel-gonzalez-cedre commented 4 years ago

For anyone still having trouble with this issue: upgrading scipy past 1.3.x seems to have fixed it for me. It works on 1.4.1 and 1.5.1, where it would previously fail on 1.3.x.
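
Going by the reports in this thread (failure on scipy 1.3.x, success on 1.2, 1.4.1 and 1.5.1), a small guard like the following could flag the suspect version before calling run_mcl. This is only a sketch: the helper names are hypothetical, and the affected range is taken from the comments above, not from scipy's changelog.

```python
import scipy


def parse_major_minor(version):
    """Return (major, minor) as ints from a version string such as '1.3.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_reported_affected(version):
    """True for the 1.3.x series, the only one reported as failing in this thread."""
    return parse_major_minor(version) == (1, 3)


if is_reported_affected(scipy.__version__):
    print("scipy", scipy.__version__, "is in the 1.3.x series reported as failing here;"
          " consider upgrading or downgrading scipy.")
```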