GGiecold-zz / Cluster_Ensembles

A package for combining multiple partitions into a consolidated clustering. The combinatorial optimization problem of obtaining such a consensus clustering is reformulated in terms of approximation algorithms for graph or hyper-graph partitioning.
MIT License
69 stars, 43 forks

Error running readme example #1

Closed: MSardelich closed this issue 8 years ago

MSardelich commented 8 years ago

Hi,

I am getting the error:

IOError: [Errno 2] No such file or directory: 'wgraph_CSPA.part.50'

Before I start debugging, do you have any idea what is causing this issue?

Thanks, Marcelo

GGiecold-zz commented 8 years ago

Thank you for your interest in "Cluster_Ensembles".

I am quite puzzled by the reported error message. The example in the "Cluster_Ensembles" README.md file specifies a "cluster_runs" array (i.e. a hypergraph binary adjacency matrix) with 15,000 columns. The CSPA ensemble clustering heuristic is left out of the picture beyond 10,000 columns, as it is typically less efficient and performs worse than the HGPA or MCLA approximation algorithms. As such, no file such as 'wgraph_CSPA.part.50' should ever be created.
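To make the 10,000-column threshold concrete: a dense n × n similarity matrix of float64 values needs 8n² bytes, so at 15,000 samples it would take about 1.8 GB. The sketch below is illustrative only; `dense_similarity_bytes` and `select_consensus_functions` are hypothetical helpers, not the package's own code, and the cutoff value is simply the one quoted in the comment above.

```python
def dense_similarity_bytes(n_samples, itemsize=8):
    """Bytes needed for a dense n x n similarity matrix (float64 by default)."""
    return n_samples * n_samples * itemsize


def select_consensus_functions(n_samples, cspa_limit=10000):
    """Hypothetical selection rule: drop CSPA when there are too many samples."""
    if n_samples > cspa_limit:
        return ["HGPA", "MCLA"]
    return ["CSPA", "HGPA", "MCLA"]


print(dense_similarity_bytes(15000) / 1e9)  # 1.8 (GB for the README example)
print(select_consensus_functions(15000))    # ['HGPA', 'MCLA']
print(select_consensus_functions(5000))     # ['CSPA', 'HGPA', 'MCLA']
```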

I have also run a test with a 50-by-5000 "cluster_runs" array to completion. CSPA was involved but did not cause any problem.

I would be glad to help, but could you provide some more detail as to what might have triggered this error?

Kind regards,

Gregory

MSardelich commented 8 years ago

Thanks for your prompt reply! :-)

The error is potentially related to the routine that writes files to disk using PyTables (HDF5). This is an educated guess based on the "part" files.

As far as I understand, since the algorithm's space complexity is O(n^2), you write the sparse similarity (hyper)matrix to disk, i.e. the indptr, data and indices NumPy arrays separately.
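A minimal NumPy sketch of the quantities discussed here, assuming the usual CSPA construction: a binary membership matrix H with one row per cluster (across all runs) and one column per sample, and the co-association similarity S = HᵀH / r for r clustering runs. The CSR split into indptr, indices and data is done by hand purely for illustration; none of these helper names come from Cluster_Ensembles.

```python
import numpy as np


def membership_matrix(cluster_runs):
    """Stack one binary indicator row per (run, cluster label) pair."""
    rows = []
    for labels in np.asarray(cluster_runs):
        for k in np.unique(labels):
            rows.append((labels == k).astype(np.int64))
    return np.vstack(rows)


def co_association(cluster_runs):
    """Fraction of runs in which samples i and j share a cluster (n x n)."""
    H = membership_matrix(cluster_runs)
    r = np.asarray(cluster_runs).shape[0]
    return H.T @ H / r


def to_csr_arrays(dense):
    """Split a dense matrix into CSR-style indptr, indices and data arrays."""
    nonzero_per_row = (dense != 0).sum(axis=1)
    indptr = np.concatenate(([0], np.cumsum(nonzero_per_row)))
    rows, cols = np.nonzero(dense)  # row-major order, as CSR expects
    return indptr, cols, dense[rows, cols]


runs = np.array([[0, 0, 1, 1],
                 [1, 1, 0, 0]])
S = co_association(runs)
indptr, indices, data = to_csr_arrays(S)
print(S.shape)  # (4, 4): quadratic in the number of samples
print(indptr)   # [0 2 4 6 8]
```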

I would like to test a very simple case in memory only. So my first question is: is there any way to run the code entirely in memory? (That would isolate any PyTables issue.)

My test case is very simple: three label permutations of the same three clusters. The consensus is therefore expected to recover that same clustering.
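One way to confirm that expectation without calling any graph partitioner is to relabel each run in order of first appearance: if all three runs collapse to the same vector, they describe the same partition. `canonical_labels` is a small illustrative helper, not part of Cluster_Ensembles.

```python
import numpy as np


def canonical_labels(labels):
    """Relabel clusters 0, 1, 2, ... in order of first appearance."""
    mapping = {}
    out = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping)
        out.append(mapping[label])
    return np.array(out)


c1 = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
c2 = np.array([1, 1, 1, 0, 0, 0, 2, 2, 2])
c3 = np.array([2, 2, 2, 0, 0, 0, 1, 1, 1])

canon = [canonical_labels(c) for c in (c1, c2, c3)]
print(all(np.array_equal(canon[0], c) for c in canon))  # True
```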

Below is my code:

import numpy as np
import Cluster_Ensembles as CE

c1 = np.array([0,0,0,1,1,1,2,2,2])
c2 = np.array([1,1,1,0,0,0,2,2,2])
c3 = np.array([2,2,2,0,0,0,1,1,1])

cluster_runs = np.vstack((c1,c2,c3))

consensus_clustering_labels = CE.cluster_ensembles(cluster_runs, verbose = True, N_clusters_max = 50)

print(consensus_clustering_labels)

The simple code above outputs:

marcelo@zeus:~/tmp/cluster$ python tmp.py 
*****
INFO: Cluster_Ensembles: CSPA: consensus clustering using CSPA.

#
INFO: Cluster_Ensembles: wgraph: writing wgraph_CSPA.
#

#
INFO: Cluster_Ensembles: sgraph: calling gpmetis for graph partitioning.
/bin/sh: 1: gpmetis: not found
Traceback (most recent call last):
  File "tmp.py", line 10, in <module>
    consensus_clustering_labels = CE.cluster_ensembles(cluster_runs, verbose = True, N_clusters_max = 50)
  File "/home/marcelo/.local/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 300, in cluster_ensembles
    cluster_ensemble.append(consensus_functions[i](hdf5_file_name, cluster_runs, verbose, N_clusters_max))
  File "/home/marcelo/.local/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 614, in CSPA
    return metis(hdf5_file_name, N_clusters_max)
  File "/home/marcelo/.local/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 937, in metis
    labels = sgraph(N_clusters_max, file_name)
  File "/home/marcelo/.local/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 1201, in sgraph
    with open(out_name, 'r') as file:
IOError: [Errno 2] No such file or directory: 'wgraph_CSPA.part.50'
marcelo@zeus:~/tmp/cluster$ 

Tricky! It writes the following files to the current folder:

marcelo@zeus:~/tmp/cluster$ ls -la
total 284
drwxrwxr-x 2 marcelo marcelo   4096 May  6 03:16 .
drwxrwxr-x 6 marcelo marcelo   4096 May  6 00:48 ..
-rw-rw-r-- 1 marcelo marcelo 274248 May  6 03:16 Cluster_Ensembles.h5
-rw-rw-r-- 1 marcelo marcelo    334 May  6 00:48 tmp.py
-rw-rw-r-- 1 marcelo marcelo    195 May  6 03:16 wgraph_CSPA
marcelo@zeus:~/tmp/cluster$ 
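The small wgraph_CSPA file is the graph handed to gpmetis. The sketch below assumes it follows the METIS graph-file format for edge-weighted graphs: a header line `n m 001` (vertex count, undirected edge count, and the flag meaning "edge weights present"), then one line per vertex listing `neighbour weight` pairs with 1-based vertex numbers and integer weights. `metis_weighted_graph` and the scaling of similarities to integers are illustrative assumptions, not the package's actual writer.

```python
import numpy as np


def metis_weighted_graph(similarity, scale=100):
    """Return METIS graph-file text for a symmetric similarity matrix.

    Off-diagonal similarities are scaled and rounded to integer edge
    weights; zero entries and self-loops are omitted. The edge count
    assumes the matrix is symmetric (each edge appears twice).
    """
    W = np.rint(np.asarray(similarity) * scale).astype(int)
    np.fill_diagonal(W, 0)
    n = W.shape[0]
    n_edges = int(np.count_nonzero(W)) // 2  # each undirected edge counted once
    lines = [f"{n} {n_edges} 001"]
    for i in range(n):
        neighbours = np.nonzero(W[i])[0]
        lines.append(" ".join(f"{j + 1} {W[i, j]}" for j in neighbours))
    return "\n".join(lines) + "\n"


S = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
print(metis_weighted_graph(S))
```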

I see the wgraph_CSPA file but not wgraph_CSPA.part.50, hence the I/O error.
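The traceback shows sgraph opening the partition file directly after the gpmetis call. gpmetis writes its result to a file named after the input graph with `.part.<nparts>` appended, one partition index per line, so if the executable never runs, that file never appears. A sketch of a reader that points at the likely cause instead of a bare IOError; `read_partition` is a hypothetical helper, not Cluster_Ensembles' API.

```python
import os
import tempfile


def read_partition(graph_file, n_parts):
    """Read gpmetis output '<graph_file>.part.<n_parts>' as a list of ints."""
    out_name = f"{graph_file}.part.{n_parts}"
    if not os.path.exists(out_name):
        raise FileNotFoundError(
            f"{out_name} not found: gpmetis may be missing from PATH "
            "or may have failed before writing its output")
    with open(out_name) as handle:
        return [int(line) for line in handle if line.strip()]


# Usage: simulate a gpmetis output file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    graph = os.path.join(tmp, "wgraph_CSPA")
    with open(f"{graph}.part.3", "w") as handle:
        handle.write("0\n0\n1\n2\n")
    labels = read_partition(graph, 3)
    try:
        read_partition(graph, 50)  # no .part.50 file was written
        missing_reported = False
    except FileNotFoundError:
        missing_reported = True

print(labels, missing_reported)  # [0, 0, 1, 2] True
```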

If you run this simple test case, what is the output? What files does the Python code write to your current folder?

Another important point is that I didn't install the METIS software manually. Isn't it shipped with your Python package? (Do you see the "/bin/sh: 1: gpmetis: not found" error?)
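Since the shell reported `gpmetis: not found`, a quick pre-flight check is whether the executable is on PATH at all. A sketch using Python 3's `shutil.which`; `require_executable` and `run_gpmetis` are hypothetical helpers. gpmetis is invoked as `gpmetis <graphfile> <nparts>`, and `check=True` makes a non-zero exit status raise instead of silently leaving no output file behind.

```python
import shutil
import subprocess


def require_executable(name):
    """Return the full path of `name`, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise RuntimeError(
            f"{name} not found on PATH; install METIS first "
            "(e.g. via your system package manager)")
    return path


def run_gpmetis(graph_file, n_parts, exe="gpmetis"):
    """Partition `graph_file` into `n_parts` parts, failing loudly on errors."""
    binary = require_executable(exe)
    subprocess.run([binary, graph_file, str(n_parts)], check=True)
    return f"{graph_file}.part.{n_parts}"


try:
    require_executable("no-such-partitioner-binary")
    reported = False
except RuntimeError:
    reported = True
print(reported)  # True
```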

Cheers, M.

GGiecold-zz commented 8 years ago

Running "Cluster_Ensembles" on your three label permutations does indeed raise an error if the parameter "N_clusters_max" is set to 50.

consensus_clustering_labels = CE.cluster_ensembles(cluster_runs, verbose = True, N_clusters_max = 50)

results in an IOError being caught and reported:

IOError: [Errno 2] No such file or directory: 'wgraph_HGPA.part.50'

On the other hand, setting "N_clusters_max" to a value consistent with the maximum number of clusters present in the "cluster_runs" array solves this issue. Namely:

consensus_clustering_labels = CE.cluster_ensembles(cluster_runs, verbose = True, N_clusters_max = 3)

returns just fine.
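A value consistent with the data can be read off cluster_runs itself, for instance the largest number of distinct labels in any single run. A sketch; `max_clusters` is a hypothetical helper, and the result could then be passed as `N_clusters_max`.

```python
import numpy as np


def max_clusters(cluster_runs):
    """Largest number of distinct labels found in any single clustering run."""
    runs = np.atleast_2d(cluster_runs)
    return max(len(np.unique(run)) for run in runs)


cluster_runs = np.vstack((
    np.array([0, 0, 0, 1, 1, 1, 2, 2, 2]),
    np.array([1, 1, 1, 0, 0, 0, 2, 2, 2]),
    np.array([2, 2, 2, 0, 0, 0, 1, 1, 1]),
))
print(max_clusters(cluster_runs))  # 3
```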

Please let me know if this helps, independently of the problem you seem to have with gpmetis. I couldn't reproduce the latter issue, even after removing all traces of the METIS and HMETIS packages.

Kind regards,

Gregory

(Sent by email in reply to Marcelo's comment of 5/5/16, 10:27 PM.)

MSardelich commented 8 years ago

@GGiecold I changed the code as suggested, but I still get the error.

Below is the output (now with a .part.3 file):

marcelo@zeus:~/tmp/cluster$ python tmp.py 
*****
INFO: Cluster_Ensembles: CSPA: consensus clustering using CSPA.

#
INFO: Cluster_Ensembles: wgraph: writing wgraph_CSPA.
#

#
INFO: Cluster_Ensembles: sgraph: calling gpmetis for graph partitioning.
/bin/sh: 1: gpmetis: not found
Traceback (most recent call last):
  File "tmp.py", line 10, in <module>
    consensus_clustering_labels = CE.cluster_ensembles(cluster_runs, verbose = True, N_clusters_max = 3)
  File "/home/marcelo/.local/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 300, in cluster_ensembles
    cluster_ensemble.append(consensus_functions[i](hdf5_file_name, cluster_runs, verbose, N_clusters_max))
  File "/home/marcelo/.local/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 614, in CSPA
    return metis(hdf5_file_name, N_clusters_max)
  File "/home/marcelo/.local/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 937, in metis
    labels = sgraph(N_clusters_max, file_name)
  File "/home/marcelo/.local/lib/python2.7/site-packages/Cluster_Ensembles/Cluster_Ensembles.py", line 1201, in sgraph
    with open(out_name, 'r') as file:
IOError: [Errno 2] No such file or directory: 'wgraph_CSPA.part.3'
marcelo@zeus:~/tmp/cluster$ ls
Cluster_Ensembles.h5  tmp.py  tmp.py~  wgraph_CSPA
marcelo@zeus:~/tmp/cluster$ 

As I said, it is probably an issue with PyTables.

1) Could you please confirm your PyTables version?
2) Is there any way to run the code with the matrix stored in memory? (That would isolate the PyTables problem.)

MSardelich commented 8 years ago

@GGiecold matter solved!

A bit of simple debugging revealed the issue.

The problem is that the METIS software (http://glaros.dtc.umn.edu/gkhome/metis/metis/overview) must be installed in order to run your code. I wrongly assumed that your Cluster_Ensembles repository bundled a copy of METIS and compiled it at the package installation stage.

To solve the problem, I just ran sudo apt-get install metis (on Ubuntu).

In any case, I would add this to the README file as a pre-installation step.

Would you mind changing the README file, or would you like me to open a pull request?

Thanks for all your efforts to solve my issue. Please feel free to close it now.

M.

GGiecold-zz commented 8 years ago

Hi Marcelo,

Glad to see that you've identified and fixed the issue.

Cluster_Ensembles is supposed to compile METIS, and seemed to do so during its testing phase. However, it is better to add a "sudo ... install metis" step as a prerequisite.

Please open a pull request to do so, if that's not too much trouble.

All the best,

Gregory

(Sent by email in reply to Marcelo's comment of May 6, 2016, 9:07 AM.)

MSardelich commented 8 years ago

;-)