baliga-lab / cmonkey2

Python port of cMonkey, a machine-learning based method for clustering
GNU Lesser General Public License v3.0

cmonkey crashes with yeast #69

Closed kieran-mace closed 6 years ago

kieran-mace commented 7 years ago

It seems to crash because of zero edges in STRING?

$ cmonkey2 --organism sce data.tsv
2017-06-06 15:16:23 INFO checking MEME...
2017-06-06 15:16:25 INFO Input matrix has # rows: 4640, # columns: 299
2017-06-06 15:16:25 INFO # clusters/row: 2
2017-06-06 15:16:25 INFO # clusters/column: 232
2017-06-06 15:16:25 INFO # CLUSTERS: 464
2017-06-06 15:16:25 INFO use operons: 1
2017-06-06 15:16:25 INFO using MEME version 4.11.4
2017-06-06 15:16:33 INFO attempting automatic download of operons from Microbes Online
2017-06-06 15:16:33 INFO NCBI CODE IS: 559292
2017-06-06 15:16:33 INFO Automatically using STRING file in 'cache/559292.gz' (URL: http://networks.systemsbiology.net/string9/559292.gz)
2017-06-06 15:16:33 WARNING can't find the correct RSAT mapping !
2017-06-06 15:16:33 INFO KEGG = 'Saccharomyces cerevisiae S288c' -> RSAT = 'Saccharomyces_cerevisiae'
2017-06-06 15:16:33 INFO Creating networks...
2017-06-06 15:16:33 INFO stringdb.read_edges2()
2017-06-06 15:16:33 INFO Finished loading cache/559292.gz
2017-06-06 15:16:34 INFO stringdb.read_edges2(), 0 edges read, 0 edges ignored
Traceback (most recent call last):
  File "/usr/local/bin/cmonkey2", line 36, in <module>
    cmonkey_run.run()
  File "/usr/local/lib/python2.7/dist-packages/cmonkey/cmonkey_run.py", line 439, in run
    self.prepare_run()
  File "/usr/local/lib/python2.7/dist-packages/cmonkey/cmonkey_run.py", line 408, in prepare_run
    thesaurus = self.organism().thesaurus()
  File "/usr/local/lib/python2.7/dist-packages/cmonkey/cmonkey_run.py", line 163, in organism
    self.organism = self.make_organism()
  File "/usr/local/lib/python2.7/dist-packages/cmonkey/cmonkey_run.py", line 273, in make_organism
    self['fasta_file'])
  File "/usr/local/lib/python2.7/dist-packages/cmonkey/organism.py", line 244, in __init__
    fasta_file)
  File "/usr/local/lib/python2.7/dist-packages/cmonkey/organism.py", line 117, in __init__
    OrganismBase.__init__(self, code, network_factories, ratios=ratios)
  File "/usr/local/lib/python2.7/dist-packages/cmonkey/organism.py", line 72, in __init__
    self.networks.append(make_network(self, ratios))
  File "/usr/local/lib/python2.7/dist-packages/cmonkey/stringdb.py", line 135, in make_network
    organism, ratios)
  File "/usr/local/lib/python2.7/dist-packages/cmonkey/network.py", line 150, in create
    raise Exception("Error: only %d edges in network '%s'" % (len(network_edges), name))
Exception: Error: only 0 edges in network 'STRING'

kieran-mace commented 7 years ago

I tried this too with no luck:

cmonkey2 --organism sce --rsat_organism Saccharomyces_cervisiae --rsat_base_url http://rsat-tagc.univ-mrs.fr data.tsv

weiju commented 7 years ago

Hi, this error typically indicates that the gene names in the STRING network do not match the primary names that the RSAT feature names file maps to.

So, a STRING file (in your case cache/559292.gz) typically has the format

<gene1>TAB<gene2>TAB<weight>
...
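As a rough illustration of that layout, a file in this format could be read with a few lines of Python. This is only a sketch, not cmonkey2's own reader (`stringdb.read_edges2`): the function name `read_string_edges` is made up here, and it assumes the file is gzip-compressed, tab-separated, and has a numeric weight in the third column.

```python
import gzip


def read_string_edges(path):
    """Return (gene1, gene2, weight) tuples from a gzipped, tab-separated file.

    Hypothetical helper for inspection only; not part of cmonkey2.
    """
    edges = []
    with gzip.open(path, "rt") as infile:
        for line in infile:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip blank or malformed lines
            try:
                weight = float(fields[2])
            except ValueError:
                continue  # skip lines whose third column is not numeric
            edges.append((fields[0], fields[1], weight))
    return edges
```

Printing the first few tuples, for example `read_string_edges("cache/559292.gz")[:5]`, shows which naming scheme the file actually uses.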

In the cmonkey2 run's cache directory there should be a file Saccharomyces_cerevisiae_feature_names, which holds the primary and alternate gene names for this organism. For STRING to work, these names need to match the names in the ratios file, and the primary names need to match the ones used in the STRING file. If that is not the case, cmonkey2 will not find any matching nodes and edges for the STRING network.

Could you please check whether those names all match up?
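One way to run that check is to compare the three sets of names directly. The sketch below is hypothetical helper code, not part of cmonkey2. It assumes the ratios file is tab-separated with gene names in the first column under a header row, and that the feature names file has tab-separated id, name, and type columns, with `primary` or `alternate` in the third column and `--` marking comment lines. Please confirm both assumptions against your own files before trusting the counts.

```python
import gzip


def string_genes(path):
    """Collect every gene name from columns 1 and 2 of a gzipped STRING file."""
    genes = set()
    with gzip.open(path, "rt") as infile:
        for line in infile:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3:
                genes.update(fields[:2])
    return genes


def ratio_genes(path):
    """Collect row names from the ratios file (assumed: header row, names in column 1)."""
    with open(path) as infile:
        next(infile, None)  # skip the header row
        return {line.split("\t")[0].strip() for line in infile if line.strip()}


def feature_names(path):
    """Return (primary_names, all_names) from the feature names file.

    Assumed layout: id <TAB> name <TAB> type, with '--' comment lines.
    """
    primary, all_names = set(), set()
    with open(path) as infile:
        for line in infile:
            if line.startswith("--") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue
            name = fields[1].strip()
            all_names.add(name)
            if fields[2].strip() == "primary":
                primary.add(name)
    return primary, all_names


def report_overlap(string_path, ratios_path, feature_names_path):
    """Count how many names are shared between the three files."""
    in_string = string_genes(string_path)
    in_ratios = ratio_genes(ratios_path)
    primary, all_names = feature_names(feature_names_path)
    return {
        "string_genes": len(in_string),
        "string_in_primary": len(in_string & primary),
        "ratio_genes": len(in_ratios),
        "ratios_in_feature_names": len(in_ratios & all_names),
    }


# Example call, using the paths from the log above:
# print(report_overlap("cache/559292.gz", "data.tsv",
#                      "cache/Saccharomyces_cerevisiae_feature_names"))
```

A `string_in_primary` count of zero would be consistent with the "0 edges read" message in the log, and a low `ratios_in_feature_names` count would point to a naming mismatch in the input matrix instead.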