baliga-lab / cmonkey2

Python port of cMonkey, a machine-learning based method for clustering
GNU Lesser General Public License v3.0

error when using example data #73

Closed wwick closed 7 years ago

wwick commented 7 years ago

I tried running cmonkey2 with the example data at https://github.com/baliga-lab/cmonkey2/blob/master/example_data/hal/halo_ratios5.tsv. I am not sure why I am receiving an error. The terminal output is:

$ cmonkey2 --organism hal halo_ratios5.tsv
2017-09-28 09:03:06 INFO checking MEME...
2017-09-28 09:03:06 INFO Input matrix has # rows: 428, # columns: 5
2017-09-28 09:03:06 INFO # clusters/row: 2
2017-09-28 09:03:06 INFO # clusters/column: 29
2017-09-28 09:03:06 INFO # CLUSTERS: 43
2017-09-28 09:03:06 INFO use operons: 1
2017-09-28 09:03:06 INFO using MEME version 4.10.2
2017-09-28 09:03:06 INFO attempting automatic download of operons from Microbes Online
2017-09-28 09:03:06 INFO NCBI CODE IS: 64091
2017-09-28 09:03:06 INFO Automatically using STRING file in 'cache/64091.gz' (URL: http://networks.systemsbiology.net/string9/64091.gz)
2017-09-28 09:03:10 WARNING can't find the correct RSAT mapping !
2017-09-28 09:03:10 INFO KEGG = 'Halobacterium NRC 1 uid57769' -> RSAT = 'Dictyostelium_discoideum'
2017-09-28 09:03:10 INFO Creating networks...
2017-09-28 09:03:10 INFO stringdb.read_edges2()
2017-09-28 09:03:11 INFO Finished loading cache/64091.gz
2017-09-28 09:03:19 WARNING 2589 (out of 471056) nodes not found in synonyms
2017-09-28 09:03:19 INFO stringdb.read_edges2(), 0 edges read, 235528 edges ignored
Traceback (most recent call last):
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/bin/cmonkey2", line 36, in <module>
    cmonkey_run.run()
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/cmonkey_run.py", line 439, in run
    self.prepare_run()
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/cmonkey_run.py", line 408, in prepare_run
    thesaurus = self.organism().thesaurus()
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/cmonkey_run.py", line 163, in organism
    self.__organism = self.make_organism()
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/cmonkey_run.py", line 273, in make_organism
    self['fasta_file'])
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/organism.py", line 244, in __init__
    fasta_file)
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/organism.py", line 117, in __init__
    OrganismBase.__init__(self, code, network_factories, ratios=ratios)
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/organism.py", line 72, in __init__
    self.__networks.append(make_network(self, ratios))
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/stringdb.py", line 135, in make_network
    organism, ratios)
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/network.py", line 150, in create
    raise Exception("Error: only %d edges in network '%s'" % (len(network_edges), name))
Exception: Error: only 0 edges in network 'STRING'
(bioinfo)

weiju commented 7 years ago

Hi, thanks for bringing this issue to our attention. cmonkey (all versions) has traditionally relied on the stability of the RSAT repositories. Unfortunately, their structure has changed: mirrors have disappeared or moved, and they have also been split into groups.

To partly compensate for that, we provide the --rsat_base_url switch so that users can search different mirrors for the data.

So in order to run our example data, you will need to specify a mirror that has this information. In your case:

cmonkey2 --organism hal --rsat_base_url http://networks.systemsbiology.net/rsat halo_ratios5.tsv

Please let me know if the issue still persists. I will update the documentation to address this.
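For anyone checking whether a run picked up the right mirror, the two log lines about the RSAT mapping are the ones to look at. The following is a minimal Python sketch, not part of cmonkey2, that pulls them out of pasted terminal output. The function name `check_rsat_mapping` and the returned dictionary are illustrative assumptions; the matched strings are copied from the log in this thread.

```python
import re

# Patterns copied from the log lines shown in this thread.
WARNING_TEXT = "can't find the correct RSAT mapping"
MAPPING_RE = re.compile(r"KEGG = '(?P<kegg>[^']*)' -> RSAT = '(?P<rsat>[^']*)'")


def check_rsat_mapping(log_text):
    """Summarize the RSAT mapping reported in cmonkey2 terminal output.

    Hypothetical helper: it only inspects text, it does not call cmonkey2.
    """
    match = MAPPING_RE.search(log_text)
    return {
        # True if the mapping warning appears anywhere in the output
        "warned": WARNING_TEXT in log_text,
        # Organism names as reported by the log, or None if the line is absent
        "kegg": match.group("kegg") if match else None,
        "rsat": match.group("rsat") if match else None,
    }
```

If `warned` is true and the `rsat` name is clearly a different organism from the `kegg` name (as with 'Dictyostelium_discoideum' above), the run most likely did not find the organism on the RSAT mirror it used.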

wwick commented 7 years ago

I ran it just as specified and seem to still be getting the error can't find the correct RSAT mapping !. The terminal output seems to be the same, but I'll paste it here just in case:

$ cmonkey2 --organism hal --rsat_base_url http://networks.systemsbiology.net/rsat halo_ratios5.tsv
2017-10-01 16:55:56 INFO checking MEME...
2017-10-01 16:55:57 INFO Input matrix has # rows: 428, # columns: 5
2017-10-01 16:55:57 INFO # clusters/row: 2
2017-10-01 16:55:57 INFO # clusters/column: 29
2017-10-01 16:55:57 INFO # CLUSTERS: 43
2017-10-01 16:55:57 INFO use operons: 1
2017-10-01 16:55:57 INFO using MEME version 4.10.2
2017-10-01 16:55:57 INFO attempting automatic download of operons from Microbes Online
2017-10-01 16:55:57 INFO NCBI CODE IS: 64091
2017-10-01 16:55:57 INFO Automatically using STRING file in 'cache/64091.gz' (URL: http://networks.systemsbiology.net/string9/64091.gz)
2017-10-01 16:55:58 WARNING can't find the correct RSAT mapping !
2017-10-01 16:55:58 INFO KEGG = 'Halobacterium NRC 1 uid57769' -> RSAT = 'Dictyostelium_discoideum'
2017-10-01 16:55:58 INFO Creating networks...
2017-10-01 16:55:58 INFO stringdb.read_edges2()
2017-10-01 16:55:59 INFO Finished loading cache/64091.gz
2017-10-01 16:56:02 WARNING 2589 (out of 471056) nodes not found in synonyms
2017-10-01 16:56:02 INFO stringdb.read_edges2(), 0 edges read, 235528 edges ignored
Traceback (most recent call last):
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/bin/cmonkey2", line 36, in <module>
    cmonkey_run.run()
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/cmonkey_run.py", line 439, in run
    self.prepare_run()
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/cmonkey_run.py", line 408, in prepare_run
    thesaurus = self.organism().thesaurus()
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/cmonkey_run.py", line 163, in organism
    self.__organism = self.make_organism()
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/cmonkey_run.py", line 273, in make_organism
    self['fasta_file'])
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/organism.py", line 244, in __init__
    fasta_file)
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/organism.py", line 117, in __init__
    OrganismBase.__init__(self, code, network_factories, ratios=ratios)
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/organism.py", line 72, in __init__
    self.__networks.append(make_network(self, ratios))
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/stringdb.py", line 135, in make_network
    organism, ratios)
  File "/Users/kjsdhjasv/anaconda/envs/bioinfo/lib/python3.6/site-packages/cmonkey/network.py", line 150, in create
    raise Exception("Error: only %d edges in network '%s'" % (len(network_edges), name))
Exception: Error: only 0 edges in network 'STRING'

weiju commented 7 years ago

Hi, I think this is because there is already data in your cache directory. Could you please delete the cache directory and restart the run with the --rsat_base_url switch? Thanks!
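The two steps (clear the cache, then rerun with the mirror) could be scripted roughly as below. This is only a sketch under assumptions: the cache is taken to be the `cache` directory relative to where cmonkey2 was started, as the 'cache/64091.gz' path in the log suggests, and the helper names `clear_cache` and `build_command` are made up for illustration. The command-line arguments are exactly those given earlier in this thread.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir="cache"):
    """Delete the cache directory if it exists.

    Returns True if a directory was removed, False if there was nothing to do.
    Assumption: cmonkey2 wrote its cache to ./cache, as in the log above.
    """
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


def build_command(ratios="halo_ratios5.tsv",
                  organism="hal",
                  rsat_base_url="http://networks.systemsbiology.net/rsat"):
    """Build the cmonkey2 argument list used in this thread."""
    return ["cmonkey2", "--organism", organism,
            "--rsat_base_url", rsat_base_url, ratios]
```

Calling `clear_cache()` from the run directory and then passing `build_command()` to `subprocess.run` would reproduce the manual steps. Deleting the directory by hand works just as well.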

wwick commented 7 years ago

I deleted the cache, and the run was successful. Thanks so much!

weiju commented 7 years ago

Great to hear it works for you now. Thank you very much for reporting this; it allowed us to add a workaround.