Closed: vectorborne5 closed this issue 6 years ago
Hi,
this section of the Wiki ("RSAT mockup directories") describes the files needed for RSAT:
https://github.com/baliga-lab/cmonkey2/wiki/Input-file-formats
We have been using mouse and human, but we usually work with customized data that is provided in an RSAT-like structure, and we point cmonkey2 at that data directory with the --rsat_dir option.
The "list index out of range" message usually hints at missing gene names, but it would be easier to tell from the complete error output.
I've seemingly fixed the "list index out of range" problem (there was apparently a stray tab somewhere in my string file). However, I now encounter a new error after the program processes a bit further in the run:
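A stray tab like the one mentioned above can be located with a short script. The following is only a minimal sketch, assuming the STRING file holds three tab-separated columns (two gene IDs and a numeric score) as described in this thread; the function names `find_malformed_edges` and `check_string_file` are hypothetical and not part of cmonkey2. A header line, if present, would also be reported because its last field is not numeric.

```python
import gzip
import io


def find_malformed_edges(lines, expected_fields=3):
    """Return (line_number, raw_line) pairs for lines that do not split into
    exactly `expected_fields` non-empty tab-separated fields, or whose last
    field is not a number. (Assumed layout: gene1, gene2, score.)"""
    bad = []
    for number, line in enumerate(lines, start=1):
        raw = line.rstrip("\r\n")
        if not raw.strip():
            continue  # ignore blank lines
        fields = raw.split("\t")
        if len(fields) != expected_fields or any(f.strip() == "" for f in fields):
            bad.append((number, raw))
            continue
        try:
            float(fields[-1])
        except ValueError:
            bad.append((number, raw))
    return bad


def check_string_file(path):
    """Open a plain or gzip-compressed edge file and list its malformed lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return find_malformed_edges(handle)


if __name__ == "__main__":
    # Tiny in-memory example: the second line contains a doubled tab.
    sample = io.StringIO("A\tB\t0.5\nC\t\tD\t0.9\nE\tF\t0.1\n")
    for number, raw in find_malformed_edges(sample):
        print(number, repr(raw))
```

Calling `check_string_file` on the path passed to --string would list the offending line numbers, so the file can be repaired before starting a run.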
Nick3$./cmonkey.py --organism mmu --ratios /home/nick/Desktop/mmu_test.tsv --string /home/nick/Desktop/string/mmu.gz --rsat_dir /home/nick/Desktop/rsat/mmu/ --rsat_organism Mus_musculus_EnsEMBL --rsat_features protein_coding --nooperons
2015-03-10 16:24:00 INFO checking MEME...
2015-03-10 16:24:01 INFO Input matrix has # rows: 16634, # columns: 4
2015-03-10 16:24:01 INFO # clusters/row: 2
2015-03-10 16:24:01 INFO # clusters/column: 1109
2015-03-10 16:24:01 INFO # CLUSTERS: 1663
2015-03-10 16:24:01 INFO use operons: 0
2015-03-10 16:24:01 INFO using MEME version 4.10.0
2015-03-10 16:24:02 INFO using RSAT files for 'Mus_musculus_EnsEMBL'
2015-03-10 16:24:02 INFO attempting automatic download of operons from Microbes Online
2015-03-10 16:24:02 INFO Loading STRING file at '/home/nick/Desktop/string/mmu.gz'
2015-03-10 16:24:02 INFO KEGG = 'Mus musculus (house mouse)' -> RSAT = 'Mus_musculus_EnsEMBL'
2015-03-10 16:24:02 INFO Creating networks...
2015-03-10 16:24:02 INFO stringdb.read_edges2()
2015-03-10 16:24:27 INFO Finished loading /home/nick/Desktop/string/mmu.gz
2015-03-10 16:24:28 INFO Processing network 5%
2015-03-10 16:24:29 INFO Processing network 10%
2015-03-10 16:24:29 INFO Processing network 15%
2015-03-10 16:24:30 INFO Processing network 20%
2015-03-10 16:24:31 INFO Processing network 25%
2015-03-10 16:24:31 INFO Processing network 30%
2015-03-10 16:24:32 INFO Processing network 35%
2015-03-10 16:24:32 INFO Processing network 40%
2015-03-10 16:24:33 INFO Processing network 45%
2015-03-10 16:24:33 INFO Processing network 50%
2015-03-10 16:24:34 INFO Processing network 55%
2015-03-10 16:24:34 INFO Processing network 60%
2015-03-10 16:24:35 INFO Processing network 65%
2015-03-10 16:24:35 INFO Processing network 70%
2015-03-10 16:24:36 INFO Processing network 75%
2015-03-10 16:24:36 INFO Processing network 80%
2015-03-10 16:24:37 INFO Processing network 85%
2015-03-10 16:24:37 INFO Processing network 90%
2015-03-10 16:24:38 INFO Processing network 95%
2015-03-10 16:24:38 INFO Processing network 100%
2015-03-10 16:24:38 INFO stringdb.read_edges2(), 94 edges read, 4850754 edges ignored
2015-03-10 16:24:39 INFO Finished creating networks.
Traceback (most recent call last):
  File "./cmonkey.py", line 36, in <module>
    cmonkey_run.run()
  File "/home/nick/Desktop/cmonkey2/cmonkey/cmonkey_run.py", line 505, in run
    self.prepare_run()
  File "/home/nick/Desktop/cmonkey2/cmonkey/cmonkey_run.py", line 472, in prepare_run
    row_scoring, col_scoring = self.__setup_pipeline()
  File "/home/nick/Desktop/cmonkey2/cmonkey/cmonkey_run.py", line 425, in __setup_pipeline
    for fun in self['pipeline']['row-scoring']['args']['functions']]
  File "/home/nick/Desktop/cmonkey2/cmonkey/cmonkey_run.py", line 206, in membership
    self.__membership = self.__make_membership()
  File "/home/nick/Desktop/cmonkey2/cmonkey/cmonkey_run.py", line 200, in __make_membership
    self.config_params)
  File "/home/nick/Desktop/cmonkey2/cmonkey/membership.py", line 310, in create_membership
    config_params, matrix.row_indexes, matrix.column_indexes)
  File "/home/nick/Desktop/cmonkey2/cmonkey/membership.py", line 78, in __init__
    self.row_membs[self.rowidx[row]][i] = tmp[i]
IndexError: index 15226 is out of bounds for axis 0 with size 15059
I know this has to do with trying to call a non-existent element, but I'm not sure how precisely to modify any of my input or reference files to counteract this problem.
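The log above reports 16634 input rows, an array of size 15059, and only 94 of roughly 4.85 million STRING edges read. One possible reading, offered here as an assumption rather than a confirmed diagnosis, is that the gene IDs in the ratios file are partly duplicated or mostly fail to match the IDs in the STRING file. The sketch below compares the two ID sets; the helper names are hypothetical, and it assumes a tab-delimited ratios matrix with a header row and gene IDs in the first column.

```python
import csv


def read_row_ids(lines, delimiter="\t"):
    """Gene IDs from the first column of a ratios matrix (header row skipped)."""
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # skip the header row
    return [row[0] for row in reader if row]


def read_edge_ids(lines, delimiter="\t"):
    """All gene IDs that appear in the first two columns of an edge file."""
    ids = set()
    for row in csv.reader(lines, delimiter=delimiter):
        if len(row) >= 2:
            ids.update(row[:2])
    return ids


def summarize(row_ids, edge_ids):
    """Count duplicated row IDs and how many unique rows occur in the network."""
    unique = set(row_ids)
    return {
        "rows": len(row_ids),
        "unique_rows": len(unique),
        "duplicate_rows": len(row_ids) - len(unique),
        "rows_in_network": len(unique & edge_ids),
    }


if __name__ == "__main__":
    # Toy data: g1 is duplicated, and only g1 occurs in the edge list.
    ratios = ["GENE\tcond1\n", "g1\t0.1\n", "g2\t0.2\n", "g1\t0.3\n"]
    edges = ["g1\tg9\t0.5\n"]
    print(summarize(read_row_ids(ratios), read_edge_ids(edges)))
```

A large `duplicate_rows` count or a small `rows_in_network` count would point at an ID mismatch between the input files; if both look healthy, the cause lies elsewhere.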
I've decided to go as rudimentary as possible, using input ratios of about 300 familiar genes, many of which share a common expression profile, and/or have common transcriptional motifs. I felt encouraged as the run finally proceeded into iterations, but received the following errors:
The first:
...
2015-03-17 01:51:55 INFO running meme/mast on cluster 1470, # sequences: 7
2015-03-17 01:51:55 WARNING there is an exception thrown in MAST: Errors from MEME text parser: The pspm of motif 1 has an evalue value 1000 which does not match the existing value of 1000. FATAL: No motifs.
2015-03-17 01:51:55 INFO running meme/mast on cluster 1276, # sequences: 7
...
The second (which repeats many times over):
...
2015-03-17 04:13:03 ERROR No sequences read for hsa!
2015-03-17 04:13:03 WARNING Cluster 2 with 0 genes: no sequences!
...
My commands to initiate the run are as follows (just FYI):
./cmonkey.py --organism hsa --ratios hsa_test.tsv --string hsa.gz --rsat_organism Homo_sapiens_GRCh37 --rsat_base_url http://rsat.sb-roscoff.fr/ --rsat_features protein_coding --nooperons
I've been struggling with this program for days now. Is there a list of specific guidelines for dealing with human runs beyond that which is listed in the README.md? I seem to be stymied at every step.
Hi, sorry for the late reply. Could you send me a link to hsa_test.tsv and hsa.gz? Thanks, Wei-ju
Both files are available at this link: https://app.box.com/s/3phu6800og1re21ehbe7sq2rd2n0f3hc
Thank you very much, I will have a closer look at it
I recently attempted two separate runs using microarray data from human and mouse samples.
I collected all protein network data from STRING v9.1 for use as the string files for both species, formatting each file with three columns: Gene1, Gene2, Normalized Score. My ratios files were tab-delimited .tsv expression matrices that use matching Ensembl IDs.
1) After initiating a run using mouse data (with organism ID = mmu), all initial checks appeared to proceed normally, but I eventually received the following error:
cmonkey.util.DocumentNotFound: //rsat01.biologie.ens.fr/rsat//data/genomes/Mus_musculus_EnsEMBL/genome/feature_names.tab
This file indeed does not exist in RSAT, but a file named 'gene_names.tab' does. Is this a substitute? Can I make any changes to the cmonkey files to account for this?
2) Following the suggestions for a run of human data presented in the README.md file, and using input file formats similar to those used for the mouse data, again all initial tests seemed to run well until I received this error:
IndexError: list index out of range
I am baffled as to the meaning of this error. What changes must I make to resolve this issue?