baliga-lab / cmonkey2

Python port of cMonkey, a machine-learning based method for clustering
GNU Lesser General Public License v3.0

Invalid literal error running human data #68

Open maxneal opened 7 years ago

maxneal commented 7 years ago

I'm attempting to run cmonkey2 on some human microarray data and I'm getting an "invalid literal" error. I'm using a local protein_coding_names.tab file that I downloaded from RSAT, which is in a directory called "rsatdir", and I've turned off string scoring for now. Any idea what's causing the error? Apologies if I'm making some kind of rookie mistake. Many thanks.

Command and output are below.

./bin/cmonkey2.sh --organism hsa --nooperons --rsat_organism Homo_sapiens_GRCh37 --rsat_base_url http://rsat.sb-roscoff.fr --rsat_features protein_coding --rsat_dir ../cmonkey2/rsatdir/ --nostring --out outzak --use_BSCM --use_chi2 ~/HIVRAD/ratiosZak2012.txt

Running cmonkey with 'python'
2017-05-11 00:10:48 INFO checking MEME...
2017-05-11 00:10:49 INFO Input matrix has # rows: 20667, # columns: 14
2017-05-11 00:10:49 INFO # clusters/row: 2
2017-05-11 00:10:49 INFO # clusters/column: 1378
2017-05-11 00:10:49 INFO # CLUSTERS: 2067
2017-05-11 00:10:49 INFO use operons: 0
2017-05-11 00:10:49 INFO using MEME version 4.9.1
2017-05-11 00:10:50 INFO using RSAT files for 'Homo_sapiens_GRCh37'
2017-05-11 00:10:50 INFO attempting automatic download of operons from Microbes Online
2017-05-11 00:10:50 INFO KEGG = 'Homo sapiens (human)' -> RSAT = 'Homo_sapiens_GRCh37'
2017-05-11 00:10:50 INFO Creating networks...
2017-05-11 00:10:50 INFO Finished creating networks.
Traceback (most recent call last):
  File "/home/mneal/RNAseqTools/cmonkey2/bin/cmonkey2", line 36, in <module>
    cmonkey_run.run()
  File "/home/mneal/RNAseqTools/cmonkey2/cmonkey/cmonkey_run.py", line 512, in run
    self.prepare_run()
  File "/home/mneal/RNAseqTools/cmonkey2/cmonkey/cmonkey_run.py", line 479, in prepare_run
    row_scoring, col_scoring = self.setup_pipeline()
  File "/home/mneal/RNAseqTools/cmonkey2/cmonkey/cmonkey_run.py", line 432, in setup_pipeline
    for fun in self['pipeline']['row-scoring']['args']['functions']]
  File "/home/mneal/RNAseqTools/cmonkey2/cmonkey/motif.py", line 475, in __init__
    ratios, 'upstream', config_params)
  File "/home/mneal/RNAseqTools/cmonkey2/cmonkey/motif.py", line 167, in __init__
    self.__setup_meme_suite(config_params)
  File "/home/mneal/RNAseqTools/cmonkey2/cmonkey/motif.py", line 134, in __setup_meme_suite
    bgorder=int(self.config_params['MEME']['background_order']))
  File "/home/mneal/RNAseqTools/cmonkey2/cmonkey/meme.py", line 764, in global_background_file
    seqtype=seqtype)
  File "/home/mneal/RNAseqTools/cmonkey2/cmonkey/organism.py", line 217, in sequences_for_genes_scan
    return self.sequence_source.seqs_for(genes, self.scan_distances[seqtype])
  File "/home/mneal/RNAseqTools/cmonkey2/cmonkey/organism.py", line 337, in seqs_for
    unique_seqs = unique_sequences(shifted_pairs)
  File "/home/mneal/RNAseqTools/cmonkey2/cmonkey/organism.py", line 323, in unique_sequences
    features = self.organism.read_features(unique_feature_ids)
  File "/home/mneal/RNAseqTools/cmonkey2/cmonkey/organism.py", line 184, in read_features
    features[feature_id] = read_feature(line)
  File "/home/mneal/RNAseqTools/cmonkey2/cmonkey/organism.py", line 175, in read_feature
    int(line[4].lstrip('<>')),
ValueError: invalid literal for int() with base 10: 'chromosome:GRCh37:15:1:102531392:1'
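
The last frame of the traceback shows `int(line[4].lstrip('<>'))` being given the string 'chromosome:GRCh37:15:1:102531392:1', which looks like a chromosome location descriptor, not a start coordinate. The sketch below reproduces that failure in isolation and scans a tab-separated file for rows where the same conversion would fail. It is only a diagnostic aid under assumptions: `parse_start`, `try_parse_start`, and `find_bad_rows` are hypothetical helpers, not part of cmonkey2, and the sample rows, the `--` comment prefix, and the choice of column index 4 are inferred from the traceback, not from the actual file layout.

```python
def parse_start(field):
    """Mimic the failing call from the traceback: int(line[4].lstrip('<>'))."""
    return int(field.lstrip('<>'))


def try_parse_start(field):
    """Return the field as an int, or None if it is not numeric."""
    try:
        return parse_start(field)
    except ValueError:
        return None


def find_bad_rows(lines, column=4):
    """Return (line_number, text) for rows whose `column` is not an integer.

    Blank lines and lines starting with '--' are skipped (assumed to be
    comments; adjust if the file uses a different convention).
    """
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.rstrip('\n')
        if not text.strip() or text.startswith('--'):
            continue
        fields = text.split('\t')
        if len(fields) <= column or try_parse_start(fields[column]) is None:
            bad.append((lineno, text))
    return bad


# Made-up rows for illustration only; the real file layout may differ.
sample = [
    "-- comment line\n",
    "id1\tCDS\tname1\t15\t<100\t200\tD\n",
    "id2\tCDS\tname2\t15\tchromosome:GRCh37:15:1:102531392:1\t200\tD\n",
]

for lineno, text in find_bad_rows(sample):
    print(lineno, text)
```

To check a real file, pass an open file handle in place of `sample`, for example `find_bad_rows(open(path))`, where `path` points at the feature file in the RSAT directory. Any rows it reports are candidates for the line that `read_feature` could not parse.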