LuziaThea closed this issue 6 years ago.
Hi Luzia, thanks for your report. From your description, my first guess is that the feature ID in the RSAT features file does not match the ID used on the left side of the synonyms file. BTW, I can't remember whether I have ever used rsat_dir and rsat_features in combination, so if you already have an rsat_dir option, I would recommend using that and following the directory structure described in
http://baliga-lab.github.io/cmonkey2/input_format.html
That's only a guess, though. If that does not work, we might have to take a look at your RSAT files. Hopefully that gets us a bit further. Please let me know how it goes!
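The ID mismatch guessed at above could be checked with a small script along these lines. This is only a sketch under assumptions that are not confirmed in this thread: both files are tab-separated, the ID sits in the first column, and comment lines start with `--`. The function names `read_first_column` and `unmatched_feature_ids` are hypothetical helpers, not part of cmonkey2.

```python
import io


def read_first_column(handle, comment_prefix="--"):
    """Collect the first tab-separated field of every non-comment line.

    Assumption: IDs are in the first column and comments start with '--'.
    """
    ids = set()
    for line in handle:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith(comment_prefix):
            continue
        ids.add(line.split("\t")[0].strip())
    return ids


def unmatched_feature_ids(features_handle, names_handle):
    """Return feature IDs that never appear as a left-side ID in the names file."""
    feature_ids = read_first_column(features_handle)
    name_ids = read_first_column(names_handle)
    return sorted(feature_ids - name_ids)


if __name__ == "__main__":
    # Tiny made-up inputs for illustration; real use would pass open() handles
    # for feature.tab and feature_names.tab instead.
    features = io.StringIO("-- comment\nID_A\tCDS\nID_B\tCDS\n")
    names = io.StringIO("ID_A\tSYNONYM_1\n")
    print(unmatched_feature_ids(features, names))
```

An empty result would suggest that the two files agree on their IDs and that the problem lies elsewhere.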
Hello,
I am running cmonkey2 on human data. I downloaded the RSAT files for Homo_sapiens_GRCh38 and use them with the --rsat_dir, --rsat_features and --rsat_organism options. My ratio file contains protein expression data with UniProt IDs, and my STRING file also uses UniProt IDs. I downloaded protein_coding.tab and protein_coding_names.tab, and I added the Ensembl transcript to UniProt ID translations to protein_coding_names.tab.
My RSAT directory contains the following files:
- organism.tab
- feature_names.tab (based on protein_coding_names.tab)
- feature.tab (based on protein_coding.tab)
- all contig files
I get the following error:
python /nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/bamir/Computation_on_Clusters/Virtual_env_Miniconda_euler/miniconda2/envs/cmonkey/bin/cmonkey2 \
/nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/staldluz/CMonkey2_data/Human_new/NCI60_Ratio_log2_uniprot.txt \
--organism hsa \
--string /nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/staldluz/CMonkey2_data/Human_new/String_human_uniprot_complete.txt \
--rsat_organism Homo_sapiens_GRCh38 \
--rsat_dir /nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/staldluz/CMonkey2_data/Human_new/RSAT \
--rsat_features feature \
--nooperons \
--out ./Output_NCI60_17
2018-03-24 22:18:25 INFO checking MEME...
2018-03-24 22:18:26 INFO Input matrix has # rows: 3171, # columns: 59
2018-03-24 22:18:26 INFO # clusters/row: 2
2018-03-24 22:18:26 INFO # clusters/column: 211
2018-03-24 22:18:26 INFO # CLUSTERS: 317
2018-03-24 22:18:26 INFO use operons: 0
2018-03-24 22:18:26 INFO using MEME version 4.10.2
2018-03-24 22:18:28 INFO using RSAT files for 'Homo_sapiens_GRCh38'
2018-03-24 22:18:28 INFO attempting automatic download of operons from Microbes Online
2018-03-24 22:18:28 INFO Loading STRING file at '/nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/staldluz/CMonkey2_data/Human_new/String_human_uniprot_complete.txt'
2018-03-24 22:18:28 INFO KEGG = 'Homo sapiens (human)' -> RSAT = 'Homo_sapiens_GRCh38'
2018-03-24 22:18:28 INFO Creating networks...
2018-03-24 22:18:28 INFO stringdb.read_edges2()
2018-03-24 22:18:51 INFO Finished loading /nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/staldluz/CMonkey2_data/Human_new/String_human_uniprot_complete.txt
2018-03-24 22:19:10 INFO Processing network 5%
2018-03-24 22:19:11 INFO Processing network 10%
2018-03-24 22:19:13 INFO Processing network 15%
2018-03-24 22:19:14 INFO Processing network 20%
2018-03-24 22:19:15 INFO Processing network 25%
2018-03-24 22:19:16 INFO Processing network 30%
2018-03-24 22:19:17 INFO Processing network 35%
2018-03-24 22:19:18 INFO Processing network 40%
2018-03-24 22:19:19 INFO Processing network 45%
2018-03-24 22:19:20 INFO Processing network 50%
2018-03-24 22:19:21 INFO Processing network 55%
2018-03-24 22:19:22 INFO Processing network 60%
2018-03-24 22:19:23 INFO Processing network 65%
2018-03-24 22:19:24 INFO Processing network 70%
2018-03-24 22:19:25 INFO Processing network 75%
2018-03-24 22:19:26 INFO Processing network 80%
2018-03-24 22:19:27 INFO Processing network 85%
2018-03-24 22:19:28 INFO Processing network 90%
2018-03-24 22:19:29 INFO Processing network 95%
2018-03-24 22:19:30 INFO Processing network 100%
2018-03-24 22:19:30 WARNING 14444 (out of 18995736) nodes not found in canonical gene names
2018-03-24 22:19:30 INFO stringdb.read_edges2(), 782284 edges read, 8715584 edges ignored
2018-03-24 22:19:33 INFO Finished creating networks.
2018-03-24 22:19:43 ERROR No sequences read for hsa!
Traceback (most recent call last):
File "/nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/bamir/Computation_on_Clusters/Virtual_env_Miniconda_euler/miniconda2/envs/cmonkey/bin/cmonkey2", line 36, in <module>
cmonkey_run.run()
File "/nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/bamir/Computation_on_Clusters/Virtual_env_Miniconda_euler/miniconda2/envs/cmonkey/lib/python2.7/site-packages/cmonkey/cmonkey_run.py", line 439, in run
self.prepare_run()
File "/nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/bamir/Computation_on_Clusters/Virtual_env_Miniconda_euler/miniconda2/envs/cmonkey/lib/python2.7/site-packages/cmonkey/cmonkey_run.py", line 413, in prepare_run
row_scoring, col_scoring = self.setup_pipeline()
File "/nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/bamir/Computation_on_Clusters/Virtual_env_Miniconda_euler/miniconda2/envs/cmonkey/lib/python2.7/site-packages/cmonkey/cmonkey_run.py", line 366, in setup_pipeline
for fun in self['pipeline']['row-scoring']['args']['functions']]
File "/nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/bamir/Computation_on_Clusters/Virtual_env_Miniconda_euler/miniconda2/envs/cmonkey/lib/python2.7/site-packages/cmonkey/motif.py", line 474, in __init__
ratios, 'upstream', config_params)
File "/nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/bamir/Computation_on_Clusters/Virtual_env_Miniconda_euler/miniconda2/envs/cmonkey/lib/python2.7/site-packages/cmonkey/motif.py", line 166, in __init__
self.__setup_meme_suite(config_params)
File "/nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/bamir/Computation_on_Clusters/Virtual_env_Miniconda_euler/miniconda2/envs/cmonkey/lib/python2.7/site-packages/cmonkey/motif.py", line 134, in __setup_meme_suite
bgorder=int(self.config_params['MEME']['background_order']))
File "/nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/bamir/Computation_on_Clusters/Virtual_env_Miniconda_euler/miniconda2/envs/cmonkey/lib/python2.7/site-packages/cmonkey/meme.py", line 802, in global_background_file
seqtype=seqtype)
File "/nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/bamir/Computation_on_Clusters/Virtual_env_Miniconda_euler/miniconda2/envs/cmonkey/lib/python2.7/site-packages/cmonkey/organism.py", line 217, in sequences_for_genes_scan
return self.sequence_source.seqs_for(genes, self.scan_distances[seqtype])
File "/nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/bamir/Computation_on_Clusters/Virtual_env_Miniconda_euler/miniconda2/envs/cmonkey/lib/python2.7/site-packages/cmonkey/organism.py", line 338, in seqs_for
return {gene: unique_seqs[head] for gene, head in shifted_pairs}
File "/nfs/nas21.ethz.ch/nas/fs2102/biol_ibt_usr_s1/bamir/Computation_on_Clusters/Virtual_env_Miniconda_euler/miniconda2/envs/cmonkey/lib/python2.7/site-packages/cmonkey/organism.py", line 338, in <dictcomp>
return {gene: unique_seqs[head] for gene, head in shifted_pairs}
KeyError: 'ENST00000295971'
The transcript ENST00000295971 named in the error corresponds to the first protein in the ratio list, and it appears in both feature.tab and feature_names.tab. Since the error reports the correct transcript ID, the ID translation itself seems to work. I also get the same error when I supply the ratio and STRING tables directly with transcript IDs, so that no translation is needed. The sequence contig files have the right names and are in the required lowercase format. They often start with long stretches of n's, but that doesn't seem to be the problem (the error remains the same if I replace the n's with a's).
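One further check that might narrow this down is whether the contig listed for the failing transcript in feature.tab has a matching sequence file in the RSAT directory. The sketch below is an illustration only and rests on assumptions not confirmed in this thread: feature.tab is tab-separated with the ID in the first column, the contig name is in the fourth column (index 3), comment lines start with `--`, and a contig file is recognisable because its name contains the contig name. `contig_for_feature` and `contig_file_exists` are hypothetical helpers, not cmonkey2 functions.

```python
import os


def contig_for_feature(feature_lines, feature_id, contig_column=3,
                       comment_prefix="--"):
    """Return the contig name listed for feature_id, or None if not found.

    Assumption: tab-separated rows, ID in column 0, contig in contig_column.
    """
    for line in feature_lines:
        if not line.strip() or line.startswith(comment_prefix):
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[0] == feature_id and len(fields) > contig_column:
            return fields[contig_column]
    return None


def contig_file_exists(directory, contig, listdir=os.listdir):
    """Loose check: is there any file whose name contains the contig name?

    Assumption: the real naming scheme may differ, so this is a substring match.
    """
    return any(contig in name for name in listdir(directory))
```

In use, one would open feature.tab, pass the handle and the transcript ID to `contig_for_feature`, and then pass the returned contig name together with the RSAT directory to `contig_file_exists`. A `None` or `False` result would point to a mismatch between the feature table and the contig files.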
Do you know where the error could come from?
Thank you so much for your help!
Best regards, Luzia