Closed crystal0721 closed 6 years ago
Hi @crystal0721 ,
reconstruct_contig.py
currently does not support the -h
option, despite what it says when you directly type the command. If you want to know what options are avail just type the program name itself. Sorry about that. I'll fix it in the next release.
(anaCogent)etseng@login14-biofx02:~$ reconstruct_contig.py
usage: reconstruct_contig.py [-h] [-e EXPECTED_ERROR_RATE]
[--nx_cycle_detection] [-k KMER_SIZE]
[-p OUTPUT_PREFIX] [-D GMAP_DB_PATH]
[-d GMAP_SPECIES] [--small_genome] [--version]
[--debug]
dirname
reconstruct_contig.py: error: too few arguments
-Liz
hi Magdoll, Thank you very much for your replay and I had soloved. While having something that I want to get help or suggestion from you. There is a way that can combine transcripts produced by Trinity and Full-length transcripts produced by SMRTlink or SMRTanalysis besides CDHIT,the results we want to obtain is GeneID and its corresponding Transcripts ID in order to the following analysis.
Crystal
Hi @crystal0721 ,
One possibility is to combine {Trinity} + {HQ isoforms from SMRTLink/SMRTAnalysis} into a single fasta file and run it through Cogent. After reconstruction, use the collapse script per this tutorial to collapse any redundant transcripts. The collapse script should not care if the sequence comes from Trinity or HQ. Let me know if you run into issues.
--Liz
hi Magdoll, Thank you for replaying. I would know what you mean,mybe I will try it.While there are some questions I want to discuss.When we use cogent to collapse redundant transcripts to obtain cogent2.renamed.fasta which was considered as genome,why we use Cupcake to redundant transcript again?I thought the cogent2.renamed.fasta was non-redundant sequence which could be as genome sequence of Pacbio data. Now I had finish Cogent and obtained cogent2.renamed.fasta for Pacbio data. For illumina data I could choose the long transcript as unigene. There is no way could combine cogent2.renamed.fasta and unigene. Or this way is not good than above(combine transcript from Trinity and SMRT using cogent and redundanting by Cupcake.) I hope the questions was described clearly and doest not bother you. Thank you again.
Crystal
Hi @crystal0721 ,
cogent2.renamed.fasta
is the reconstructed "fake" genome that consists of just the transcribed coding portions of a genome. So yes, you are correct that "cogent2.renamed.fasta was non-redundant sequence which could be as genome sequence of Pacbio data".
I was answering your question in how to combine (Illumina) Trinity data with PacBio HQ isoforms. The reason I suggested combining Trinity + HQ isoforms and run Cogent is to get the union of the genes the two platforms cover.
Another possibility, which you have hinted, is:
cogent2.renamed.fasta
and collapse. All HQ will align. Some Trinity data may not. The ones from Trinity that did not map will be genes seen in Illumina only.The combination of {2} + {3} will then get you the combined final dataset.
--Liz
Hi @Magdoll,
I also want to combine the HQ and rnaspades assembly to generate a complete transcriptome. I used the above strategy:
alt. junction found at 1910 alt. junction found at 1957 alt. junction found at 2413 alt. junction found at 3265 alt. junction found at 3277 alt. junction found at 3279 alt. junction found at 230 alt. junction found at 1837 alt. junction found at 3118 alt. junction found at 3154 alt. junction found at 24 alt. junction found at 1893 Traceback (most recent call last): File "/home/jiaochen/miniconda3/bin/collapse_isoforms_by_sam.py", line 4, in
import('pkg_resources').run_script('cupcake==5.3', 'collapse_isoforms_by_sam.py') File "/home/jiaochen/miniconda3/lib/python2.7/site-packages/pkg_resources/init.py", line 741, in run_script self.require(requires)[0].run_script(script_name, ns) File "/home/jiaochen/miniconda3/lib/python2.7/site-packages/pkg_resources/init.py", line 1502, in run_script exec(code, namespace, namespace) File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 245, in main(args) File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 204, in main collapse_fuzzy_junctions(f_good.name, f_txt.name, args.allow_extra_5exon, internal_fuzzy_max_dist=args.max_fuzzy_junction) File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 149, in collapse_fuzzy_junctions _size = get_fl_from_id(group_info[pbid]) File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 92, in get_fl_from_id return sum(int(_id.split('/')[1].split('p')[0][1:]) for _id in members) File "/data/jiaochen/miniconda3/lib/python2.7/site-packages/cupcake-5.3-py2.7-linux-x86_64.egg/EGG-INFO/scripts/collapse_isoforms_by_sam.py", line 92, in return sum(int(_id.split('/')[1].split('p')[0][1:]) for _id in members) IndexError: list index out of range Do you know what's wrong?
If I only use the HQ data, I can get the final results without any error. But 10% of the HQ sequences were ignored because of the coverage issues, is that normal?
Thanks!
Chen
Hi @jayceejiao ,
The collapse scripts expect a certain sequence ID format that the RNA-seq data is not following. It needs to look like this:
>HQ_sample1|cb123_10/f10p0/1234
it's looking for the "f10p0" which needs to be after the right "/" symbol.
YOu can use a Python script to force RNA-seq/Trinity data to have that:
from Bio import SeqIO
f = open('Trinity.modified.fasta', 'w')
i = 1
for r in SeqIO.parse(open('Trinity.fasta'),'fasta'):
f.write(">HQ_Trinity|cb1_{0}/f2p0/{1} {2}\n".format(i, len(r.seq), r.id))
f.write("{0}\n".format(r.seq))
i += 1
f.close()
--Liz
Hi @Magdoll,
I have changed the transcript id and got the final representative transcriptome. Thanks!
hi, Dear when I test reconstruct_contig.py,there is a problem as follows: reconstruct_contig.py -h Traceback (most recent call last): File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/bin/reconstruct_contig.py", line 4, in
import('pkg_resources').run_script('Cogent==2.1', 'reconstruct_contig.py')
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/site-packages/pkg_resources/init.py", line 748, in run_script
self.require(requires)[0].run_script(script_name, ns)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/site-packages/pkg_resources/init.py", line 1517, in run_script
exec(code, namespace, namespace)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/site-packages/Cogent-2.1-py2.7.egg/EGG-INFO/scripts/reconstruct_contig.py", line 301, in
args = parser.parse_args()
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1701, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1733, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1939, in _parse_known_args
start_index = consume_optional(start_index)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1879, in consume_optional
take_action(action, args, option_string)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 1807, in take_action
action(self, namespace, argument_values, option_string)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 996, in call
parser.print_help()
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 2340, in print_help
self._print_message(self.format_help(), file)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 2314, in format_help
return formatter.format_help()
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 281, in format_help
help = self._root_section.format_help()
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 211, in format_help
func(args)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 211, in format_help
func(args)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 517, in _format_action
help_text = self._expand_help(action)
File "/home/software/Cogent/software/Anaconda2/envs/anaCogent/lib/python2.7/argparse.py", line 603, in _expand_help
return self._get_help_string(action) % params
ValueError: unsupported format character ')' (0x29) at index 32
The python version is 2.7.14,could you tell be how to solove it,thank you very much.
Crystal