TransDecoder / TransDecoder

PFAM annotation #3

Closed: guillermomarco closed this issue 8 years ago

guillermomarco commented 9 years ago

Hi, I'm trying to use the --retain_pfam_hits option with TransDecoder.Predict, but I'm getting no errors and no PFAM-related results at all.

I've downloaded both Pfam-A.hmm.gz and Pfam-B.hmm.gz and joined them using the following commands (I found them in an old SourceForge post):

gunzip -c Pf*gz > Pfam-AB.hmm
hmmconvert -b Pfam-AB.hmm > Pfam-AB.hmm.bin

I've tried passing both Pfam-AB.hmm and Pfam-AB.hmm.bin as input, and neither works: I get no errors, but no PFAM hits in the results either.

TransDecoder.Predict -t CC_filtered.fasta --retain_pfam_hits /share/gluster/tests/gmarco/transdecoder/Pfam-AB.hmm
PFAM output found and processing...
CMD: /share/apps/src/TransDecoder-2.0.1/util/index_gff3_files_by_isoform.pl CC_filtered.fasta.transdecoder_dir/longest_orfs.gff3
-indexing [TCONS_00144283|g.1577]  
 Indexed TCONS_00144283|m.1577 
CMD: /share/apps/src/TransDecoder-2.0.1/util/gene_list_to_gff.pl CC_filtered.fasta.transdecoder_dir/longest_orfs.cds.scores.selected CC_filtered.fasta.transdecoder_dir/longest_orfs.gff3.inx > CC_filtered.fasta.transdecoder_dir/longest_orfs.cds.best_candidates.gff3
CMD: /share/apps/src/TransDecoder-2.0.1/util/remove_eclipsed_ORFs.pl CC_filtered.fasta.transdecoder_dir/longest_orfs.cds.best_candidates.gff3 > CC_filtered.fasta.transdecoder.gff3
-indexing [TCONS_00144283|g.1576]  
CMD: /share/apps/src/TransDecoder-2.0.1/util/gff3_file_to_bed.pl CC_filtered.fasta.transdecoder.gff3 > CC_filtered.fasta.transdecoder.bed
-indexing [TCONS_00144283|g.1576]  
CMD: /share/apps/src/TransDecoder-2.0.1/util/gff3_file_to_proteins.pl CC_filtered.fasta.transdecoder.gff3 CC_filtered.fasta > CC_filtered.fasta.transdecoder.pep
-indexing [TCONS_00144283|g.1576]  
CMD: /share/apps/src/TransDecoder-2.0.1/util/gff3_file_to_proteins.pl CC_filtered.fasta.transdecoder.gff3 CC_filtered.fasta CDS > CC_filtered.fasta.transdecoder.cds
-indexing [TCONS_00144283|g.1576]  
CMD: /share/apps/src/TransDecoder-2.0.1/util/gff3_file_to_proteins.pl CC_filtered.fasta.transdecoder.gff3 CC_filtered.fasta cDNA > CC_filtered.fasta.transdecoder.mRNA
-indexing [TCONS_00144283|g.1576]  
transdecoder is finished.  See output files CC_filtered.fasta.transdecoder.*

brianjohnhaas commented 8 years ago

Hi,

You actually have to run the Pfam search yourself and pass the resulting domain table output file (not the HMM database) as the value of --retain_pfam_hits. The documentation at transdecoder.github.io says:

Search the peptides for protein domains using Pfam. This requires HMMER3 and the Pfam databases to be installed.

hmmscan --cpu 8 --domtblout pfam.domtblout /path/to/Pfam-A.hmm transdecoder_dir/longest_orfs.pep

Just as with the BLAST search, if you have access to a computing grid, consider using HPC GridRunner.
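Because TransDecoder.Predict accepted the .hmm file above without complaint ("PFAM output found and processing...") and simply retained nothing, it can help to sanity-check a file before passing it in. The following is a minimal Python sketch, not part of TransDecoder, assuming the hmmscan --domtblout layout: whitespace-separated columns, '#' comment lines, column 1 = Pfam domain name, column 4 = query peptide ID, column 13 = domain i-Evalue. The function name parse_domtblout is hypothetical. Note also that hmmscan expects the HMM database to have been indexed with hmmpress (e.g. hmmpress Pfam-A.hmm) before searching.

```python
from collections import defaultdict


def parse_domtblout(lines):
    """Map each query (peptide/ORF) ID to the Pfam domains hmmscan reported for it.

    Assumes the hmmscan --domtblout layout: whitespace-separated columns,
    '#' comment lines, column 1 = target (Pfam domain) name,
    column 4 = query (peptide) name, column 13 = domain i-Evalue.
    Raises ValueError on lines that do not look like domain-table rows,
    e.g. when an HMM database is passed by mistake.
    """
    hits = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        # 22 fixed columns; maxsplit keeps the free-text description as one field.
        fields = line.split(None, 22)
        if len(fields) < 22:
            raise ValueError(
                f"line {lineno}: expected at least 22 columns, got {len(fields)}; "
                "is this a --domtblout file rather than an HMM database?"
            )
        target, query, i_evalue = fields[0], fields[3], float(fields[12])
        hits[query].append((target, i_evalue))
    return dict(hits)
```

Usage: with open("pfam.domtblout") as fh: hits = parse_domtblout(fh). Pointing it at a plain-text HMM database such as Pfam-AB.hmm instead should raise ValueError on the very first line, which starts with the HMMER3 format header rather than a table row.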

Integrating the BLAST and Pfam search results into coding region selection

The outputs generated above can be leveraged by TransDecoder to ensure that peptides with BLAST hits or Pfam domain hits are retained in the set of reported likely coding regions. Run TransDecoder.Predict like so:

TransDecoder.Predict -t target_transcripts.fasta --retain_pfam_hits pfam.domtblout --retain_blastp_hits blastp.outfmt6

The final coding region predictions will now include not only the regions whose sequence characteristics are consistent with coding regions, but also those with demonstrated BLAST homology or Pfam domain content.
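The selection rule described above can be pictured as a set union. The sketch below is a hypothetical Python illustration, not TransDecoder's actual code: all_orfs stands in for the ORF IDs in longest_orfs.pep, score_selected for the ORFs kept on coding potential alone, blastp_query_ids assumes BLAST -outfmt 6 tab-separated lines with the query ID in column 1, and later steps such as removing eclipsed ORFs (remove_eclipsed_ORFs.pl in the log above) are ignored. It also suggests why the searches are run against longest_orfs.pep: hit IDs presumably only count if they match TransDecoder's own ORF IDs.

```python
def blastp_query_ids(lines):
    """Return the set of query IDs (column 1) from BLAST tabular -outfmt 6 lines."""
    ids = set()
    for line in lines:
        # Skip blank lines and '#' comment lines (the latter only appear in -outfmt 7).
        if line.strip() and not line.startswith("#"):
            ids.add(line.split("\t", 1)[0])
    return ids


def retained_orfs(all_orfs, score_selected, pfam_ids, blast_ids):
    """ORFs kept on coding-potential score, plus ORFs with Pfam or BLAST support.

    Support IDs that do not match any known ORF ID are dropped, so a
    domain table produced from the wrong input contributes nothing.
    """
    supported = (set(pfam_ids) | set(blast_ids)) & set(all_orfs)
    return set(score_selected) | supported
```

With this framing, the original symptom is easy to see: if no Pfam IDs match any ORF ID, the result is just the score-selected set, with no error raised.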