gatech-genemark / ProtHint

Protein hint generation pipeline for gene finding in eukaryotic genomes

Which one should I use? #32

ld9866 closed this issue 3 years ago

ld9866 commented 3 years ago

Hello! I'm currently assembling and annotating a genome. I downloaded files for a homologous species from the Ensembl website, which offers cDNA, CDS, ncRNA, and peptide files. Which of these four files should I use?

cDNA: The cDNA sequences corresponding to Ensembl genes, excluding ncRNA genes, which are in a separate 'ncrna' Fasta file. cDNA consists of transcript sequences for actual and possible genes, including pseudogenes, NMD and the like. See the file names explanation below for different subsets of both known and predicted transcripts.

CDS: These files hold the coding sequences corresponding to Ensembl genes. CDS does not contain UTR or intronic sequence.

ncRNA: These files hold the transcript sequences corresponding to non-coding RNA genes (ncRNA).

Peptide: These files hold the protein translations of Ensembl genes.

tomasbruna commented 3 years ago

Hello, for ProtHint, please use the Peptide file.
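If it is unclear which downloaded file is the peptide one, a quick sanity check is to look at the letters in the sequences: cDNA, CDS, and ncRNA files contain almost only A, C, G, T (or U, N), while protein files use the full amino acid alphabet. The following is a minimal sketch of such a check, not part of ProtHint; the function names, the 0.9 threshold, and the 10,000-letter sample size are assumptions chosen for illustration.

```python
import gzip
from pathlib import Path

# Letters expected in nucleotide FASTA files (DNA or RNA, plus N for unknown).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def open_text(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def nucleotide_fraction(lines, max_letters=10000):
    """Return the fraction of sequence letters that are A, C, G, T, U or N.

    Header lines (starting with '>') and non-letter characters such as '*'
    are ignored. Only about the first `max_letters` letters are sampled.
    """
    total = 0
    nucleotide = 0
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        for ch in line.upper():
            if ch.isalpha():
                total += 1
                nucleotide += ch in NUCLEOTIDE_LETTERS
        if total >= max_letters:
            break
    if total == 0:
        raise ValueError("no sequence letters found")
    return nucleotide / total


def looks_like_protein(lines, threshold=0.9):
    """Heuristic: a file is treated as protein if fewer than `threshold`
    of its sampled letters are nucleotide letters (assumed cutoff)."""
    return nucleotide_fraction(lines) < threshold


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python check_fasta.py <file.fa[.gz]>
    with open_text(sys.argv[1]) as handle:
        kind = "protein" if looks_like_protein(handle) else "nucleotide"
    print(f"{sys.argv[1]}: looks like {kind} sequences")
```

The file that passes this check as protein is the one to supply as ProtHint's protein input. The heuristic can misjudge very short or unusual sequences, so it is only a sanity check.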

ld9866 commented 3 years ago

OK! I will try it myself. Glad to receive your reply. Thank you!