gene-to-transcript mapping file from non-trinity assembly

agroppi commented 5 years ago

Hi,

I'm trying to use Trinotate with gene predictions produced by Braker2.0 (AUGUSTUS). Thanks to your help, I managed to get through the first step of TransDecoder (https://github.com/TransDecoder/TransDecoder/issues/72). I have run all the steps described here: https://github.com/Trinotate/Trinotate.github.io/wiki/Software-installation-and-data-required#3-running-sequence-analyses However, I have some doubts about the first step of "Loading generated results into a Trinotate SQLite Database", specifically about producing the gene-to-transcript mapping file. Which file can it be constructed from?

Below is everything currently in my Trinotate working directory:


augustus.hints_exon.gtf
augustus.hints.gtf
blastp.outfmt6
blastx.outfmt6
/Marouch_1.5_database
Marouch_Genome_V1.5.fasta
Marouch_Genome_V1.5.fasta.fai
Marouch_transcripts.fasta
Marouch_transcripts.fasta.rnammer.gff
Marouch_transcripts.fasta.transdecoder.bed
Marouch_transcripts.fasta.transdecoder.cds
/Marouch_transcripts.fasta.transdecoder_dir
/Marouch_transcripts.fasta.transdecoder_dir.__checkpoints
/Marouch_transcripts.fasta.transdecoder_dir.__checkpoints_longorfs
Marouch_transcripts.fasta.transdecoder.genome.gff3
Marouch_transcripts.fasta.transdecoder.gff3
Marouch_transcripts.fasta.transdecoder.pep
Marouch_transcripts.gff3
pfam.log
signalp.out
Signalp_temp
tmhmm.out
tmp.superscaff.rnammer.gff
transcriptSuperScaffold.bed
transcriptSuperScaffold.fasta
/Trinotate_databases

Thanks
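One possible source for the mapping is the AUGUSTUS GTF itself. The sketch below assumes that the exon/CDS lines in a file such as augustus.hints.gtf carry `transcript_id "..."; gene_id "...";` attributes in column 9, as AUGUSTUS-style GTF usually does, and that those IDs match the FASTA headers. It is not a Trinotate or BRAKER tool; the script name `gtf_to_gene_trans_map.py` is hypothetical.

```python
import re
import sys

# Matches GTF-style attributes such as: transcript_id "jg53.t1"; gene_id "jg53";
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def gene_trans_map_from_gtf(lines):
    """Return (gene_id, transcript_id) pairs in first-seen order.

    Assumption: exon/CDS lines carry gene_id and transcript_id attributes.
    Lines lacking either one (comments, or 'gene'/'transcript' lines whose
    9th column is a bare ID) are skipped.
    """
    seen = set()
    pairs = []
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        gene, trans = attrs.get("gene_id"), attrs.get("transcript_id")
        # Each transcript has many exon/CDS lines; record it only once.
        if gene and trans and trans not in seen:
            seen.add(trans)
            pairs.append((gene, trans))
    return pairs


def main(argv):
    if len(argv) != 1:
        print("usage: gtf_to_gene_trans_map.py <augustus.gtf>", file=sys.stderr)
        return
    with open(argv[0]) as fh:
        for gene, trans in gene_trans_map_from_gtf(fh):
            # Tab-separated: gene ID, then transcript ID.
            sys.stdout.write(f"{gene}\t{trans}\n")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Hypothetical usage: `python gtf_to_gene_trans_map.py augustus.hints.gtf > gene_trans_map.txt`. It is worth checking that every transcript ID it emits also appears as a header in the transcript FASTA.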

agroppi commented 5 years ago

The headers in my Marouch_transcripts.fasta file look like this:

>jg25328.t1 jg25328
>jg25224.t1 jg25224
>jg25204.t1 jg25204
>jg25354.t1 jg25354
>jg25354.t2 jg25354
>jg25351.t1 jg25351
...
>jg64.t1 jg64
>jg53.t3 jg53
>jg53.t1 jg53
>jg53.t2 jg53
...

Would "t1", "t2", "t3", ... be the equivalent of "seq1", "seq2", "seq3" in a Trinity.fasta?

In that case, would the gene-to-transcript mapping file look something like this?

jg25328 jg25328.t1
jg25224 jg25224.t1
jg25204 jg25204.t1
jg25354 jg25354.t1
jg25354 jg25354.t2
jg25351 jg25351.t1
...
jg64    jg64.t1
jg53    jg53.t3
jg53    jg53.t1
jg53    jg53.t2
...

Thanks

brianjohnhaas commented 5 years ago

Right, make the gene-to-transcript file like your suggested version, but just be sure to include tabs as separators.
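A minimal sketch of building that tab-separated file directly from the FASTA headers, assuming they follow the `>jg53.t2 jg53` pattern shown in this thread (first token = transcript ID, second token = gene ID). If a header lacks the second token, the sketch falls back to stripping a trailing `.tN` from the transcript ID; that fallback is an assumption about AUGUSTUS-style naming. The script name `fasta_to_gene_trans_map.py` is hypothetical.

```python
import re
import sys

# AUGUSTUS-style transcript IDs end in ".t<N>", e.g. jg53.t2 -> gene jg53.
TRANS_SUFFIX_RE = re.compile(r"\.t\d+$")


def gene_trans_map_from_fasta(lines):
    """Yield (gene_id, transcript_id) for each FASTA header line.

    Assumes headers like '>jg53.t2 jg53'. Without a second token, the gene ID
    is derived by stripping the '.tN' suffix from the transcript ID.
    """
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence lines are ignored
        fields = line[1:].split()
        if not fields:
            continue
        trans = fields[0]
        gene = fields[1] if len(fields) > 1 else TRANS_SUFFIX_RE.sub("", trans)
        yield gene, trans


def main(argv):
    if len(argv) != 1:
        print("usage: fasta_to_gene_trans_map.py <transcripts.fasta>", file=sys.stderr)
        return
    with open(argv[0]) as fh:
        for gene, trans in gene_trans_map_from_fasta(fh):
            # Tab-separated, as advised: gene ID, then transcript ID.
            sys.stdout.write(f"{gene}\t{trans}\n")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Hypothetical usage: `python fasta_to_gene_trans_map.py Marouch_transcripts.fasta > Marouch_transcripts.gene_trans_map`. Writing the separator explicitly as `\t` avoids the space-versus-tab ambiguity that can creep in when such a file is edited by hand.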

best,

~brian

On Tue, Sep 18, 2018 at 5:56 AM Alexis Groppi notifications@github.com wrote:


--

Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas
http://broad.mit.edu/~bhaas

Trinotate / Trinotate.github.io, issue #7