Gaius-Augustus / BRAKER

BRAKER is a pipeline for fully automated prediction of protein coding gene structures with GeneMark-ES/ET/EP/ETP and AUGUSTUS in novel eukaryotic genomes

How to determine genes by scaffold? #794

Open emmafg opened 3 months ago

emmafg commented 3 months ago

Hello, I am new to gene prediction. As part of my research, I need to predict the number of genes on some of my scaffolds. One of my scaffolds (scaffold 10) is large enough for BRAKER3 to run on it, and the output is very good: I obtain the predicted number of genes from braker.codingseq using "grep -c '>' braker.codingseq". However, another, much smaller scaffold (scaffold_12) had to be combined with scaffold 10 so that the input was large enough for BRAKER to run. Now I don't know how to determine the number of genes predicted only on scaffold_12. Is there a way to recover the predicted genes per scaffold?

Thank you for your help! Emma

KatharinaHoff commented 3 months ago

I recommend running BRAKER on the complete genome. You can extract the scaffold-specific predictions afterwards. For example, to get the predictions for scaffold_12:

grep -P '^scaffold_12\t' braker.gtf | grep -P '\ttranscript\t' | cut -f 9 > scaff12_tx.lst # get all the transcript names on the scaffold

cdbfasta braker.codingseq -o braker.codingseq.idx # index the codingseq file

cat scaff12_tx.lst | cdbyank braker.codingseq.idx -d braker.codingseq > scaff12_tx.codingseq # extract the transcripts from codingseq file

Typos are possible in these commands, I am drafting them without testing.
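If only the number of genes per scaffold is needed, counting directly in the GTF file may be enough. The following is a minimal, untested-on-real-data sketch; the function name count_genes_per_scaffold is hypothetical, and it assumes that braker.gtf is tab-separated, has the scaffold name in column 1, and contains exactly one line with feature type "gene" (column 3) per predicted gene. Note also that braker.codingseq holds one entry per transcript, so counting '>' lines there can give a higher number than the gene count when genes have alternative isoforms.

```shell
# Count predicted genes per scaffold in a GTF file.
# Assumptions (check against your own braker.gtf):
#   - fields are tab-separated
#   - column 1 is the scaffold name, column 3 the feature type
#   - every gene has exactly one line with feature type "gene"
# count_genes_per_scaffold is a hypothetical helper name, not part of BRAKER.
count_genes_per_scaffold() {
    awk -F'\t' '$3 == "gene" { n[$1]++ }
                END { for (s in n) print s "\t" n[s] }' "$1" | LC_ALL=C sort
}

# Usage (file name taken from the thread):
#   count_genes_per_scaffold braker.gtf
# To show a single scaffold only:
#   count_genes_per_scaffold braker.gtf | grep -P '^scaffold_12\t'
```

The output has one line per scaffold with the scaffold name and the gene count, separated by a tab. Scaffolds without any predicted gene do not appear in the output.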
