Gaius-Augustus / GALBA

GALBA is a pipeline for fully automated prediction of protein coding gene structures with AUGUSTUS in novel eukaryotic genomes for the scenario where high quality proteins from one or several closely related species are available.

Using GALBA for prediction of a specific gene family #58

Open panosioannidis opened 1 day ago

panosioannidis commented 1 day ago

Can I use GALBA for predicting the genes of a gene family?

Suppose, for example, that I'm interested in finding all cytochrome P450s in a particular species. It would be very nice to be able to give GALBA a set of reference P450s (only from the species of interest, or from other species as well) and get back a list of P450s in this particular species. The idea is that even though you get most of the P450s using a genome-wide prediction tool such as BRAKER3, some are still missing (for various reasons, e.g. not enough RNA-seq evidence). Usually, a lot of manual curation is needed to obtain all of the genes in a gene family, but this obviously does not scale.

So have you used GALBA for something like this? Or do you know of someone who has done something similar, perhaps with another approach (i.e. without GALBA)?

Thanks in advance, Panos

KatharinaHoff commented 1 day ago

You are looking for something like AUGUSTUS-PPX, but miniprot will be easier to use.
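
A minimal sketch of what the miniprot route could look like, assuming miniprot is installed and on the PATH. The file names `genome.fa` and `p450_refs.faa` are placeholders, and the helpers `miniprot_command` and `run_miniprot` are hypothetical names used only for illustration. The options `-t` (threads), `-I` (derive the maximum intron size from the genome size) and `--gff` (GFF3 output) are as described in the miniprot README and should be checked against the installed version.

```python
import shutil
import subprocess


def miniprot_command(genome, proteins, threads=8):
    """Build a miniprot command line that writes GFF3 to stdout."""
    return ["miniprot", "-I", "-t", str(threads), "--gff", genome, proteins]


def run_miniprot(genome, proteins, out_gff, threads=8):
    """Run miniprot and save its GFF3 output to out_gff."""
    if shutil.which("miniprot") is None:
        raise RuntimeError("miniprot not found on PATH")
    with open(out_gff, "w") as handle:
        # check=True raises CalledProcessError if miniprot exits with an error
        subprocess.run(
            miniprot_command(genome, proteins, threads),
            stdout=handle,
            check=True,
        )


# Example call (placeholder file names):
# run_miniprot("genome.fa", "p450_refs.faa", "p450_hits.gff")
```

The reference protein file would hold the known family members (here P450s), from the species of interest and/or related species.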


panosioannidis commented 22 hours ago

Thanks for the response!

Ok, I'll have a look at miniprot... Does it produce a gene model, or just an alignment which I'll then have to convert to a proper gene model (myself)?

I've tried AUGUSTUS-PPX in the past, but I remember that it needs quite a bit of tweaking, and even then it misses some genes. I was hoping for something more automated...

EDIT: I just saw that miniprot also produces a gene model (with the gff/gtf switch). At least that's what it looks like...
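
For turning such GFF output into a candidate list, here is a rough sketch. It assumes that miniprot writes one `mRNA` feature per alignment and that the attribute column carries `Identity` and `Target` entries; both are assumptions about the output format and should be verified on a real file. The function names and the `min_identity` threshold are hypothetical.

```python
def parse_attributes(field):
    """Turn a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def candidate_genes(gff_lines, min_identity=0.5):
    """Collect mRNA features whose Identity attribute is >= min_identity."""
    hits = []
    for line in gff_lines:
        # Skip blank lines and comment/pragma lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        attrs = parse_attributes(cols[8])
        identity = float(attrs.get("Identity", 0.0))
        if identity < min_identity:
            continue
        hits.append({
            "contig": cols[0],
            "start": int(cols[3]),
            "end": int(cols[4]),
            "strand": cols[6],
            # Target is assumed to look like 'protein_id start end'
            "reference": attrs.get("Target", "").split(" ")[0],
            "identity": identity,
        })
    return hits
```

Several reference proteins will usually align to the same locus, so overlapping hits would still need to be merged into one candidate gene per locus before counting family members.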