eead-csic-compbio / get_homologues

GET_HOMOLOGUES: a versatile software package for pan-genome analysis

Reporting lineage specific genes? #91

Closed anandksrao closed 2 years ago

anandksrao commented 2 years ago

Greetings!

Can GET_HOMOLOGUES-EST (plants) be used directly, or easily adapted, to report lineage-specific genes in addition to pan-genome analyses?

And would the ability to perform such analyses extend regardless of taxonomic level, i.e

If so, are there any non-obvious caveats to running such analyses or interpreting their results?

Thank you in advance.

eead-csic-compbio commented 2 years ago

Hi @anandksrao

Apart from the manual (http://eead-csic-compbio.github.io/get_homologues/manual-est) and the tutorial (http://eead-csic-compbio.github.io/get_homologues/tutorial/pangenome_tutorial.html), we worked last year on an updated step-by-step protocol for plants. It is about to be published, but you can already use it at

http://eead-csic-compbio.github.io/get_homologues/plant_pangenome/protocol.html

Please use that as a guide; your question relates to point 3.3.5. Let me know if anything is not clear.

We have used it at the species level (barley and Arabidopsis thaliana) and at the genus level with outgroups from other genera (Brachypodium). However, because BLASTN is used to call homologous sequences from nucleotide alignments, it should not be used across long taxonomic distances: nucleotide distances saturate, and BLASTN megablast rarely reports hits below 70% sequence identity. In that case protein sequences are more appropriate (as used by standard GET_HOMOLOGUES), but even then you probably want phylogeny-based orthology calls for analyses at the kingdom level.
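To make the idea of a lineage-specific cluster concrete, here is a minimal Python sketch. It is not GET_HOMOLOGUES code and not the protocol's method. It assumes you already have a cluster-by-taxon count matrix, represented here as a plain dictionary, and the function name, taxon names and cluster names are all hypothetical. It keeps clusters found in every ingroup taxon and in no outgroup taxon.

```python
from typing import Dict, Iterable, List


def lineage_specific_clusters(
    matrix: Dict[str, Dict[str, int]],
    ingroup: Iterable[str],
    outgroup: Iterable[str],
) -> List[str]:
    """Return clusters present in all ingroup taxa and absent from all outgroup taxa.

    `matrix` maps cluster name -> {taxon name -> number of sequences}.
    This layout is an assumption made for illustration only.
    """
    ingroup = list(ingroup)
    outgroup = list(outgroup)
    hits = []
    for cluster, counts in matrix.items():
        # A missing taxon key is treated as zero sequences (absence).
        present_in_all = all(counts.get(taxon, 0) > 0 for taxon in ingroup)
        absent_in_all = all(counts.get(taxon, 0) == 0 for taxon in outgroup)
        if present_in_all and absent_in_all:
            hits.append(cluster)
    return hits


# Made-up toy data, not real results.
toy_matrix = {
    "cluster_1": {"taxonA1": 1, "taxonA2": 2, "taxonB1": 0},
    "cluster_2": {"taxonA1": 1, "taxonA2": 1, "taxonB1": 1},
    "cluster_3": {"taxonA1": 0, "taxonA2": 1, "taxonB1": 0},
}

result = lineage_specific_clusters(
    toy_matrix, ingroup=["taxonA1", "taxonA2"], outgroup=["taxonB1"]
)
print(result)
```

Keep in mind the caveat above when reading such a list: with distant outgroups, an apparent absence may only mean that BLASTN found no hit, not that the gene is truly missing.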

Hope this helps, Bruno