DyogenIBENS / Agora

Algorithm For Gene Order Reconstruction in Ancestors

Clarification on Identifying Successive Genomes and Synthetic Blocks between Ancestral Nodes and Extant Genomes in PhylDiag #25

Closed: Caoyu819 closed this issue 1 year ago.

Caoyu819 commented 1 year ago

Hi everyone,

I have another question regarding the concept of 'successive genomes' discussed in the paper. Taking the tree topology below as an example, can I consider 'N30' and 'Oreh' as successive genomes?

[Screenshot, 2023-08-06: the tree topology referred to above]

To examine rearrangements between 'N30' and 'Oreh', I gather that identifying synthetic blocks between these nodes is necessary. However, I'm unsure how to obtain the gene families shared by the Agora-reconstructed ancestral genome and the extant genome, which are needed for PhylDiag's 'set of gene families' input file.

For detecting syntenic blocks between 'Oreh' and 'Cvim', directly using 'geneFamily.N30.list' from 'ancGenes.N30.list.bz2' seems valid, since 'N30' is their common ancestor. This command seems applicable:

phylDiag.py genes.Oreh.list.bz2 genes.Cvim.list.bz2 geneFamily.N30.list --no-imr -m 50 -t 5 -g 45 > CS_Oreh_Cvim.sbs

Concerning synthetic blocks between ancestral 'N30' and extant 'Oreh', I've used the geneFamily file of node 'N24', the closest common ancestor. This command appeared effective:

phylDiag.py genes.Oreh.list.bz2 ancGenome.N30.list geneFamily.N24.list --no-imr -m 50 -t 5 -g 45 > CS_N30_Oreh.sbs

However, I'm uncertain about the approach's validity and logic. I've attached the relevant files.

Additionally, does PhylDiag exclusively calculate synthetic blocks between ancestral nodes and extant genomes? Your insights are invaluable.

Thank you very much in advance for your assistance.

Best regards, Yu

[Attachment: test_PhylDiag_0806.zip]

JosephLucas commented 1 year ago

Hi Yu, Thanks for giving PhylDiag some attention.

> can I consider 'N30' and 'Oreh' as successive genomes

Yes

> Concerning synthetic blocks between ancestral 'N30' and extant 'Oreh', I've used the geneFamily file of node 'N24', the closest common ancestor. This command appeared effective:
> phylDiag.py genes.Oreh.list.bz2 ancGenome.N30.list geneFamily.N24.list --no-imr -m 50 -t 5 -g 45 > CS_N30_Oreh.sbs
> However, I'm uncertain about the approach's validity and logic.

In this case, N30 is itself the last common ancestor of N30 and Oreh, so the command becomes:

phylDiag.py genes.Oreh.list.bz2 ancGenome.N30.list geneFamily.N30.list --no-imr -m 50 -t 5 -g 45 > CS_N30_Oreh.sbs
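The rule applied here can be sketched as a lowest-common-ancestor lookup in which a node counts as its own ancestor. This is a minimal sketch: the child-to-parent map is a hypothetical, simplified topology (the real tree is only in the screenshot), and the `geneFamily.<node>.list` naming just mirrors the file names used in this thread.

```python
def ancestors(node, parent):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def last_common_ancestor(a, b, parent):
    """Lowest node that is an ancestor of both a and b (a node is its own ancestor)."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError(f"{a} and {b} share no ancestor")


def gene_family_file(a, b, parent):
    """Name of the gene-family file rooted in the last common ancestor of a and b."""
    return f"geneFamily.{last_common_ancestor(a, b, parent)}.list"


# Hypothetical, simplified child -> parent map (intermediate nodes, if any, omitted).
PARENT = {"Oreh": "N30", "Cvim": "N30", "N30": "N24"}

print(gene_family_file("Oreh", "Cvim", PARENT))  # geneFamily.N30.list
print(gene_family_file("N30", "Oreh", PARENT))   # geneFamily.N30.list
```

The second call returns N30 rather than N24: because N30 lies on Oreh's path to the root, it is already a common ancestor of the pair, and no older node is needed.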

Since you are interested in synteny blocks rather than conserved segments, I advise adding --no-imcs to disable the identification of mono-genic conserved segments. This identification step is active by default because PhylDiag targets conserved segments by default.
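For scripted runs, the recommended command can be assembled as an argument list and launched with Python's `subprocess`, sending standard output to the `.sbs` file as the shell `>` does. This is a sketch under stated assumptions: `phylDiag.py` is assumed to be executable and on the `PATH`, the helper names are hypothetical, and only the options already shown in this thread are used.

```python
import subprocess


def phyldiag_args(genome1, genome2, families, synteny_blocks=True):
    """Build the phylDiag.py argument list with the options used in this thread."""
    args = ["phylDiag.py", genome1, genome2, families, "--no-imr"]
    if synteny_blocks:
        # Skip the identification of mono-genic conserved segments (on by default).
        args.append("--no-imcs")
    args += ["-m", "50", "-t", "5", "-g", "45"]
    return args


def run_phyldiag(args, out_path):
    """Run PhylDiag, writing its standard output to out_path (like '> file')."""
    with open(out_path, "w") as out:
        subprocess.run(args, stdout=out, check=True)


args = phyldiag_args("genes.Oreh.list.bz2", "ancGenome.N30.list", "geneFamily.N30.list")
print(" ".join(args))
# run_phyldiag(args, "CS_N30_Oreh.sbs")  # only where phylDiag.py is on the PATH
```

Passing a list instead of a single shell string avoids quoting problems with file names; `check=True` raises an error if PhylDiag exits with a non-zero status.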

> does PhylDiag exclusively calculate synthetic blocks between ancestral nodes and extant genomes?

Just to be sure, by "synthetic" do you mean syntenic? PhylDiag can estimate synteny blocks between any two genomes for which it makes sense to define gene families. For example: (i) synteny blocks between two genomes derived from a common ancestor (gene families rooted in the ancestral genome are handy in this situation), or (ii) blocks between an ancestral genome and a genome derived from it, as above.
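Why gene families are the prerequisite can be seen in a deliberately naive toy: genes in two genomes carry different names, so they are first translated into shared family identifiers, and only then can conserved gene adjacencies be compared. This is not PhylDiag's algorithm, which is considerably more elaborate; the genomes, gene names and family labels are hypothetical, each family is assumed to occur at most once per genome, and no gaps are allowed inside a block.

```python
def to_families(genes, family_of):
    """Translate a gene order into a family order, dropping genes without a family."""
    return [family_of[g] for g in genes if g in family_of]


def toy_blocks(fam_a, fam_b, min_len=2):
    """Maximal runs of families consecutive in both genomes, in the same or reverse order.

    Toy simplification: one occurrence per family per genome, no gaps tolerated.
    """
    pos_b = {f: i for i, f in enumerate(fam_b)}
    blocks, run, step = [], [], 0
    for f in fam_a:
        if f in pos_b and run:
            d = pos_b[f] - pos_b[run[-1]]
            # Extend if f is adjacent in genome B and keeps the run's direction.
            if d in (1, -1) and (len(run) == 1 or d == step):
                run.append(f)
                step = d
                continue
        # The current run ends here; keep it if it is long enough.
        if len(run) >= min_len:
            blocks.append(run)
        run, step = ([f] if f in pos_b else []), 0
    if len(run) >= min_len:
        blocks.append(run)
    return blocks


# Hypothetical gene -> family map: a_i and b_i both descend from ancestral family F_i.
family_of = {f"{p}{i}": f"F{i}" for p in "ab" for i in range(1, 6)}
genome_a = ["a1", "a2", "a3", "a4", "a5"]
genome_b = ["b3", "b2", "b1", "bX", "b4", "b5"]  # bX has no family and is ignored

print(toy_blocks(to_families(genome_a, family_of), to_families(genome_b, family_of)))
# [['F1', 'F2', 'F3'], ['F4', 'F5']]
```

The first toy block is inverted between the two genomes and the second is collinear; without the shared family labels, genes such as a1 and b1 could not be matched at all.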

I hope this helped. Best regards, Joseph

Caoyu819 commented 1 year ago

Dear Joseph,

Thank you for your swift response, it's greatly appreciated.

Indeed, treating N30 as the last common ancestor of N30 and Oreh seems more reasonable. I'll certainly give it a try.

> Just to be sure, by "synthetic" do you mean syntenic?

Yes, it's "syntenic" rather than "synthetic". I apologize for the misspelling.

Your explanation has certainly helped clarify my understanding. Thanks a lot.

Best regards, Yu