DyogenIBENS / Agora

Algorithm For Gene Order Reconstruction in Ancestors

How to locate the support scores of the AGORA adjacency graph for adjacent genes in ancGenome? #27

Open Caoyu819 opened 1 year ago

Caoyu819 commented 1 year ago

Dear Agora Team,

I extend my gratitude to you for developing this exceptional tool for reconstructing ancestral genomes. Recently, I've encountered some challenges in accurately identifying breakpoints between successive genomes within a phylogeny.

I followed the provided guidelines for breakpoint identification and filtering, resulting in a list of candidate breakpoints. However, upon closer inspection, I noticed a number of false positive results. My suspicion is that I might not have properly applied the criteria involving "ends-of-blocks located in ancestral gene adjacencies that are not or poorly supported in the AGORA adjacency graph." Unfortunately, I'm unsure about the specific result file containing the AGORA adjacency graph support scores for adjacent genes in the ancGenome. Could it be the 'pairwise/pairs-all/N%s.list.bz2' files?

As an example, for the ancestral node 'N4', I identified the adjacent genes "N4.6978-N4.25970-N4.29043" in the ancGenome file "ancGenome.N4.list.bz2". I attempted to find the AGORA adjacency graph support scores for both "N4.6978-N4.25970" and "N4.25970-N4.29043" using the following shell command (note that -E is needed for "|" to act as alternation):

bzcat N4.list.bz2 | grep -wE "25970|6978|29043"

The output is shown below:

[Screenshot: output of the grep command on N4.list.bz2, taken 2023-08-23 19:22:04]

Does this output imply that both adjacent gene pairs, "N4.25970-N4.29043" and "N4.6978-N4.25970", lack support in the AGORA adjacency graph? Is the fifth column in the "N4.list.bz2" file indicative of the AGORA adjacency graph support score for a specific gene (family) pair?
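To make the lookup concrete, here is a minimal Python sketch of what I am trying to do. It keeps only the lines that mention both members of a pair, which the grep above does not guarantee. It assumes only that gene identifiers appear as whitespace-separated tokens in the file; it does not assume which column holds the support score, since that is exactly what I am asking. The function name `find_pair_lines` is my own and not part of AGORA.

```python
import bz2
from typing import Iterable, List, Tuple


def find_pair_lines(path: str, pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Return lines of a bz2-compressed text file that mention both IDs of any pair.

    Assumption: identifiers are whitespace-separated tokens on each line.
    No column is interpreted as a score here.
    """
    wanted = [frozenset(pair) for pair in pairs]
    hits = []
    with bz2.open(path, "rt") as handle:
        for line in handle:
            tokens = set(line.split())
            # Keep the line only if some pair is fully contained in its tokens.
            if any(pair <= tokens for pair in wanted):
                hits.append(line.rstrip("\n"))
    return hits
```

For my case I would call it as `find_pair_lines("N4.list.bz2", [("6978", "25970"), ("25970", "29043")])`. An empty result would mean that neither pair appears together on any line of that file.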

I raise this because "N4.6978-N4.25970" was detected as an interchromosomal rearrangement breakpoint in the ancestral genome N4 when compared with its descendant node. However, this breakpoint looks questionable when I inspect the "ancGenome.N4.list.bz2" file with the command:

bzcat ancGenome.N4.list.bz2 | grep -E -A 1 "N4\.25970|N4\.6978"

The output is shown below:

[Screenshot: output of the grep command on ancGenome.N4.list.bz2, taken 2023-08-23 19:31:47]

It appears that the gene family of N4.25970 erroneously includes the gene 'Cmol.Cmol1g01662', resulting in the connection of two blocks, one with "Cmol.Cmol1g01661" and another with "Cmol.Cmol7g00753" during the fill-in/fusion/insertion steps. Could I be misunderstanding this situation?
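For anyone who wants to reproduce the check, this is a rough Python equivalent of the grep with context that I used on the ancGenome file. It assumes that consecutive lines in the file correspond to neighbouring ancestral genes and that gene names appear as whitespace-separated tokens; both are my reading of the file rather than documented behaviour. The helper name `lines_with_context` is hypothetical.

```python
import bz2
from typing import List, Set


def lines_with_context(path: str, ids: Set[str], context: int = 1) -> List[str]:
    """Return lines containing any of `ids` as a token, plus `context` lines around each.

    Assumption: line order reflects gene order, so neighbouring lines are
    neighbouring genes. The whole file is read into memory for simplicity.
    """
    with bz2.open(path, "rt") as handle:
        lines = [line.rstrip("\n") for line in handle]
    keep = set()
    for i, line in enumerate(lines):
        if ids & set(line.split()):
            start = max(0, i - context)
            stop = min(len(lines), i + context + 1)
            keep.update(range(start, stop))
    # Return the selected lines in their original file order.
    return [lines[i] for i in sorted(keep)]
```

Calling `lines_with_context("ancGenome.N4.list.bz2", {"N4.25970", "N4.6978"})` should list each of the two genes together with the line before and after it, which makes it easier to see which extant genes sit on either side of the suspected breakpoint.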

I apologize for the length of this inquiry. Your response would be immensely appreciated. Any suggestions or clarifications you can provide will be of great assistance.

Thank you very much in advance.

With best regards, Yu