DyogenIBENS / Agora

Algorithm For Gene Order Reconstruction in Ancestors
Other
70 stars 15 forks source link

Plotting agora results plus a general question #18

Closed nousiaso closed 1 year ago

nousiaso commented 1 year ago

Hello, very nice and useful pipeline. I finished a run using orthogroups, I am very insterested to know how you approach the plotting of the ancestral genome outputs.

Thank you again, Orestis

nousiaso commented 1 year ago

When I say plot, i mean like the nice chromosome plot from genomicus web server for instance

alouis72 commented 1 year ago

Hello Orestis, thank you for using AGORA and Genomicus. At the present time, we do not provide tool to generate genomicus like multikaryoview with command line.

However, you can use src/misc.compareGenomes.py to generate the karyotypes one by one (with the option mode=drawKaryotype).

The first ancestral karyotype (one color per ancestral block) can be generate by comparing the genome to itself, eg: src/misc.compareGenomes.py ancGenome.A1.list.bz2 ancGenome.A1.list.bz2 ancGenes.A1.list.bz2 -mode=drawKaryotype > A1.ps

The others, by comparing the genome (ancestral or modern) with the first ancestral one: src/misc.compareGenomes.py genome.M1.list.bz2 ancGenome.A1.list.bz2 ancGenes.A1.list.bz2 -mode=drawKaryotype > M1.ps

etc... options are available (+sortBySize, -minChrSize etc...)

I hope this allows you to do what you want on your data, don't hesitate if you have other questions

Best wishes, Alexandra

nousiaso commented 1 year ago

Hey Alexandra thanks for the reply. It is in the right direction. I will try it. Wanted to ask you something else related to the analysis. I have 5 gene lists when i do the analysis, gene.genome1.list ...etc and one ancestral gene list gene.A0.list . From my understanding i compiled a file with the orthogroup ids for the ancestral genome, space separated as you show in the example. So i have one file called orthologyGroups.A0.list , but when i run the pipeline it needs 4 more files. Plus there is a strange thing where it needs them having these headers orthologyGroups.NAME_0.list , orthologyGroups.NAME_1.list ...orthologyGroups.NAME_4list . I just copied the initial file i had with all the orthology relationships orthologyGroups.A0.list and gave it all these names. It worked but I looked at the results and they are little baffling.

Then i tried also to get different inputs : i did this : the file orthologyGroups.NAME_0.list had just the orthologies of the A0 and genome 1, the orthologyGroups.NAME_1.list had just the orthologies of A0 and genome 2 etc...

After this try, i did not get ancestral genomes at the end. So I want to ask you what is going on exactly ? What are these 5 orthologyGroup.lists that are needed, and why not just one orthologyGroup list with all the orthologs between the A0 and the 5 genomes is not good enough ?

Please elaborate Thanks!!

biogeeker commented 1 year ago

Hi, Alexandra,

AGORA is a wonderful tools for reconstructing ancestor karyotype, I have two questions when I read the manual and NEE paper. As the example data shows: example

I have constructed the ancestor A1 karyotype as you mentioned above, then I want to know the karyotype of M1. Should I directly construct from A1 -> M1, or firstly to A3, and then from A3 -> M1? I think two steps is more reasonable, but still not sure. Here is my code :

  1. Directly construct from A1:
    ./misc.compareGenomes.py ../example/data/genes/genes.M1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/size-1.0-1.0/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200> M1.ps
  2. A1 -> A3 -> M1
    ./misc.compareGenomes.py ../example/data/genes/genes.M1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A3.list.bz2 ../example/results/ancGenes/size-1.0-1.0/ancGenes.A3.list.bz2 -mode=drawKaryotype -minChrSize=200> M1_A3.ps

And another question is:

Can [ Agora ] output the occurrence of fusion and fission events for each ancestral node?

Thanks for your reply!

alouis72 commented 1 year ago

@nousiaso I think (tell me if I'm I'm wrong) you provided a species tree without naming the internal nodes, so AGORA named them NAME_1, NAME_2 etc... when parsing the species tree, (it is needed to build the comparisons computing process). Whether you want de build each ancestral node of the species tree or only the ancestral root, AGORA needs all internal Orthologous groups, because not all genome pairwise comparison are informative to reconstruct the ancestral root. (see figure2 of the article). Also each internal node Orthogroups should only contain genes of descendant species of this ancestor. I think one top orthologous group will not provide information about where gene duplications nor losses have occured during evolution.

alouis72 commented 1 year ago

@biogeeker just to be clear, misc.compareGenomes.py will not compute or build anything... It's just a tool to draw the modern or ancestral karyoptypes according to an ancestral or modern reference (or draw dotplots... see default mode=drawMatrix).

If you want to represent M1 with the color of the blocs of reconstructed A1:

1. plot A1 to draw the reference colors (1 bloc / 1 colour):

./misc.compareGenomes.py ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200 +sortBySize > A1.ps

2. plot M1 according to A1 colors of blocs:

./misc.compareGenomes.py ../example/data/genes/genes.M1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200> M1.ps

If you want to represent how it evolves from A1 to A3 to M1, you have to draw A3 according to A1 colours:

./misc.compareGenomes.py ../example/results/ancGenomes/basic-workflow/ancGenome.A3.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200> A3.ps

Can [ Agora ] output the occurrence of fusion and fission events for each ancestral node? you can have an idea of of the fission and or fusion by using the same tool but with the option mode=printOrthologousChrom

as an example: src/misc.compareGenomes.py ./example/results/ancGenomes/vertebrates-workflow/ancGenome.A1.list.bz2 example/data/genes/genes.M1.list.bz2 example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=printOrthologousChrom +sortBySize -minChrSize=200

will give:

CAR_1   Chr7 (516)
CAR_10  Chr12 (236) Chr25 (89)
CAR_11  Chr24 (194) Chr16 (95)
CAR_12  Chr17 (253)
CAR_13  Chr12 (202) Chr21 (41)
CAR_14  Chr30 (232)
CAR_15  Chr16 (234)
CAR_16  Chr27 (214)
CAR_17  Chr17 (215)
CAR_18  Chr21 (136) Chr12 (70)
CAR_19  Chr18 (155) Chr21 (130) Chr27 (64)
CAR_2   Chr1 (349)  Chr17 (120)
CAR_20  Chr10 (185) Chr16 (21)
CAR_21  Chr2 (208)  Chr3 (62)
CAR_22  Chr19 (174) Chr3 (34)

with this parameters (taking into account only blocs > 200 genes and with default parameter orthoschr:minHomology=90) that means that CAR_10 of A1 is split on 2 Chromosomes in M1

the reverse command: src/misc.compareGenomes.py ./example/results/ancGenomes/vertebrates-workflow/ancGenome.A1.list.bz2 example/data/genes/genes.M1.list.bz2 example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=printOrthologousChrom +sortBySize -minChrSize=200 +reverse

Chr1    CAR_2 (349)
Chr10   CAR_20 (185)    CAR_6 (23)
Chr11   CAR_23 (202)
Chr12   CAR_6 (365) CAR_7 (325) CAR_10 (236)    CAR_13 (202)
Chr14   CAR_5 (355)
Chr16   CAR_4 (436) CAR_15 (234)    CAR_11 (95)
Chr17   CAR_12 (253)    CAR_17 (215)    CAR_2 (120)
Chr18   CAR_19 (155)
Chr19   CAR_22 (174)
Chr2    CAR_21 (208)
Chr20   CAR_8 (311)
Chr21   CAR_25 (196)    CAR_18 (136)    CAR_19 (130)
Chr22
Chr23   CAR_9 (294)
Chr24   CAR_11 (194)
Chr25   CAR_10 (89)
Chr27   CAR_16 (214)    CAR_19 (64)
Chr29   CAR_15 (6)  CAR_16 (3)
Chr3    CAR_21 (62) CAR_22 (34)
Chr30   CAR_14 (232)    CAR_24 (205)
Chr4
Chr6    CAR_3 (447)
Chr7    CAR_1 (516)

will informe you that the modern Chr12 is composed of 4 ancestral blocs (complete or partial)

different options are available :

-mode=printOrthologuesCount will tell you how many genes from which blocs composed the target genome chromosomes.

    CAR_1   CAR_10  CAR_11  CAR_12  CAR_13  CAR_14  CAR_15  CAR_16  CAR_17  CAR_18  CAR_19  CAR_2   CAR_20  CAR_21  CAR_22  CAR_23  CAR_24  CAR_25  CAR_3   CAR_4   CAR_5   CAR_6   CAR_7   CAR_8   CAR_9
Chr1    0   0   0   0   0   0   0   0   0   0   0   349 0   0   0   0   0   0   0   0   0   0   0   0   0
Chr10   0   0   0   0   0   0   0   0   0   0   0   0   185 0   0   0   0   0   0   0   0   23  0   0   0
Chr11   0   0   0   0   0   0   0   0   3   0   0   0   0   0   0   202 0   0   0   0   0   0   0   0   0
Chr12   0   236 0   0   202 0   0   0   0   70  0   0   0   0   0   0   0   0   0   0   0   365 325 0   0

-mode=printOrthologuesList will list the orthologous genes.

I hope this would help. Cheers, Alex

nousiaso commented 1 year ago

@nousiaso I think (tell me if I'm I'm wrong) you provided a species tree without naming the internal nodes, so AGORA named them NAME_1, NAME_2 etc... when parsing the species tree, (it is needed to build the comparisons computing process). Whether you want de build each ancestral node of the species tree or only the ancestral root, AGORA needs all internal Orthologous groups, because not all genome pairwise comparison are informative to reconstruct the ancestral root. (see figure2 of the article). Also each internal node Orthogroups should only contain genes of descendant species of this ancestor. I think one top orthologous group will not provide information about where gene duplications nor losses have occured during evolution.

Hello, Alexandra, Thanks for the reply

I named the species tree according to the names in gene.*.list.bz2 .

Concerning this : "Whether you want de build each ancestral node of the species tree or only the ancestral root, AGORA needs all internal Orthologous groups"

I thought that would be the case as you describe it.

Let me make it crystal clear

We have the 5 genomes of the cultivars we are investigating. I want to reconstruct the common ancestor of the 5 genomes plus A0. A0 is a closely related wild genome of different genus (but same family), that diverged from the genus we are investigating.

We have the orthogroups of that A0 with the 5 genomes. What i did is i took the orthologs from A0 with genome 1 , A0 with genome 2 etc, and merged them into a file with unique A0 gene ids using pandas.

At the end we have a file that has all orthologs of A0 with the five genomes (like the one in the example)

I used that for the orthologyGroup.A0.list as input but it was not enough. It needed files that were named orthologyGroup.NAME_0.list orthologyGroup.NAME_1.list orthologyGroup.NAME_2.list orthologyGroup.NAME_3.list orthologyGroup.NAME_4.list

i copied the merged orthologyfile orthologyGroup.A0.list , using the different names NAME_0, until NAME_4 and it worked.

After that i also tested the following used the un-merged orthologyGroup files A0 with genome1 ... etc A0 genome 5 and named them orthologyGroup.NAME_0.list,, etc until orthologyGroup.NAME_4.list ( looking at ancgenes.log) i used the same order according to the gene.*.list.bz2 parsed, the first list was NAME_0 ... the fifth list was NAME_4)

i did not get ancestral genome results after that.

So i guess this is why : "Whether you want de build each ancestral node of the species tree or only the ancestral root, AGORA needs all internal Orthologous groups"

this is not a problem. I understand why. The NAME issue though i did not really understand why it happens.

If you have an insight on that, please share, thank you much for the repplies! Orestis

alouis72 commented 1 year ago

@nousiaso If your speciesTree is like:

((((spec1,spec2),spec3),spec4),spec5)

image

with no name at inner ancestral node, AGORA will infer them as "NAME_0" "NAME_1" "NAME_2" etc... and will search for data with that infered name (Orthologous.NAME_1.list, etc...)

If you provide a speciestree with name at inner nodes: (((Spec1,Spec2)A3,Spec3)A1,(Spec4,Spec5)A2)A0 ;

image

AGORA will expected OrthoGroups.A[0-3].list and will not use NAME_ cause the algorithm doesn't need to infer the name of ancestors.

If you need to show me data more privately you can email me to alouis@bio.ens.psl.eu

nousiaso commented 1 year ago

Your reply was very clear !

Thanks a lot Alexandra ! Best, Orestis

Caoyu819 commented 1 year ago

@biogeeker just to be clear, misc.compareGenomes.py will not compute or build anything... It's just a tool to draw the modern or ancestral karyoptypes according to an ancestral or modern reference (or draw dotplots... see default mode=drawMatrix).

If you want to represent M1 with the color of the blocs of reconstructed A1:

1. plot A1 to draw the reference colors (1 bloc / 1 colour):

./misc.compareGenomes.py ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200 +sortBySize > A1.ps

2. plot M1 according to A1 colors of blocs:

./misc.compareGenomes.py ../example/data/genes/genes.M1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200> M1.ps

If you want to represent how it evolves from A1 to A3 to M1, you have to draw A3 according to A1 colours:

./misc.compareGenomes.py ../example/results/ancGenomes/basic-workflow/ancGenome.A3.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200> A3.ps

Can [ Agora ] output the occurrence of fusion and fission events for each ancestral node? you can have an idea of of the fission and or fusion by using the same tool but with the option mode=printOrthologousChrom

as an example: src/misc.compareGenomes.py ./example/results/ancGenomes/vertebrates-workflow/ancGenome.A1.list.bz2 example/data/genes/genes.M1.list.bz2 example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=printOrthologousChrom +sortBySize -minChrSize=200

will give:

CAR_1 Chr7 (516)
CAR_10    Chr12 (236) Chr25 (89)
CAR_11    Chr24 (194) Chr16 (95)
CAR_12    Chr17 (253)
CAR_13    Chr12 (202) Chr21 (41)
CAR_14    Chr30 (232)
CAR_15    Chr16 (234)
CAR_16    Chr27 (214)
CAR_17    Chr17 (215)
CAR_18    Chr21 (136) Chr12 (70)
CAR_19    Chr18 (155) Chr21 (130) Chr27 (64)
CAR_2 Chr1 (349)  Chr17 (120)
CAR_20    Chr10 (185) Chr16 (21)
CAR_21    Chr2 (208)  Chr3 (62)
CAR_22    Chr19 (174) Chr3 (34)

with this parameters (taking into account only blocs > 200 genes and with default parameter orthoschr:minHomology=90) that means that CAR_10 of A1 is split on 2 Chromosomes in M1

the reverse command: src/misc.compareGenomes.py ./example/results/ancGenomes/vertebrates-workflow/ancGenome.A1.list.bz2 example/data/genes/genes.M1.list.bz2 example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=printOrthologousChrom +sortBySize -minChrSize=200 +reverse

Chr1  CAR_2 (349)
Chr10 CAR_20 (185)    CAR_6 (23)
Chr11 CAR_23 (202)
Chr12 CAR_6 (365) CAR_7 (325) CAR_10 (236)    CAR_13 (202)
Chr14 CAR_5 (355)
Chr16 CAR_4 (436) CAR_15 (234)    CAR_11 (95)
Chr17 CAR_12 (253)    CAR_17 (215)    CAR_2 (120)
Chr18 CAR_19 (155)
Chr19 CAR_22 (174)
Chr2  CAR_21 (208)
Chr20 CAR_8 (311)
Chr21 CAR_25 (196)    CAR_18 (136)    CAR_19 (130)
Chr22
Chr23 CAR_9 (294)
Chr24 CAR_11 (194)
Chr25 CAR_10 (89)
Chr27 CAR_16 (214)    CAR_19 (64)
Chr29 CAR_15 (6)  CAR_16 (3)
Chr3  CAR_21 (62) CAR_22 (34)
Chr30 CAR_14 (232)    CAR_24 (205)
Chr4
Chr6  CAR_3 (447)
Chr7  CAR_1 (516)

will informe you that the modern Chr12 is composed of 4 ancestral blocs (complete or partial)

different options are available :

-mode=printOrthologuesCount will tell you how many genes from which blocs composed the target genome chromosomes.

  CAR_1   CAR_10  CAR_11  CAR_12  CAR_13  CAR_14  CAR_15  CAR_16  CAR_17  CAR_18  CAR_19  CAR_2   CAR_20  CAR_21  CAR_22  CAR_23  CAR_24  CAR_25  CAR_3   CAR_4   CAR_5   CAR_6   CAR_7   CAR_8   CAR_9
Chr1  0   0   0   0   0   0   0   0   0   0   0   349 0   0   0   0   0   0   0   0   0   0   0   0   0
Chr10 0   0   0   0   0   0   0   0   0   0   0   0   185 0   0   0   0   0   0   0   0   23  0   0   0
Chr11 0   0   0   0   0   0   0   0   3   0   0   0   0   0   0   202 0   0   0   0   0   0   0   0   0
Chr12 0   236 0   0   202 0   0   0   0   70  0   0   0   0   0   0   0   0   0   0   0   365 325 0   0

-mode=printOrthologuesList will list the orthologous genes.

I hope this would help. Cheers, Alex

Hi Alexandra,

Thank you for developing such a great tool for karyotype reconstruction. I ran the AGORA, and obtained the ancestral genomes for the target nodes.

Now I'm attempting to plot the karyotype using the AGORA package's script "misc.compareGenomes.py," but I always get the same error when I execute the commandline, even with the example data: ./misc.compareGenomes.py ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200 +sortBySize > A1.ps.

The error is shown below: Affichage Traceback (most recent call last): File "/home/yucao/software/Agora-master/src/misc.compareGenomes.py", line 268, in eval(str(arguments["mode"]))() File "/home/yucao/software/Agora-master/src/misc.compareGenomes.py", line 85, in drawMatrix utils.myPsOutput.printPsHeader() File "/home/yucao/software/Agora-master/src/utils/myPsOutput.py", line 24, in printPsHeader initColor() File "/home/yucao/software/Agora-master/src/utils/myPsOutput.py", line 54, in initColor location = [f for f in knownLocations if os.path.exists(f)][0] IndexError: list index out of range

And the same error shows for my own data and results, do you have any idea why this occurs? According to the error message, I located line 54 of the script “/Agora-master/src/utils/myPsOutput.py” (please see the picture below for the details), it seems the error occurred because the missing of file "rgb.txt", but I'm still not sure how to get the file "rgb.txt"?

Thank you very much! looking very forward to your reply.

Best, Yu

Screen Shot 2023-02-23 at 16 17 21
DyogenIBENS commented 1 year ago

Hi Yu, sorry for the later answer, this file (rgb.txt), contains the definition of all the colors that will be used in the karyotypes output. These are usual locations on linux ubuntu or other linux distribution. On your computer or server, you can probably execute:

locate rgb.txt to find the good location and change it in the /Agora-master/src/utils/myPsOutput.py

I already update the code with one other location "/usr/share/X11/rgb.txt"

I hope this helps, best, Alexandra

Caoyu819 commented 1 year ago

Dear Alexandra,

Thank you very much for your reply, after update the code with location "/usr/share/X11/rgb.txt", the script "misc.compareGenomes.py" run successfully, cheers!

Thanks again for your help. Best, Yu