Closed nousiaso closed 1 year ago
When I say plot, i mean like the nice chromosome plot from genomicus web server for instance
Hello Orestis, thank you for using AGORA and Genomicus. At the present time, we do not provide tool to generate genomicus like multikaryoview with command line.
However, you can use src/misc.compareGenomes.py to generate the karyotypes one by one (with the option mode=drawKaryotype).
The first ancestral karyotype (one color per ancestral block) can be generate by comparing the genome to itself, eg: src/misc.compareGenomes.py ancGenome.A1.list.bz2 ancGenome.A1.list.bz2 ancGenes.A1.list.bz2 -mode=drawKaryotype > A1.ps
The others, by comparing the genome (ancestral or modern) with the first ancestral one: src/misc.compareGenomes.py genome.M1.list.bz2 ancGenome.A1.list.bz2 ancGenes.A1.list.bz2 -mode=drawKaryotype > M1.ps
etc... options are available (+sortBySize, -minChrSize etc...)
I hope this allows you to do what you want on your data, don't hesitate if you have other questions
Best wishes, Alexandra
Hey Alexandra thanks for the reply. It is in the right direction. I will try it. Wanted to ask you something else related to the analysis. I have 5 gene lists when i do the analysis, gene.genome1.list ...etc and one ancestral gene list gene.A0.list . From my understanding i compiled a file with the orthogroup ids for the ancestral genome, space separated as you show in the example. So i have one file called orthologyGroups.A0.list , but when i run the pipeline it needs 4 more files. Plus there is a strange thing where it needs them having these headers orthologyGroups.NAME_0.list , orthologyGroups.NAME_1.list ...orthologyGroups.NAME_4list . I just copied the initial file i had with all the orthology relationships orthologyGroups.A0.list and gave it all these names. It worked but I looked at the results and they are little baffling.
Then i tried also to get different inputs : i did this : the file orthologyGroups.NAME_0.list had just the orthologies of the A0 and genome 1, the orthologyGroups.NAME_1.list had just the orthologies of A0 and genome 2 etc...
After this try, i did not get ancestral genomes at the end. So I want to ask you what is going on exactly ? What are these 5 orthologyGroup.lists that are needed, and why not just one orthologyGroup list with all the orthologs between the A0 and the 5 genomes is not good enough ?
Please elaborate Thanks!!
Hi, Alexandra,
AGORA is a wonderful tools for reconstructing ancestor karyotype, I have two questions when I read the manual and NEE paper. As the example data shows:
I have constructed the ancestor A1 karyotype as you mentioned above, then I want to know the karyotype of M1. Should I directly construct from A1 -> M1, or firstly to A3, and then from A3 -> M1? I think two steps is more reasonable, but still not sure. Here is my code :
./misc.compareGenomes.py ../example/data/genes/genes.M1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/size-1.0-1.0/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200> M1.ps
./misc.compareGenomes.py ../example/data/genes/genes.M1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A3.list.bz2 ../example/results/ancGenes/size-1.0-1.0/ancGenes.A3.list.bz2 -mode=drawKaryotype -minChrSize=200> M1_A3.ps
And another question is:
Can [ Agora ] output the occurrence of fusion and fission events for each ancestral node?
Thanks for your reply!
@nousiaso I think (tell me if I'm I'm wrong) you provided a species tree without naming the internal nodes, so AGORA named them NAME_1, NAME_2 etc... when parsing the species tree, (it is needed to build the comparisons computing process). Whether you want de build each ancestral node of the species tree or only the ancestral root, AGORA needs all internal Orthologous groups, because not all genome pairwise comparison are informative to reconstruct the ancestral root. (see figure2 of the article). Also each internal node Orthogroups should only contain genes of descendant species of this ancestor. I think one top orthologous group will not provide information about where gene duplications nor losses have occured during evolution.
@biogeeker just to be clear, misc.compareGenomes.py will not compute or build anything... It's just a tool to draw the modern or ancestral karyoptypes according to an ancestral or modern reference (or draw dotplots... see default mode=drawMatrix).
If you want to represent M1 with the color of the blocs of reconstructed A1:
1. plot A1 to draw the reference colors (1 bloc / 1 colour):
./misc.compareGenomes.py ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200 +sortBySize > A1.ps
2. plot M1 according to A1 colors of blocs:
./misc.compareGenomes.py ../example/data/genes/genes.M1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200> M1.ps
If you want to represent how it evolves from A1 to A3 to M1, you have to draw A3 according to A1 colours:
./misc.compareGenomes.py ../example/results/ancGenomes/basic-workflow/ancGenome.A3.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200> A3.ps
Can [ Agora ] output the occurrence of fusion and fission events for each ancestral node? you can have an idea of of the fission and or fusion by using the same tool but with the option mode=printOrthologousChrom
as an example:
src/misc.compareGenomes.py ./example/results/ancGenomes/vertebrates-workflow/ancGenome.A1.list.bz2 example/data/genes/genes.M1.list.bz2 example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=printOrthologousChrom +sortBySize -minChrSize=200
will give:
CAR_1 Chr7 (516)
CAR_10 Chr12 (236) Chr25 (89)
CAR_11 Chr24 (194) Chr16 (95)
CAR_12 Chr17 (253)
CAR_13 Chr12 (202) Chr21 (41)
CAR_14 Chr30 (232)
CAR_15 Chr16 (234)
CAR_16 Chr27 (214)
CAR_17 Chr17 (215)
CAR_18 Chr21 (136) Chr12 (70)
CAR_19 Chr18 (155) Chr21 (130) Chr27 (64)
CAR_2 Chr1 (349) Chr17 (120)
CAR_20 Chr10 (185) Chr16 (21)
CAR_21 Chr2 (208) Chr3 (62)
CAR_22 Chr19 (174) Chr3 (34)
with this parameters (taking into account only blocs > 200 genes and with default parameter orthoschr:minHomology=90) that means that CAR_10 of A1 is split on 2 Chromosomes in M1
the reverse command:
src/misc.compareGenomes.py ./example/results/ancGenomes/vertebrates-workflow/ancGenome.A1.list.bz2 example/data/genes/genes.M1.list.bz2 example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=printOrthologousChrom +sortBySize -minChrSize=200 +reverse
Chr1 CAR_2 (349)
Chr10 CAR_20 (185) CAR_6 (23)
Chr11 CAR_23 (202)
Chr12 CAR_6 (365) CAR_7 (325) CAR_10 (236) CAR_13 (202)
Chr14 CAR_5 (355)
Chr16 CAR_4 (436) CAR_15 (234) CAR_11 (95)
Chr17 CAR_12 (253) CAR_17 (215) CAR_2 (120)
Chr18 CAR_19 (155)
Chr19 CAR_22 (174)
Chr2 CAR_21 (208)
Chr20 CAR_8 (311)
Chr21 CAR_25 (196) CAR_18 (136) CAR_19 (130)
Chr22
Chr23 CAR_9 (294)
Chr24 CAR_11 (194)
Chr25 CAR_10 (89)
Chr27 CAR_16 (214) CAR_19 (64)
Chr29 CAR_15 (6) CAR_16 (3)
Chr3 CAR_21 (62) CAR_22 (34)
Chr30 CAR_14 (232) CAR_24 (205)
Chr4
Chr6 CAR_3 (447)
Chr7 CAR_1 (516)
will informe you that the modern Chr12 is composed of 4 ancestral blocs (complete or partial)
different options are available :
-mode=printOrthologuesCount will tell you how many genes from which blocs composed the target genome chromosomes.
CAR_1 CAR_10 CAR_11 CAR_12 CAR_13 CAR_14 CAR_15 CAR_16 CAR_17 CAR_18 CAR_19 CAR_2 CAR_20 CAR_21 CAR_22 CAR_23 CAR_24 CAR_25 CAR_3 CAR_4 CAR_5 CAR_6 CAR_7 CAR_8 CAR_9
Chr1 0 0 0 0 0 0 0 0 0 0 0 349 0 0 0 0 0 0 0 0 0 0 0 0 0
Chr10 0 0 0 0 0 0 0 0 0 0 0 0 185 0 0 0 0 0 0 0 0 23 0 0 0
Chr11 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 202 0 0 0 0 0 0 0 0 0
Chr12 0 236 0 0 202 0 0 0 0 70 0 0 0 0 0 0 0 0 0 0 0 365 325 0 0
-mode=printOrthologuesList will list the orthologous genes.
I hope this would help. Cheers, Alex
@nousiaso I think (tell me if I'm I'm wrong) you provided a species tree without naming the internal nodes, so AGORA named them NAME_1, NAME_2 etc... when parsing the species tree, (it is needed to build the comparisons computing process). Whether you want de build each ancestral node of the species tree or only the ancestral root, AGORA needs all internal Orthologous groups, because not all genome pairwise comparison are informative to reconstruct the ancestral root. (see figure2 of the article). Also each internal node Orthogroups should only contain genes of descendant species of this ancestor. I think one top orthologous group will not provide information about where gene duplications nor losses have occured during evolution.
Hello, Alexandra, Thanks for the reply
I named the species tree according to the names in gene.*.list.bz2 .
Concerning this : "Whether you want de build each ancestral node of the species tree or only the ancestral root, AGORA needs all internal Orthologous groups"
I thought that would be the case as you describe it.
Let me make it crystal clear
We have the 5 genomes of the cultivars we are investigating. I want to reconstruct the common ancestor of the 5 genomes plus A0. A0 is a closely related wild genome of different genus (but same family), that diverged from the genus we are investigating.
We have the orthogroups of that A0 with the 5 genomes. What i did is i took the orthologs from A0 with genome 1 , A0 with genome 2 etc, and merged them into a file with unique A0 gene ids using pandas.
At the end we have a file that has all orthologs of A0 with the five genomes (like the one in the example)
I used that for the orthologyGroup.A0.list as input but it was not enough. It needed files that were named orthologyGroup.NAME_0.list orthologyGroup.NAME_1.list orthologyGroup.NAME_2.list orthologyGroup.NAME_3.list orthologyGroup.NAME_4.list
i copied the merged orthologyfile orthologyGroup.A0.list , using the different names NAME_0, until NAME_4 and it worked.
After that i also tested the following used the un-merged orthologyGroup files A0 with genome1 ... etc A0 genome 5 and named them orthologyGroup.NAME_0.list,, etc until orthologyGroup.NAME_4.list ( looking at ancgenes.log) i used the same order according to the gene.*.list.bz2 parsed, the first list was NAME_0 ... the fifth list was NAME_4)
i did not get ancestral genome results after that.
So i guess this is why : "Whether you want de build each ancestral node of the species tree or only the ancestral root, AGORA needs all internal Orthologous groups"
this is not a problem. I understand why. The NAME issue though i did not really understand why it happens.
If you have an insight on that, please share, thank you much for the repplies! Orestis
@nousiaso If your speciesTree is like:
((((spec1,spec2),spec3),spec4),spec5)
with no name at inner ancestral node, AGORA will infer them as "NAME_0" "NAME_1" "NAME_2" etc... and will search for data with that infered name (Orthologous.NAME_1.list, etc...)
If you provide a speciestree with name at inner nodes: (((Spec1,Spec2)A3,Spec3)A1,(Spec4,Spec5)A2)A0 ;
AGORA will expected OrthoGroups.A[0-3].list and will not use NAME_ cause the algorithm doesn't need to infer the name of ancestors.
If you need to show me data more privately you can email me to alouis@bio.ens.psl.eu
Your reply was very clear !
Thanks a lot Alexandra ! Best, Orestis
@biogeeker just to be clear, misc.compareGenomes.py will not compute or build anything... It's just a tool to draw the modern or ancestral karyoptypes according to an ancestral or modern reference (or draw dotplots... see default mode=drawMatrix).
If you want to represent M1 with the color of the blocs of reconstructed A1:
1. plot A1 to draw the reference colors (1 bloc / 1 colour):
./misc.compareGenomes.py ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200 +sortBySize > A1.ps
2. plot M1 according to A1 colors of blocs:
./misc.compareGenomes.py ../example/data/genes/genes.M1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200> M1.ps
If you want to represent how it evolves from A1 to A3 to M1, you have to draw A3 according to A1 colours:
./misc.compareGenomes.py ../example/results/ancGenomes/basic-workflow/ancGenome.A3.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200> A3.ps
Can [ Agora ] output the occurrence of fusion and fission events for each ancestral node? you can have an idea of of the fission and or fusion by using the same tool but with the option mode=printOrthologousChrom
as an example:
src/misc.compareGenomes.py ./example/results/ancGenomes/vertebrates-workflow/ancGenome.A1.list.bz2 example/data/genes/genes.M1.list.bz2 example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=printOrthologousChrom +sortBySize -minChrSize=200
will give:
CAR_1 Chr7 (516) CAR_10 Chr12 (236) Chr25 (89) CAR_11 Chr24 (194) Chr16 (95) CAR_12 Chr17 (253) CAR_13 Chr12 (202) Chr21 (41) CAR_14 Chr30 (232) CAR_15 Chr16 (234) CAR_16 Chr27 (214) CAR_17 Chr17 (215) CAR_18 Chr21 (136) Chr12 (70) CAR_19 Chr18 (155) Chr21 (130) Chr27 (64) CAR_2 Chr1 (349) Chr17 (120) CAR_20 Chr10 (185) Chr16 (21) CAR_21 Chr2 (208) Chr3 (62) CAR_22 Chr19 (174) Chr3 (34)
with this parameters (taking into account only blocs > 200 genes and with default parameter orthoschr:minHomology=90) that means that CAR_10 of A1 is split on 2 Chromosomes in M1
the reverse command:
src/misc.compareGenomes.py ./example/results/ancGenomes/vertebrates-workflow/ancGenome.A1.list.bz2 example/data/genes/genes.M1.list.bz2 example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=printOrthologousChrom +sortBySize -minChrSize=200 +reverse
Chr1 CAR_2 (349) Chr10 CAR_20 (185) CAR_6 (23) Chr11 CAR_23 (202) Chr12 CAR_6 (365) CAR_7 (325) CAR_10 (236) CAR_13 (202) Chr14 CAR_5 (355) Chr16 CAR_4 (436) CAR_15 (234) CAR_11 (95) Chr17 CAR_12 (253) CAR_17 (215) CAR_2 (120) Chr18 CAR_19 (155) Chr19 CAR_22 (174) Chr2 CAR_21 (208) Chr20 CAR_8 (311) Chr21 CAR_25 (196) CAR_18 (136) CAR_19 (130) Chr22 Chr23 CAR_9 (294) Chr24 CAR_11 (194) Chr25 CAR_10 (89) Chr27 CAR_16 (214) CAR_19 (64) Chr29 CAR_15 (6) CAR_16 (3) Chr3 CAR_21 (62) CAR_22 (34) Chr30 CAR_14 (232) CAR_24 (205) Chr4 Chr6 CAR_3 (447) Chr7 CAR_1 (516)
will informe you that the modern Chr12 is composed of 4 ancestral blocs (complete or partial)
different options are available :
-mode=printOrthologuesCount will tell you how many genes from which blocs composed the target genome chromosomes.
CAR_1 CAR_10 CAR_11 CAR_12 CAR_13 CAR_14 CAR_15 CAR_16 CAR_17 CAR_18 CAR_19 CAR_2 CAR_20 CAR_21 CAR_22 CAR_23 CAR_24 CAR_25 CAR_3 CAR_4 CAR_5 CAR_6 CAR_7 CAR_8 CAR_9 Chr1 0 0 0 0 0 0 0 0 0 0 0 349 0 0 0 0 0 0 0 0 0 0 0 0 0 Chr10 0 0 0 0 0 0 0 0 0 0 0 0 185 0 0 0 0 0 0 0 0 23 0 0 0 Chr11 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 202 0 0 0 0 0 0 0 0 0 Chr12 0 236 0 0 202 0 0 0 0 70 0 0 0 0 0 0 0 0 0 0 0 365 325 0 0
-mode=printOrthologuesList will list the orthologous genes.
I hope this would help. Cheers, Alex
Hi Alexandra,
Thank you for developing such a great tool for karyotype reconstruction. I ran the AGORA, and obtained the ancestral genomes for the target nodes.
Now I'm attempting to plot the karyotype using the AGORA package's script "misc.compareGenomes.py," but I always get the same error when I execute the commandline, even with the example data:
./misc.compareGenomes.py ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenomes/basic-workflow/ancGenome.A1.list.bz2 ../example/results/ancGenes/all/ancGenes.A1.list.bz2 -mode=drawKaryotype -minChrSize=200 +sortBySize > A1.ps
.
The error is shown below:
Affichage Traceback (most recent call last):
File "/home/yucao/software/Agora-master/src/misc.compareGenomes.py", line 268, in
And the same error shows for my own data and results, do you have any idea why this occurs? According to the error message, I located line 54 of the script “/Agora-master/src/utils/myPsOutput.py” (please see the picture below for the details), it seems the error occurred because the missing of file "rgb.txt", but I'm still not sure how to get the file "rgb.txt"?
Thank you very much! looking very forward to your reply.
Best, Yu
Hi Yu, sorry for the later answer, this file (rgb.txt), contains the definition of all the colors that will be used in the karyotypes output. These are usual locations on linux ubuntu or other linux distribution. On your computer or server, you can probably execute:
locate rgb.txt
to find the good location and change it in the /Agora-master/src/utils/myPsOutput.py
I already update the code with one other location "/usr/share/X11/rgb.txt"
I hope this helps, best, Alexandra
Dear Alexandra,
Thank you very much for your reply, after update the code with location "/usr/share/X11/rgb.txt", the script "misc.compareGenomes.py" run successfully, cheers!
Thanks again for your help. Best, Yu
Hello, very nice and useful pipeline. I finished a run using orthogroups, I am very insterested to know how you approach the plotting of the ancestral genome outputs.
Thank you again, Orestis