TickingClock1992 / RIdeogram


Ternary synteny : connector lines end up outside of chromosome regions. #7

Closed fohebert closed 4 years ago

fohebert commented 4 years ago

So I am comparing gene positions in 3 different genomes (different plant ecotypes of the same species). I would love to have the nice display of the 3 genomes in a pyramid-like graph with the lines connecting the 1:1 orthologs from species A to B and from A to C (A being the "best reference" among the 3 genomes).

I managed to produce the input files without any error and the figure displays pretty much what I want, but it seems like all the connector lines have "shifted" or "skewed" end positions. More specifically, I see 2 issues:

  1. Some lines beginning in species A end up in between 2 chromosomes in species B or C. How is that possible ? It's difficult to estimate where these shifted lines should end up (i.e. in which of the 2 chromosomes these lines should finish?).
  2. A possibly related issue is that multiple connector lines end up way beyond the end of the last chromosome in both species B and C. Could this be related to the fact that all the lines are shifted? Is it because I have too many 1:1 orthology relationships and the connector line density is too high to output all the lines at the right position? Here you have the image for reference:

[Image: synteny one_to_one ortho dec-2019 types1-2_only]

I was wondering how this issue could be solved. I triple-checked and all the positions in the synteny file are right. There is no start or end position in any of the genes that extends beyond the last position of the last chromosome. All the positions are correct, so I assume that it must be related to the way the package draws the lines. But maybe I'm wrong... Could I adjust some parameters to extend the little colored boxes that represent the chromosomes so that the lines land at the right positions?

I've attached my karyotype and synteny files for reference as well. Just in case it can help solving the problem.

Thanks so much for any help. I like that package, so I hope we can come up with a solution.

Attachments: csat.synteny_ternary.chrom-only.txt, csat.karyotype_ternary.txt

TickingClock1992 commented 4 years ago

Hello, sorry for the delay. I checked the synteny file and found that one gene (Type 1) on Chr10 of B-PKv3 (Species 2) is located at [104110927, 104120203], which already exceeds the range of B-PKv3 Chr10, [1, 29404172]. More genes are in a similar situation. Please make sure that all gene positions lie within their chromosomes.
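A check of this kind could be sketched in R as follows. This is only a minimal illustration on made-up toy data, not the package's own validation: the helper name `find_out_of_range` is hypothetical, and it assumes the layout used in this thread (a karyotype table with `Chr`, `End` and `species` columns, and a synteny table with chromosome numbers in columns 1 and 4 and end positions in columns 3 and 6).

```r
# Hypothetical helper: return the synteny rows whose end positions run past
# the end of the chromosome they are assigned to.
# Assumes a karyotype with columns Chr, End and species, and a synteny table
# with chromosome numbers in columns 1 and 4 and end positions in columns 3 and 6.
find_out_of_range <- function(synteny, karyotype, species_1, species_2) {
  # Look up the chromosome end for each (species, chromosome) pair.
  chrom_end <- function(sp, chr) {
    k <- karyotype[karyotype$species == sp, ]
    k$End[match(chr, k$Chr)]
  }
  over_1 <- synteny[[3]] > chrom_end(species_1, synteny[[1]])
  over_2 <- synteny[[6]] > chrom_end(species_2, synteny[[4]])
  # which() drops NA comparisons (e.g. a chromosome not found in the karyotype).
  synteny[which(over_1 | over_2), , drop = FALSE]
}

# Toy data (made-up values, not taken from the attached files).
karyotype <- data.frame(
  Chr = c(1, 2, 1, 2),
  Start = 1,
  End = c(1000, 800, 300, 250),
  species = c("A", "A", "B", "B")
)
synteny <- data.frame(
  Species_1 = c(1, 2),
  Start_1 = c(100, 50),
  End_1 = c(200, 90),
  Species_2 = c(1, 2),
  Start_2 = c(120, 900),
  End_2 = c(220, 950)
)

bad <- find_out_of_range(synteny, karyotype, "A", "B")
print(bad)
```

In the toy data, the second row places a gene at 900-950 on a chromosome that is only 250 bp long, so that row is the one reported. Listing the offending rows, rather than silently dropping them, makes it easier to spot a systematic problem in how the file was built.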

TickingClock1992 commented 4 years ago

Here is an example in which all such out-of-range rows in the synteny file are removed using the following script.

library("RIdeogram")
library("rsvg")

# Read the karyotype and synteny tables attached to this issue.
karyotype <- read.table("csat.karyotype_ternary.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE, fill = TRUE)
synteny <- read.table("csat.synteny_ternary.chrom-only.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE, fill = TRUE)

# For each synteny row, compute by how much the gene end positions (columns 3
# and 6) overshoot the end (column 3 of the karyotype) of their chromosomes.
# Column 8 holds the comparison type, which determines the species pair.
# Assigning to column 9 creates a new column that R names "V9".
for (i in 1:nrow(synteny)){
  if (synteny[i, 8] == 1){
    synteny[i, 9] = max(synteny[i, 3] - subset(karyotype, species == "A-CBDRx" & Chr == synteny[i, 1])[1, 3],
                        synteny[i, 6] - subset(karyotype, species == "B-PKv3" & Chr == synteny[i, 4])[1, 3])
  } else if (synteny[i, 8] == 2){
    synteny[i, 9] = max(synteny[i, 3] - subset(karyotype, species == "A-CBDRx" & Chr == synteny[i, 1])[1, 3],
                        synteny[i, 6] - subset(karyotype, species == "C-FNv2" & Chr == synteny[i, 4])[1, 3])
  } else if (synteny[i, 8] == 3){
    synteny[i, 9] = max(synteny[i, 3] - subset(karyotype, species == "B-PKv3" & Chr == synteny[i, 1])[1, 3],
                        synteny[i, 6] - subset(karyotype, species == "C-FNv2" & Chr == synteny[i, 4])[1, 3])
  }
}

# Keep only rows where both genes end before the end of their chromosome,
# then drop the helper column.
synteny <- subset(synteny, V9 < 0)[,1:8]

# ideogram() writes chromosome.svg by default; convert it to PDF afterwards.
ideogram(karyotype = karyotype, synteny = synteny)
rsvg_pdf("chromosome.svg", "chromosome.pdf")

This produces the following plot:

[Image]

I think you need to check and correct the synteny file.

fohebert commented 4 years ago

Dear TickingClock1992 :-)

Thanks so much for taking the time to answer me. I appreciate it! Based on your answer and from what I understand of the structure of the "synteny" file, I think I might have inverted columns 2-3 and 5-6 (i.e. start and end positions of the orthologs in each species). I might have misunderstood the tutorial, but in my synteny file, columns 2-3 are the start/end positions of the gene in species 2, while columns 5-6 are the start/end positions of the corresponding gene in species 1. So the positions you mention in your answer [104,110,927-104,120,203] are not the positions of the gene on chromosome 10 (total length of 20,404,172 bp) of B-PKv3 (species 2), but the positions of the gene on chromosome 1 (total length of 104,987,320 bp) in A-CBDRx (species 1).

I ran my R script from the beginning and used an "inverted" table (with columns 2-3 and 5-6 swapped, i.e. columns 2-3 become columns 5-6 and vice-versa), and it works fine. It gives me the graph that I was expecting. So my bad, I did not understand properly the structure of the input file. But it's all sorted out now. Thanks for your help and your time !
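For anyone hitting the same problem, the column swap described above could be sketched in R like this. It is a minimal sketch on a made-up one-row table; the helper name `swap_positions` is hypothetical, and it assumes the start/end positions sit in columns 2-3 and 5-6 as discussed in this thread.

```r
# Hypothetical helper: exchange the start/end positions in columns 2-3 with
# those in columns 5-6, leaving the chromosome columns (1 and 4) and any
# trailing columns untouched. Column names stay where they are.
swap_positions <- function(synteny) {
  fixed <- synteny
  fixed[, 2:3] <- synteny[, 5:6]
  fixed[, 5:6] <- synteny[, 2:3]
  fixed
}

# Toy row (made-up values): positions were entered on the wrong side.
synteny <- data.frame(
  Species_1 = 1,
  Start_1 = 500,
  End_1 = 600,
  Species_2 = 10,
  Start_2 = 20,
  End_2 = 30
)

fixed <- swap_positions(synteny)
print(fixed)
```

Copying from the untouched `synteny` table into `fixed` avoids overwriting columns 2-3 before they have been moved to columns 5-6.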

[Image: RIdeogram orthoFinder dec-2019]

TickingClock1992 commented 4 years ago

It is very nice to know that it works for you.