davidebolo1993 / VISOR

VarIant SimulatOR for short, long and linked reads
GNU Lesser General Public License v3.0

New form of variation (I think) - non-reciprocal homoeologous exchange #33

Open agolicz opened 1 year ago

agolicz commented 1 year ago

Hi Davide, This is really a feature request. We've been using VISOR and it has been great! We are working on a polyploid (allotetraploid) plant genome, Brassica napus. This means that it carries two diploid sub-genomes that are distinct but closely related. It has a special type of variation, non-reciprocal homoeologous exchange, where several hundred kb to several Mb from one sub-genome replace the corresponding sequence in the other sub-genome. The exchange is non-reciprocal, so one of the sequences is completely lost. It would be incredibly helpful to us to be able to simulate this. If it helps, we only hope to test the effect of this in completely inbred individuals, where the sub-genomes are completely homozygous (as depicted in the attached image). I know you are likely extremely busy, but if you could help that would be amazing! All the best, Agnieszka

[Attached image: Screenshot 2023-02-03 142511]

davidebolo1993 commented 9 months ago

Hi @agolicz,

I know this has been a while, but I'm trying to follow up on VISOR's issues and have them fixed by the end of the year. I'm not sure I'll be able to, but I'm trying. Is this still of interest? Did you find a way through?

Best,

Davide

agolicz commented 9 months ago

Hi, no problem. I was actually just using VISOR yesterday :))). Yes, that would still be very much of interest! All the best, Agnieszka

davidebolo1993 commented 9 months ago

Looking at panels 2 and 3 above, isn't this something that can be simulated with the current settings?

In principle, this is like a reciprocal translocation between 2 chromosomes, except that there is no loss of material on the first (panel 2) or the second (panel 3). I think this can be achieved by simulating a reciprocal translocation and then, downstream, replacing one of the haplotypes (the one that should remain untouched). Something like:

  1. You have 2 different references. Make sure the chromosomes from the 2 genomes have different names (or rename them so they are unique), then concatenate the references into a single one. In the end, chr1 from genome 1 will be something like >chr1_genome1 and chr1 from genome 2 something like >chr1_genome2.
  2. You simulate a reciprocal translocation between the 2 chromosomes: chr1_genome1 on haplotype 1 and chr1_genome2 on haplotype 2. This will generate 2 haplotypes: >chr1_genome1 changed in h1.fa, >chr1_genome1 unchanged in h2.fa, >chr1_genome2 unchanged in h1.fa and >chr1_genome2 changed in h2.fa.
  3. For each haplotype, you extract the genome-specific chromosomes so that you end up with h1.g1.fa, h1.g2.fa, h2.g1.fa and h2.g2.fa.
  4. For the figure in panel 2, you keep h1.g2.fa (unchanged) and h2.g2.fa (changed) and create a copy of each. You can then rename the 4 files to h1.fa, h2.fa, h3.fa and h4.fa.
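
The FASTA bookkeeping in steps 1, 3 and 4 could be sketched roughly as below. This is only an illustration: all function names are hypothetical and not part of VISOR, the `_genome1`/`_genome2` suffixes follow the naming suggested in step 1, and the order of the copies in step 4 is an assumption. The translocation itself (step 2) is left to VISOR.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict {name: sequence}, keeping record order."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first word of the header as the record name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(chunks) for n, chunks in records.items()}


def to_fasta(records, width=60):
    """Serialise {name: sequence} back to FASTA text."""
    lines = []
    for name, seq in records.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def tag_and_concatenate(genomes):
    """Step 1: merge {tag: {chrom: seq}} into one reference {chrom_tag: seq}."""
    combined = {}
    for tag, records in genomes.items():
        for chrom, seq in records.items():
            combined[f"{chrom}_{tag}"] = seq
    return combined


def split_by_genome(haplotype, tags):
    """Step 3: split one haplotype {chrom_tag: seq} into {tag: {chrom_tag: seq}}."""
    out = {tag: {} for tag in tags}
    for name, seq in haplotype.items():
        for tag in tags:
            if name.endswith("_" + tag):
                out[tag][name] = seq
                break
    return out


def duplicate_haplotypes(kept):
    """Step 4: copy each kept record set once and name the results h1, h2, ...

    The output order (original, copy, original, copy) is an assumption.
    """
    out = {}
    for records in kept:
        for _ in range(2):
            out[f"h{len(out) + 1}"] = dict(records)
    return out
```

In use, `tag_and_concatenate` would be applied to the two parsed references before running VISOR, and `split_by_genome` to each of the resulting h1.fa and h2.fa with `tags=["genome1", "genome2"]`. The two record sets named in step 4 would then be passed to `duplicate_haplotypes` and written out with `to_fasta`.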

This in principle should work and can be used for read-level simulation with SHORtS/LASeR. Does this make sense to you? If there is something unclear I can work on an example.

agolicz commented 9 months ago

Hi Davide, Thanks! I think the scenario this does not account for is a single chromosome being both a donor and a recipient. These are two independent events, so think of panels 2 and 3 as just sections of the same chromosome. Usually we have several of those, in both directions, on a single chromosome (meiosis in polyploids can be strange). I might be wrong though, need more coffee... All the best, Agnieszka
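
To make the scenario concrete, here is a minimal sketch of several independent non-reciprocal exchanges, in both directions, applied to a pair of homoeologous chromosomes held in memory. It is not VISOR functionality. The `Exchange` record and `apply_exchanges` function are hypothetical, and the sketch assumes 0-based half-open coordinates on the original sequences and non-overlapping events on each recipient.

```python
from typing import NamedTuple


class Exchange(NamedTuple):
    """One non-reciprocal exchange (hypothetical record, not a VISOR format)."""
    donor: str            # name of the donor chromosome
    donor_start: int      # 0-based, inclusive
    donor_end: int        # 0-based, exclusive
    recipient: str        # name of the recipient chromosome
    recipient_start: int  # 0-based, inclusive
    recipient_end: int    # 0-based, exclusive


def apply_exchanges(chromosomes, events):
    """Return a copy of `chromosomes` with every event applied.

    Donor segments are always read from the ORIGINAL sequences, so a
    chromosome can be a donor in one region and a recipient in another.
    The donor keeps its own copy; the recipient segment is lost.
    """
    original = dict(chromosomes)
    result = dict(chromosomes)

    # Group events by recipient chromosome.
    by_recipient = {}
    for ev in events:
        by_recipient.setdefault(ev.recipient, []).append(ev)

    for recipient, evs in by_recipient.items():
        # Apply right to left so earlier coordinates stay valid even when
        # donor and recipient segments differ in length.
        evs.sort(key=lambda e: e.recipient_start, reverse=True)
        seq = original[recipient]
        prev_start = len(seq)
        for ev in evs:
            if ev.recipient_end > prev_start:
                raise ValueError("overlapping events on " + recipient)
            piece = original[ev.donor][ev.donor_start:ev.donor_end]
            seq = seq[:ev.recipient_start] + piece + seq[ev.recipient_end:]
            prev_start = ev.recipient_start
        result[recipient] = seq
    return result


# Toy example: chrA donates its first 3 bases to chrC,
# and chrC donates its last 4 bases to chrA.
toy = {"chrA": "AAAAAAAAAA", "chrC": "CCCCCCCCCC"}
toy_events = [
    Exchange("chrA", 0, 3, "chrC", 0, 3),
    Exchange("chrC", 6, 10, "chrA", 6, 10),
]
exchanged = apply_exchanges(toy, toy_events)
```

Because the individuals are fully inbred, the resulting sequences could presumably be written out once and reused for every haplotype copy before read simulation.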