eveilyeverafter / HMMancestry

R package using the Forward-Backward algorithm to infer genotypes, recombination hotspots, and gene conversion tracts from low-coverage next-generation sequence data.
1 stars 0 forks source link

simulating gene conversion #9

Closed eveilyeverafter closed 9 years ago

eveilyeverafter commented 9 years ago

The current form of the recombination simulation does not allow gene conversion to be modeled. Doing so many provide interesting patterns of gene conversion and loss of genetic diversity in nature and in the lab. There are a couple of different classes of gene conversions that we infer using our inference code (see infer-recom.R). These are: 1) GC_tel = those at the telomeres 2) NCO = a non-crossover where a gene conversion track is found but without a crossover event 3) COyesGC = a gene conversion event associated with a crossover event.

Types 2 and 3 are very similar biologically and so they should be similar to code (i.e., in the recombine() function). Type 1 is biologically different (?) but can be simulated fairly easy.

The proposed parameters in the simulation is 1) the rate of gene conversion and 2) the average length of a gene conversion track.

eveilyeverafter commented 9 years ago

Two and three are included in the update simulation. I still need extend the functionality to include telomeric gene conversions.

The current functions take on three additional parameters to make the simulator more flexible. The parameter f.cross specifies the frequency of crossovers (i.e., 1 - freq(non-crossovers)) during recombination. The parameter f.convert specifies the frequency of gene conversions during meiosis and these conversion can occur with either crossover or non-crossovers. Lastly, the parameter length.conversion specifies the mean (and variance) of the gene conversion tract (i.e., this is done by sampling from a poisson distribution of lambda=length.conversion.

Several things should be noted regarding this parameter expansion: 1) when mu.rate, f.cross, and f.convert are all set to zero then recombination only occurs between sister chromatids and, though the algorithm is still computing recombination at all the r.index points, you will not see any evidence of recombination. 2) Non-crossovers can actually occur any of three ways: a) one of the alternate ways of resolving Holliday junctions (the other results in a crossover event) b) recombination between sister chromatids. The latter scenario is easily seen when mutations are present to distinguish between sister chromatids. In the simulation code, two chromatids are sampled at random between the four. In some cases you get recombination between chromatids that are identical by descent. In other cases you get scenario b above. When f.convert>0 there is a probability that gene conversion will occur. The simulation picks at random from the two remaining chromatids to use as a template for gene conversion. The result is identical to that of a non-crossover. It should not be surprising if the inference tracking function --infer_tracts() -- calls "NCO"s even when f.cross, the frequency of crossovers, is set to 1. c) mutations during simulated meiosis create situations that look like 1 base pair non-crossover events. Running the forward-backward algorithm may "smooth" out this noise.

eveilyeverafter commented 9 years ago

For now I'm not sure of a biological mechanism for telomeric gene conversion so I'm not sure the best way to simulate it. I welcome any suggestions out there.