Darokrithia / PopGenSim

Simulates polygenic evolution of the polygenic trait of digital genomes aka degnomes
MIT License

Average number of crossovers is (slightly) lower than it should be #57

Open Darokrithia opened 3 years ago

Darokrithia commented 3 years ago

A random number of crossover locations is generated during degnome mating, and that number is supposed to average out to a value specified by the user. However, if more than one crossover lands on the same location, they cancel each other out. Any even number of crossovers at one location flips the chromosome back and forth between the "father" and the "mother", so no crossover happens there. Similarly, any odd number of crossovers at one location results in a single crossover instead of several. As a result, the total number of crossovers is slightly lower than it is supposed to be. This isn't a huge issue, and it should only make a meaningful difference for short chromosomes with lots of crossovers, but it isn't ideal.
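
To gauge how large the shortfall is, here is a minimal Python sketch (not the PopGenSim code). It assumes a fixed number `k` of crossover positions drawn independently and uniformly from `num_loci` positions; the function names and both parameters are hypothetical. Under those assumptions a position is hit an odd number of times with probability `(1 - (1 - 2/num_loci)**k) / 2`, so the expected number of crossovers that survive cancellation is `num_loci` times that value.

```python
import random
from collections import Counter


def effective_crossovers(positions):
    """Return the sorted positions that were drawn an odd number of times."""
    counts = Counter(positions)
    return sorted(p for p, n in counts.items() if n % 2 == 1)


def expected_effective(k, num_loci):
    """Expected count of odd-multiplicity positions for k uniform draws.

    Assumption: each draw is independent and uniform over num_loci positions.
    """
    return num_loci * (1 - (1 - 2 / num_loci) ** k) / 2


def simulate_mean(k, num_loci, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity, as a sanity check."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        draws = [rng.randrange(num_loci) for _ in range(k)]
        total += len(effective_crossovers(draws))
    return total / trials
```

With these hypothetical numbers, `expected_effective(30, 1000)` comes out at roughly 29.1, and `expected_effective(30, 100)` is noticeably lower, which fits the observation that the effect matters most for short chromosomes with many crossovers.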

alanrogers commented 3 years ago

That is how it's supposed to work. Recombination happens when there is an odd number of crossovers between two loci. JBS Haldane figured out how to calculate the probability of this event in a famous paper in about 1920. When the probability of recombination is small, this distinction isn't important. But it is important at the scale of whole chromosomes, and this led to confusion in the early history of genetics.
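
For reference, the result is usually written as Haldane's map function, r = (1 - e^(-2d)) / 2, where d is the expected number of crossovers between the two loci and r is the probability of recombination. The sketch below is only an illustration and assumes the number of crossovers is Poisson-distributed with mean d (no interference); the function names are made up for this example. It checks the closed form against a direct sum over odd crossover counts.

```python
import math


def haldane_recombination(d):
    """Closed form: probability of an odd number of crossovers, mean d."""
    return 0.5 * (1.0 - math.exp(-2.0 * d))


def odd_poisson_probability(d, max_terms=200):
    """Sum the Poisson probabilities of 1, 3, 5, ... crossovers directly."""
    total = 0.0
    term = math.exp(-d)  # P(0 crossovers)
    for k in range(1, max_terms):
        term *= d / k  # term is now P(k crossovers)
        if k % 2 == 1:
            total += term
    return total
```

For small d the two quantities are nearly equal (r is close to d), which is why the distinction rarely matters for tightly linked loci. As d grows, r levels off just below 0.5 instead of continuing to increase.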

Darokrithia commented 3 years ago

Interesting. Thanks for telling me about this; I will look into it.

Without having reviewed or researched the topic, I still think there is at least some issue. The biggest one is that if users expect an average of 30 crossovers per chromosome but only get 29, then there is a miscommunication between the user and the program. At a minimum, I still need to update the usage / help text so users know what's going on. I am also not doing anything special for multiple crossover events at a single locus. If I generate crossover locations with locus indices {2, 4, 5, 5, 5, 7}, the output is identical to that for locus indices {2, 4, 5, 7}. Similarly, if the random set of crossover loci I generate is {1, 3, 3, 8}, it gives the same child as the set {1, 8}. If something special happens in real biological organisms when multiple crossover events occur at a single locus, I should probably modify the code to reflect that.
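
The equivalence of those index sets can be shown with a small Python sketch. This is not the actual PopGenSim mating code: `make_child` is a hypothetical helper, and it assumes that a crossover at index `i` switches the source parent starting at locus `i` and that all indices fall within the chromosome.

```python
from collections import Counter


def make_child(mother, father, crossovers):
    """Build a child chromosome, switching parent at every crossover index.

    Assumption: a crossover at index i takes effect starting at locus i,
    and the child starts out copying from the mother.
    """
    hits = Counter(crossovers)
    sources = (mother, father)
    current = 0  # 0 = copy from mother, 1 = copy from father
    child = []
    for i in range(len(mother)):
        # Toggle once per crossover at this locus; pairs cancel out.
        for _ in range(hits[i]):
            current ^= 1
        child.append(sources[current][i])
    return child


mother = ["M"] * 10
father = ["F"] * 10
same_a = make_child(mother, father, [2, 4, 5, 5, 5, 7]) == make_child(mother, father, [2, 4, 5, 7])
same_b = make_child(mother, father, [1, 3, 3, 8]) == make_child(mother, father, [1, 8])
```

Both `same_a` and `same_b` evaluate to `True` here, because only the parity of the number of crossovers at each locus affects which parent is copied.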

Thank you again for your feedback! I didn't realize you still watched this repo, to be honest, haha. I will definitely look into all of this in greater detail.

alanrogers commented 3 years ago

I learned about Haldane's formula years ago when a reviewer pointed out that a formula I had derived was not new: it went back to Haldane. So I took the proof out of my own paper and cited Haldane. But I kept the LaTeX version of my own proof, which I will attach.

Alan
