gaynorr / AlphaSimR

R package for breeding program simulations
https://gaynorr.github.io/AlphaSimR/
Other
42 stars 18 forks source link

Better GE rate simulation? #108

Closed jamonterotena closed 1 year ago

jamonterotena commented 1 year ago

Hello again,

I noticed that when simulating genotyping errors the actual genotyping error rate can differ from the input. This is because the input GE rate is taken as a probability that any site suffers a GE:

markers = which(rbinom(length(masked_genotypes), 1, error_rate) == 1)

I guess this is a matter of the definition of genotyping error rate and not a bug, but I think it's more intuitive to think that GE rate 6% means that there are 6 genotyping errors in 100 sites, and not maybe 4, 7 or 6.

I suggest something like the following

total_GEs = error_rate*length(masked_genotypes)
markers = sample(1:length(masked_genotypes), total_GEs , replace = FALSE) ## Randomly pick a proportion of the sites equal to error_rate for GEs
gregorgorjanc commented 1 year ago

@jamonterotena what/which piece of code in AlphaSimR are you referring to?

jamonterotena commented 1 year ago

@gregorgorjanc recombinations. Not sure of the name of the script, sorry.

gregorgorjanc commented 1 year ago

@jamonterotena are you referring to a function, if so, which one? If you are referring to a script, which script are you referring to?

If this is more of a general discussion, then https://github.com/gaynorr/AlphaSimR/discussions might be a better place. If I am getting your question right, then it really depends what you want to do. Saying that genotyping error rate is 6% and getting, say, 5, errors out of a 100 genotypes, is just how probability works. If you want exactly 6 errors, then you can use your solution, but note that in reality we always have some deviations/distribution of values;)