gaynorr / AlphaSimR

R package for breeding program simulations
https://gaynorr.github.io/AlphaSimR/

Sample SNP array markers with ascertainment bias by default #117

Open gregorgorjanc opened 1 year ago

gregorgorjanc commented 1 year ago

Is your feature request related to a problem? Please describe.
Currently we select markers for the SNP array/chip at random, so their allele frequency distribution can end up similar to that of the QTL. In reality, these two sets of loci will likely have different distributions.

Describe the solution you'd like
I think it would be prudent to change the default marker selection for SNP arrays so that markers follow a "uniform" distribution across the allele frequency spectrum.

gaynorr commented 1 year ago

The restrSegSites function in SimParam has a minSnpFreq option that is intended to model ascertainment bias. It lets the user set a cut-off frequency for SNPs, and SNPs are then allocated subject to this restriction before QTL are allocated. The main drawback is that the threshold may not be achievable given the numbers of SNPs and QTL requested, in which case an error is raised.
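The allocation order and its failure mode can be sketched roughly in Python. This is not AlphaSimR's implementation: `allocate_snps_then_qtl` is a hypothetical helper, and it assumes that the cut-off applies to minor allele frequency and that SNPs and QTL may not share loci.

```python
import random


def allocate_snps_then_qtl(freqs, n_snp, n_qtl, min_snp_freq, seed=None):
    """Hypothetical sketch: pick SNPs among sites whose minor allele
    frequency is >= min_snp_freq, then pick QTL from the remaining sites.

    freqs -- allele frequency of each segregating site
    Returns (snp_indices, qtl_indices); raises ValueError when the
    requested numbers cannot be satisfied.
    """
    rng = random.Random(seed)
    # Assumption: the cut-off is applied to the minor allele frequency
    maf = [min(p, 1.0 - p) for p in freqs]
    eligible = [i for i, m in enumerate(maf) if m >= min_snp_freq]
    if len(eligible) < n_snp:
        raise ValueError(
            f"only {len(eligible)} sites pass min_snp_freq={min_snp_freq}, "
            f"but {n_snp} SNPs were requested")
    # SNPs are allocated first, subject to the frequency restriction
    snps = sorted(rng.sample(eligible, n_snp))
    # QTL are drawn from the sites not already used as SNPs (no overlap)
    taken = set(snps)
    remaining = [i for i in range(len(freqs)) if i not in taken]
    if len(remaining) < n_qtl:
        raise ValueError(
            f"only {len(remaining)} sites left for QTL, but {n_qtl} requested")
    qtl = sorted(rng.sample(remaining, n_qtl))
    return snps, qtl


freqs = [0.01, 0.5, 0.3, 0.02, 0.45, 0.95, 0.2, 0.05]
snps, qtl = allocate_snps_then_qtl(freqs, n_snp=3, n_qtl=4, min_snp_freq=0.1, seed=1)
```

Raising an error instead of silently relaxing the threshold reflects the drawback described above. A simulation could instead lower the cut-off and retry, at the cost of weaker ascertainment.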

There's also the option to define custom SNP chips using the new addSnpChipByName function in SimParam. It lets you define the SNPs on a chip by providing a vector of locus names. A useful feature is that a SNP chip can be defined at any point in the simulation. I block adding traits after the simulation starts because it can lead to errors, but adding SNP chips doesn't pose that issue, so it's not blocked. This might be useful if you want to account for ascertainment bias after a burn-in.
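To illustrate the burn-in idea, here is a Python sketch of one way to choose the vector of locus names after burn-in. The helper name, the dosage-matrix input, and the minor-allele-frequency filter are assumptions for illustration. In AlphaSimR, names chosen this way would be what you pass to addSnpChipByName; check its documentation for the exact arguments.

```python
def chip_loci_after_burnin(genotypes, locus_names, min_maf):
    """Hypothetical helper: return the names of loci whose minor allele
    frequency in the burned-in population is at least min_maf.

    genotypes   -- one list of allele dosages (0, 1 or 2) per individual
    locus_names -- locus names, in the same order as the dosage columns
    """
    n_ind = len(genotypes)
    chosen = []
    for j, name in enumerate(locus_names):
        # Frequency of the counted allele across all 2 * n_ind gene copies
        p = sum(ind[j] for ind in genotypes) / (2 * n_ind)
        if min(p, 1 - p) >= min_maf:
            chosen.append(name)
    return chosen


# Toy burned-in population: 4 individuals x 4 loci
genotypes = [[0, 1, 2, 0],
             [0, 1, 2, 1],
             [0, 2, 2, 0],
             [0, 0, 2, 1]]
names = ["locus_a", "locus_b", "locus_c", "locus_d"]
print(chip_loci_after_burnin(genotypes, names, min_maf=0.1))  # ['locus_b', 'locus_d']
```

Loci that became fixed during burn-in (here `locus_a` and `locus_c`) drop out automatically. This is one simple way that post-burn-in frequencies can shape the chip.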

I guess the bottom line is that I don't plan to add new functionality at the moment, but it would make sense to expand the documentation to show some of these options.

gregorgorjanc commented 10 months ago

Here is an example of a function that sub-samples rows of a data.frame to get more uniform coverage of values: https://gist.github.com/gregorgorjanc/a1f7bd19bfc021a090f0054ab77e8f37

It includes documentation and examples. Here is an example with a beta-distributed variable (very roughly, a WGS-like derived allele frequency spectrum) that is sub-sampled to a more uniform distribution (very roughly, a SNP-array-like derived allele frequency spectrum).
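The gist works on an R data.frame and its code is not reproduced here. As a rough Python illustration of the same goal, the sketch below uses simple equal-width binning, which may differ from what the gist does. Values in [0, 1] are grouped into bins, and at most a fixed number of values is drawn from each bin. The helper name `subsample_uniform` and the Beta(0.5, 2) parameters are assumptions chosen only to produce a spectrum skewed toward rare alleles.

```python
import random


def subsample_uniform(values, n_bins, per_bin, seed=None):
    """Hypothetical sketch: return indices of a sub-sample of values in [0, 1]
    in which each of n_bins equal-width bins contributes at most per_bin values."""
    rng = random.Random(seed)
    bins = [[] for _ in range(n_bins)]
    for i, v in enumerate(values):
        # min() keeps v == 1.0 in the last bin
        b = min(int(v * n_bins), n_bins - 1)
        bins[b].append(i)
    kept = []
    for members in bins:
        # Bins with fewer than per_bin members are kept whole
        k = min(per_bin, len(members))
        kept.extend(rng.sample(members, k))
    return sorted(kept)


# Skewed, very roughly WGS-like spectrum (parameters are illustrative only)
rng = random.Random(42)
freqs = [rng.betavariate(0.5, 2.0) for _ in range(10_000)]
idx = subsample_uniform(freqs, n_bins=10, per_bin=100, seed=1)
```

Sparse bins (here, the high-frequency end) are kept whole, so the result is only approximately uniform. Using more bins or a smaller `per_bin` trades sample size for a flatter spectrum.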

[Screenshot, 2023-10-28: output of the beta sub-sampling example described above]