Closed brandonlind closed 4 years ago
Indeed, the documentation for poolseq data is not very good, and may not be up to date (no more sampling @mblumuga?).
For imputation, I believe you can always impute before giving the matrix to pcadapt().
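As a rough illustration of imputing beforehand, here is a minimal base R sketch. It assumes the frequency matrix has pools in rows and SNPs in columns, and that replacing each NA with the mean frequency of its SNP is acceptable. The helper name impute_col_mean and the toy data are made up for this example and are not part of pcadapt.

```r
# Toy frequency matrix: rows are pools, columns are SNPs (assumed layout).
freq <- matrix(c(0.1, NA,  0.3,
                 0.4, 0.5, NA,
                 0.7, 0.9, 0.9),
               nrow = 3, byrow = TRUE)

# Hypothetical helper: replace each NA with the mean frequency of its SNP.
# Note: a column that is entirely NA would be filled with NaN.
impute_col_mean <- function(m) {
  col_means <- colMeans(m, na.rm = TRUE)
  missing <- which(is.na(m), arr.ind = TRUE)   # (row, col) of each NA
  m[missing] <- col_means[missing[, "col"]]
  m
}

freq_imputed <- impute_col_mean(freq)
```

The imputed matrix could then be used in place of the raw one, so that whatever happens to missing values is decided explicitly by the user.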
Imputing is fine, I would think, but as it stands the function can impute without the user's knowledge (for those who don't read the code). I was adding 9s to missing data after reading in with pcadapt.read, given that the docs mention this is automatic when reading in from bed, etc., but do not mention poolseq explicitly.
pcmat <- pcadapt.read(mat)
pcmat[is.na(pcmat)] <- 9
...
But once I forgot to put in the 9s and had to go figure out why my data looked weird. A flag, or a printed warning, in the pcadapt.pcadapt_pool function could help avoid unexpected behavior for unsuspecting users.
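To make the suggestion concrete, the kind of check meant here could look like the sketch below. This is a hypothetical user-side helper (warn_if_missing is an invented name), not something pcadapt provides, and the wording of the message is only an example.

```r
# Hypothetical helper, not part of pcadapt: report missing values up front.
warn_if_missing <- function(m) {
  n_missing <- sum(is.na(m))
  if (n_missing > 0) {
    warning(sprintf("%d missing value(s) found; they may be imputed downstream.",
                    n_missing),
            call. = FALSE)
  }
  invisible(n_missing)
}

toy <- matrix(c(0.2, NA, 0.5, 0.8), nrow = 2)

# Capture the warning text instead of printing it, to show what a user would see.
msg <- tryCatch(warn_if_missing(toy), warning = function(w) conditionMessage(w))
```

Calling such a helper right before the analysis would at least surface the presence of missing data, even if the imputation itself stays the default.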
The value 9 is used only for formats "pcadapt" and "lfmm".
For poolseq data, pcmat should be a standard R matrix with standard missing values.
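If a frequency matrix has already been filled with 9s by mistake, one possible way to get back to standard missing values is sketched below. It assumes every entry is either an allele frequency in [0, 1] or the placeholder 9, so the placeholder cannot be confused with real data. The matrix pool is toy data.

```r
# Toy frequency matrix where 9 was (wrongly) used as a missing-data code.
pool <- matrix(c(0.1, 9,   0.3,
                 0.6, 0.2, NA),
               nrow = 2, byrow = TRUE)

# which() drops NA comparisons, so existing NAs are left untouched.
pool[which(pool == 9)] <- NA

# Sanity check: all remaining values should be valid frequencies.
all_valid <- all(pool >= 0 & pool <= 1, na.rm = TRUE)
```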
In the function pcadapt.pcadapt_pool, the code will impute any missing data with the mean frequency. This should likely be a flag option instead of the default. Although the documentation says that a 9 should be put in place of any missing data, imputation is not mentioned in the manual.pdf for v4.1.0, nor in the article https://bcm-uga.github.io/pcadapt/articles/pcadapt.html