Closed AudreyAAMartin closed 11 months ago
@gaynorr this is fundamentally an issue about numeric and character id's and logical subsetting of a pop object.
This has likely crept in with the added functionality for external (character) ids.
I think it would be prudent to establish some principles and then we can help fixing cases. What about this:
Option 1) When subsetting it would be best to subset with a logical pop[logical]
or using something like pop[pop@id %in% whatever]
or pop[pop@iid %in% whatever]
, both of which of course give a logical
Option 2) Create subsetting methods that know how to handle logical, numeric, and character vectors in subsetting - we have implemented this in SIMplyBee and makes life much easier for developers and users. Of course, we need to establish what these methods would do!
logical is easy pop[logical]
I feel that character is also easy pop[pop@id %in% character]
numeric? maybe pop[pop@id %in% numeric]
or should it do pop[numeric]
, which of course fails when numeric
is beyond nInd
but this is also true for any R object
This functionality was originally designed for a fairly limited use. It was meant as way to chain together more than one selection function without creating intermediate populations. It is a way to improve computational efficiency in really large simulations. However, I the computational cost here isn't really too big of deal in these simulations, because they are very minimal relative to the computational needs of fitting GS models.
It was designed for something like this:
take = selectWithinFam(pop, nInd=2, returnPop=FALSE)
pop = selectInd(pop, nInd=10, candidates=take)
I think it would make sense to expand the "candidates" argument to work the same way as population subsetting. The population subsetting already handles several different data types. The key here is that I don't actually want to subset the population. The phenotypes are subsetted instead.
Describe the bug A clear and concise description of what the bug is. When using selectInd(), the individuals that are eligible for selection within the pop are defined as 1:nInd. The candidate option takes in a list of ID as input to pre-subset the population on which selection can be applied. Candidate (ID) and eligible lists have matching formatting when working with only simulation. However, when combined with real data, the candidate list extracted from the ID (id or iid) within the pop does not correspond to the eligible object format. The function runs with a warning (Suitable candidates smaller than nInd, returning 0 individuals) and the output is empty. The problem is coming from how the eligible individuals are defined in AlphaSimR.
Steps To Reproduce
Expected behavior The function should create and populate a new pop object with the 3 best individuals and the candidate input should be based on ID. A way to avoid the problem is to create a new ID for the whole (sorted) population as 1:nInd (as code below) and use it as input for candidates in SelectInd().
Additional context The error will also occur with numerical ID that are different from 1:nInd. It may be even more problematic than with alphanumerical ID as individuals could be considered as candidates when not intended due to the overlap between 1:nInd and the numerical ID.