Hello Antoine,
Thank you for using fdapace. Can you please provide a minimal example reproducing the behaviour you describe?
All the best, Pantelis
Hello Pantelis, Thanks for your quick answer! Here is my code and my data in a tar ball:
load("./mydata.RData")
library(fdapace)
library(magrittr)  # for %>%

# Long-format inputs: one ID per column of mydata, time points taken from the row names
FPCA_input <- MakeFPCAInputs(IDs = colnames(mydata) %>% rep(each = dim(mydata)[1]),
                             tVec = rep(rownames(mydata) %>% as.numeric(), dim(mydata)[2]),
                             yVec = mydata)

# method must be defined before the first call to FClust
method <- "kCFC"
for (K in 3:20) {
  print(K)
  fclust.res <- FClust(FPCA_input$Ly, FPCA_input$Lt, optnsFPCA = list(userBwCov = 2, FVEthreshold = 0.90), k = K, cmethod = method)
}
After K = 9, I get "Error in CheckOptions(Lt, optns, numOfCurves) : FPCA is aborted because the argument: maxK is invalid!"
mydata.RData.tar.gz
Hello Antoine,
The following takes place: kCFC is initialised by standard stats::kmeans on the FPC scores (Chiou & Li 2007, Sect. 2.2.1). As the number of clusters increases, certain clusters provided by kmeans get progressively smaller. For K = 9, kmeans returns a cluster of only two curves (alongside 8 reasonably sized clusters). The subsequent FPCA within that 2-curve cluster fails because maxK (the maximum number of principal components to consider) becomes invalid.
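The shrinking-cluster effect can be reproduced without fdapace. The sketch below is only an illustration: it runs stats::kmeans on simulated two-dimensional scores (a stand-in for the FPC scores, not the data from this report) and records the size of the smallest cluster for each K. The names scores, ks and min_size are made up for this example.

```r
# Simulated stand-in for FPC scores: 200 curves, 2 scores each (assumption).
set.seed(1)
scores <- matrix(rnorm(200 * 2), ncol = 2)

# For each K, run k-means and keep the size of the smallest cluster.
ks <- 3:20
min_size <- sapply(ks, function(k) {
  fit <- stats::kmeans(scores, centers = k, nstart = 5)
  min(fit$size)
})

# Clusters with very few curves are the ones where a within-cluster FPCA
# is likely to run into trouble.
print(data.frame(K = ks, smallest_cluster = min_size))
```

Running the same check on the real FPC scores before clustering should show at which K the smallest cluster drops to a handful of curves.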
Immediate work-around:
Use method="EMCluster"; it works rather well.
Hopeful work-around:
One can use kCFC directly and set the kSeed argument. This will probably perturb the initial solution of the algorithm and may let us escape the ill-posed configuration that arises as the number of clusters increases. As K increases, the probability that this work-around succeeds decreases.
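A seed search of this kind could be sketched as follows. The helper first_working_seed and the toy fitting function are hypothetical and exist only for this illustration; the fdapace call in the trailing comment is untested and assumes that FPCA_input and K are defined as in the code above.

```r
# Try a fitting function with several seeds and return the first success.
# `fit_fun` is any function of one argument (the seed); errors are caught
# and the next seed is tried.
first_working_seed <- function(seeds, fit_fun) {
  for (s in seeds) {
    res <- tryCatch(fit_fun(s), error = function(e) NULL)
    if (!is.null(res)) {
      return(list(seed = s, fit = res))
    }
  }
  NULL  # no seed produced a fit
}

# Toy stand-in for the clustering call: fails unless the seed is a multiple of 3.
toy_fit <- function(seed) {
  if (seed %% 3 != 0) stop("maxK is invalid!")
  paste("fit with seed", seed)
}
out <- first_working_seed(1:10, toy_fit)
print(out$seed)

# With fdapace this might look like (untested sketch):
# out <- first_working_seed(1:10, function(s)
#   kCFC(FPCA_input$Ly, FPCA_input$Lt, k = K, kSeed = s))
```

A NULL result means that none of the seeds tried gave a valid fit, which becomes more likely as K grows.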
Changes on our side:
We will make CheckOptions give a warning if the number of curves considered is too small. While we expect users to notice this on their own when FPCA is used directly, I appreciate that in automated calls of FPCA such a warning can be informative. Similarly, we will give a warning from within kCFC if the number of curves within a cluster is suspiciously small.
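Until such warnings exist, a user-side check along these lines may help. The function warn_small_clusters and its default threshold of 3 are assumptions made for this sketch; they are not part of fdapace and do not describe how the package will implement its own warning.

```r
# Hypothetical helper (not part of fdapace): warn when any cluster has
# fewer than `min_size` members. `cluster` is a vector of cluster labels.
warn_small_clusters <- function(cluster, min_size = 3) {
  sizes <- table(cluster)
  small <- sizes[sizes < min_size]
  if (length(small) > 0) {
    warning(sprintf("Cluster(s) %s contain fewer than %d curves.",
                    paste(names(small), collapse = ", "), min_size))
  }
  invisible(sizes)
}

labels <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)  # cluster 3 has only two curves

# Catch the warning, report it, and still keep the table of sizes.
sizes <- withCallingHandlers(
  warn_small_clusters(labels),
  warning = function(w) {
    message("Caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

The labels could come from a k-means run on the FPC scores, so that a too-small cluster is flagged before any within-cluster FPCA is attempted.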
Thank you for reporting this. It helps us account for use-cases that were not immediately visible to us.
All the best, Pantelis
Hi, I am having some trouble with the FClust() function. For the same data and a varying K, I sometimes get this error message: "FPCA is aborted because the argument: maxK is invalid!" When I debug the CheckOptions function, the message appears just after optns[['maxK']] changes from 10 to 0. Can you help me please?