functionaldata / tPACE

Testing version of fdapace

maxK is invalid! #22

Closed abodein closed 6 years ago

abodein commented 6 years ago

Hi, I am having some trouble with the FClust() function. For the same data and a varying K, I sometimes get this error message: "FPCA is aborted because the argument: maxK is invalid!" When I debug the CheckOptions function, the message appears just after optns[['maxK']] changes from 10 to 0. Can you help me, please?

hadjipantelis commented 6 years ago

Hello Antoine,

Thank you for using fdapace. Can you please provide a minimal example reproducing the behaviour you describe?

All the best, Pantelis

abodein commented 6 years ago

Hello Pantelis, Thanks for your quick answer! Here is my code, with my data in a tarball:

load("./mydata.RData")
library(fdapace)
library(magrittr)  # provides the %>% pipe used below

# One ID per column of mydata, repeated once for every row (time point)
FPCA_input <- MakeFPCAInputs(IDs = colnames(mydata) %>% rep(each = dim(mydata)[1]),
                             tVec = rep(rownames(mydata) %>% as.numeric(), dim(mydata)[2]),
                             yVec = mydata)

method <- "kCFC"  # must be defined before the first FClust() call
for (K in 3:20) {
  print(K)
  FClust(FPCA_input$Ly, FPCA_input$Lt, optnsFPCA = list(userBwCov = 2, FVEthreshold = 0.90), k = K, cmethod = method)
}

After K = 9, I get "Error in CheckOptions(Lt, optns, numOfCurves) : FPCA is aborted because the argument: maxK is invalid!" Attachment: mydata.RData.tar.gz

hadjipantelis commented 6 years ago

Hello Antoine,

The following takes place: kCFC is initialised by standard stats::kmeans on the FPC scores (Chiou & Li 2007, Sect. 2.2.1). As the number of clusters increases, certain clusters returned by kmeans get progressively smaller. For K = 9, kmeans returns a cluster of only two curves (alongside 8 reasonably sized clusters). The subsequent FPCA within that 2-curve cluster fails because maxK (the maximum number of principal components to consider) becomes invalid.
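To see the shrinking-cluster effect in isolation, here is a minimal sketch in base R. It assumes synthetic Gaussian "scores" as a stand-in for the FPC scores (this is not the reporter's data), and it only illustrates how the smallest k-means cluster tends to shrink as K grows; it does not run kCFC itself.

```r
# Synthetic stand-in for FPC scores: 60 curves, 2 score dimensions.
# (Assumption for illustration only; not the data from this issue.)
set.seed(1)
scores <- matrix(rnorm(120), ncol = 2)

# For each K, run stats::kmeans and record the size of the smallest cluster.
Ks <- 3:12
minSize <- sapply(Ks, function(K) {
  min(table(kmeans(scores, centers = K)$cluster))
})
names(minSize) <- Ks

# A smallest cluster of only a few curves is where a within-cluster FPCA
# is likely to run into trouble.
print(minSize)
```

The exact sizes depend on the data and the random start, so the table is only indicative.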

Immediate work-around: Use cmethod = "EMCluster" (that is, method = "EMCluster" in your script); it works rather well.
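In a loop over K, this work-around could be applied automatically whenever kCFC aborts. The sketch below is one possible way to do that. clusterWithFallback and its clusterFun argument are hypothetical names introduced here, not part of fdapace. The call pattern (Ly, Lt, optnsFPCA, k, cmethod) is the one used in the code earlier in this thread.

```r
# Hypothetical helper (not part of fdapace): try kCFC first and fall back to
# EMCluster if the call aborts, e.g. with "maxK is invalid!".
# clusterFun defaults to fdapace::FClust; the default is only evaluated when
# it is actually used, so a stub can be passed in for testing.
clusterWithFallback <- function(Ly, Lt, K, optnsFPCA = NULL,
                                clusterFun = fdapace::FClust) {
  tryCatch(
    clusterFun(Ly, Lt, optnsFPCA = optnsFPCA, k = K, cmethod = "kCFC"),
    error = function(e) {
      message("kCFC failed for K = ", K, " (", conditionMessage(e),
              "); falling back to EMCluster")
      clusterFun(Ly, Lt, optnsFPCA = optnsFPCA, k = K, cmethod = "EMCluster")
    }
  )
}
```

With the objects from the reproducing example, a call might look like clusterWithFallback(FPCA_input$Ly, FPCA_input$Lt, K = 9, optnsFPCA = list(userBwCov = 2, FVEthreshold = 0.90)).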

Hopeful work-around: One can call kCFC directly and set the kSeed argument. This will probably perturb the initial solution of the algorithm and may let us escape the ill-posedness that arises as the number of clusters increases. As K increases, the probability that this work-around succeeds decreases.

Changes on our side: We will make CheckOptions give a warning if the number of curves considered is too small. While we expect users to pick this up on their own when FPCA is used directly, I appreciate that in automated calls of FPCA such a warning can be informative. Similarly, we will give a warning from within kCFC if the number of curves within a cluster is suspiciously small.
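Until such warnings are available, a similar check can be done by hand on any vector of cluster labels. The following is only a sketch in base R: checkClusterSizes and the minCurves threshold of 3 are assumptions made for illustration and do not reflect how fdapace implements (or will implement) the check.

```r
# Hypothetical helper (not part of fdapace): warn when any cluster holds
# fewer than minCurves curves. The threshold of 3 is an arbitrary assumption.
checkClusterSizes <- function(labels, minCurves = 3) {
  sizes <- table(labels)
  small <- sizes[sizes < minCurves]
  if (length(small) > 0) {
    warning("Clusters with fewer than ", minCurves, " curves: ",
            paste(names(small), collapse = ", "))
  }
  invisible(sizes)  # cluster sizes, returned silently for further inspection
}

# Usage on k-means labels; the labels here come from synthetic scores.
set.seed(2)
labels <- kmeans(matrix(rnorm(80), ncol = 2), centers = 4)$cluster
sizes <- checkClusterSizes(labels)
```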

Thank you for reporting this. It helps us account for use-cases that were not immediately visible to us.

All the best, Pantelis