NelleV / moanin

Timecourse transcriptomic analysis
https://nellev.github.io/moanin/

Error in `splines_kmeans`: getting NaN in the centroids #61

Closed: epurdom closed this issue 4 years ago

epurdom commented 4 years ago
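library(moanin)
library(timecoursedata)
data(shoemaker2015)
# Keep the first 500 genes and only the samples from groups C, K, and M
testData <- shoemaker2015$data[1:500, ]
whSamples <- which(shoemaker2015$meta$Group %in% c("C", "K", "M"))
testData <- testData[, whSamples]
testMeta <- droplevels(shoemaker2015$meta[whSamples, ])
moanin_model <- create_moanin_model(meta = testMeta)
# random_seed was defined earlier in the original session; its value is not
# given in this report, so 42 is only a placeholder
random_seed <- 42
clustering_results <- moanin::splines_kmeans(
    data = testData, moanin_model, n_init = 1,
    random_seed = random_seed)
# TRUE when the problem is hit: some centroids contain NaN
any(is.nan(clustering_results$centroids))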

Note that with my old testData object, which had only groups "C" and "K", I didn't hit these errors, i.e.

whSamples<-which(shoemaker2015$meta$Group %in% c("C","K"))

I also don't hit these errors if rescale=FALSE, i.e.

> clustering_results = moanin::splines_kmeans(
+      data=testData,moanin_model, n_init=1,
+       random_seed=random_seed,rescale=FALSE)
> any(is.nan(clustering_results$centroids))
[1] FALSE

This code is based on a test in test-cluster.R for splines_kmeans_score_and_label, which I was converting to use the small example data I created. Previously I built the example data from groups "C" and "K" only, but I now also include "M" so that I can test more complicated contrasts. splines_kmeans_score_and_label can't handle the NaN values in the centroids.

Stepping through the function, I see that the NaN values are already present in the output of ClusterR; in other words, they are not the result of errors in post-processing the results.

Final note: I am converting the package to support a Bioconductor class on the EAP/moaninClass branch, where I first discovered this, but the code above is from before that conversion.

NelleV commented 4 years ago

I can reproduce this error. I think the appropriate way forward is to raise an exception. This can happen with K-means when the initialization is poor or the number of clusters is not appropriate for the data.
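
For illustration, here is one common way a K-means run can end up with NaN centroids: a cluster that receives no points has a centroid equal to the mean of zero rows. This is a base-R sketch on made-up toy data with a hand-written label vector. It does not call ClusterR or moanin, and it is not confirmed to be the mechanism behind the output reported above.

```r
# Toy data: 6 observations (rows) with 3 features each (made up for illustration).
set.seed(1)
x <- matrix(rnorm(18), nrow = 6, ncol = 3)

# Hypothetical assignment for k = 3 in which cluster 3 received no points,
# as can happen after a poor initialization or with too many clusters.
labels <- c(1, 1, 2, 2, 1, 2)
k <- 3

# Centroid of each cluster = column means of its member rows.
# Rows of `centroids` are clusters, columns are features.
centroids <- t(sapply(seq_len(k), function(j) {
  colMeans(x[labels == j, , drop = FALSE])
}))

# The empty cluster's centroid is the mean of zero rows, i.e. 0/0 = NaN.
empty_is_nan <- all(is.nan(centroids[3, ]))
others_finite <- all(is.finite(centroids[1:2, ]))
print(centroids)
```

Increasing n_init or lowering the number of clusters makes an empty cluster less likely, which fits the two causes mentioned above.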

NelleV commented 4 years ago

Actually, I'll raise a warning instead and edit splines_kmeans_score_and_label to deal with NA clusters.
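
A rough sketch of what "warn, then skip the NA clusters" could look like in base R. The helper names flag_nan_centroids and assign_labels are hypothetical and are not moanin functions. The layout (one centroid per row) and the squared Euclidean distance are also assumptions made for the example, not the package's actual scoring.

```r
# Hypothetical helper: flag centroids (rows) that contain NaN and warn about them.
flag_nan_centroids <- function(centroids) {
  bad <- apply(centroids, 1, function(row) any(is.nan(row)))
  if (any(bad)) {
    warning(sprintf("%d of %d centroids contain NaN and will be ignored.",
                    sum(bad), nrow(centroids)))
  }
  bad
}

# Hypothetical helper: label each row of `data` with its nearest valid centroid.
assign_labels <- function(data, centroids) {
  bad <- flag_nan_centroids(centroids)
  valid <- which(!bad)
  if (length(valid) == 0) stop("All centroids contain NaN.")
  # Squared Euclidean distance from every row to each valid centroid.
  dists <- sapply(valid, function(j) {
    rowSums(sweep(data, 2, centroids[j, ])^2)
  })
  dists <- matrix(dists, nrow = nrow(data))
  # Index of the closest valid centroid, mapped back to the original cluster id.
  valid[max.col(-dists, ties.method = "first")]
}

# Toy example: cluster 2 is degenerate and is skipped with a warning.
centroids <- rbind(c(0, 0), c(NaN, NaN), c(10, 10))
data <- rbind(c(1, 0), c(9, 11), c(0, -1))
labels <- assign_labels(data, centroids)
print(labels)  # 1 3 1
```

Keeping the original cluster ids, rather than renumbering the valid clusters, means the labels still line up with the rows of the centroid matrix.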